While I was testing this, the code update that enables HDFS support was released (so we had to try that, right?). The update process on an Isilon is a nice experience that pretty much just works: it updates one node at a time and keeps data online and available throughout. Isilon's approach to Hadoop seems very... non-Hadoopish at first glance. It offers HDFS support, but you have to supply the Hadoop compute nodes, so compute and storage are separated, which runs against a fundamental architectural decision of traditional Hadoop. In this case I think the trade-off may be well worth it because of the benefits Isilon brings to the party.
One of the downsides to traditional Hadoop is that a lot of thought has to go into data placement for redundancy, and the HDFS name node is NOT redundant. The Isilon solves these problems with its architecture, and it also lets you process data that was written to the cluster over a different protocol without a second import step. In our case we had filesystems shared via NFS and wrote several Apache server logs to the Isilon over NFS. We then processed that data via HDFS on the Hadoop nodes, and the output of the Hadoop jobs could be accessed over NFS again.

As an example, we processed the Apache web server logs to break down the iPhone/iPad devices hitting the website based on the user-agent string. The code took about 30 minutes to write and ran across 1.6 billion log entries in about 4 minutes on our setup. The report showed the breakdown by hour of day, iPad versus iPhone, the top 10 most-requested URLs, and so on. Really we could report on whatever we wanted, but this was an easy first test case. Because the Hadoop output is also accessible via NFS, it can be served up by a traditional web server back to the marketing department that requested the report. With a slight modification of our Apache log rotation to copy files to the NFS mount (or by writing the Apache logs directly to the Isilon), the report could be generated pretty much automatically.

This is part of what makes Isilon different from other HDFS implementations: being able to keep one copy of the data and access it via multiple protocols is very powerful, and it makes things very easy. The other advantage is that you get Isilon's snapshots (and even replication). For many people that may not matter for Hadoop data, but this is really about adding Hadoop capability to existing data.
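To give a feel for how simple that job was, here is a minimal sketch of the kind of Hadoop Streaming mapper that could produce the hour/device breakdown. This is not the code we ran; the regex, function names, and the assumption of Apache combined log format are all illustrative. A summing reducer (or the stock aggregate reducer) would total the emitted counts.

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper for the user-agent report:
# reads Apache combined-format log lines on stdin and emits
# "<hour>\t<device>\t1" for a downstream summing reducer.
import re
import sys

# Matches the timestamp hour and the trailing user-agent field of a
# combined-format log entry.
LINE_RE = re.compile(
    r'\[(?P<day>[^:]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\]'  # timestamp
    r'.*"(?P<agent>[^"]*)"\s*$'                               # user agent
)

def classify(line):
    """Return (hour, device) for iPhone/iPad hits, else None."""
    m = LINE_RE.search(line)
    if not m:
        return None
    agent = m.group('agent')
    if 'iPad' in agent:
        return m.group('hour'), 'iPad'
    if 'iPhone' in agent:
        return m.group('hour'), 'iPhone'
    return None

def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        hit = classify(line)
        if hit:
            stdout.write('%s\t%s\t1\n' % hit)

if __name__ == '__main__':
    main()
```

Because the logs already live on the Isilon via NFS, a job like this can point straight at them through HDFS with no import step.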
In order to create this setup, I used RPMs from the GreenplumHD distribution of Hadoop, and the Java distribution came from java.com. I had 8 servers running Red Hat 6.1 for this experiment. After installing the RPMs, I created a Hadoop user and we were ready to start processing data. The entire install, from updating the Isilon code to installing the RPMs, creating accounts, etc., took 48 minutes to have everything installed and configured. (This process is now even faster for me after building a yum repo for all the software.)
As far as configuration goes, the only thing we had to edit for the Isilon integration was the core-site.xml file: set the fs.default.name value to the SmartConnect address of the Isilon and you are done.
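That one change looks roughly like the fragment below. The hostname here is a placeholder for your own SmartConnect zone name, and 8020 is the standard HDFS port; adjust both to your environment.

```xml
<!-- core-site.xml: point the default filesystem at the Isilon.
     "isilon-smartconnect.example.com" is a placeholder for your
     SmartConnect zone name. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://isilon-smartconnect.example.com:8020</value>
  </property>
</configuration>
```

Because SmartConnect hands out node IPs behind a single name, the Hadoop clients get load balancing across the Isilon nodes for free.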
We also ended up tuning the number of task tracker threads in the mapred-site.xml file and setting mapred.local.dir to a local working directory for Hadoop. (I hope to have a more technical deep-dive post on this later, as I'm still revising some of this process.)
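For reference, the tuning amounted to a fragment along these lines. The slot counts and directory path below are illustrative assumptions, not the values from this lab; size them to your compute nodes.

```xml
<!-- mapred-site.xml: illustrative values only; tune for your hardware.
     The task slot counts and local directory are assumptions. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred/local</value>
  </property>
</configuration>
```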
So the only warning I would have about this approach is that I'm pretty sure this setup could also be used to stress-test network switches in addition to processing data. We had 10Gbit networking in the lab and everything was on the same LAN segment; in one workload we pushed an aggregate of 10Gbit of data from the Isilon to the compute nodes. So if you go down this route, keep your compute nodes on the same switch as the Isilon nodes and this should scale pretty far. Also, I don't know that I would buy an Isilon strictly for Hadoop. It is a relatively expensive platform compared to a traditional Hadoop installation, but if you already have an Isilon, or you are looking at one for other use cases like cloud storage, then this is certainly something you might want to add to the toolkit. I can't overstate the simplicity of setup and ongoing operation of the Isilon with Hadoop. It really lets you spend time analyzing your data rather than managing the infrastructure.