Mainstreaming and making good decisions about Hadoop
I attended the 2012 Hadoop Summit in San Jose, Calif., in June. The summit was specifically for Hadoop advocates and developers to collaborate, define where Hadoop goes next and determine what needs fixing. That means a lot of fun for a geek like me.
Making Hadoop mainstream
It was interesting to see Hortonworks and Cloudera leading the charge from a purely open-source perspective, while MapR, a more proprietary solution, seemed to be in the background. Granted, Hortonworks was one of the sponsors, but I got the feeling that a majority of the advocates and developers weren’t fans of a non-open-source solution like MapR. I do understand why, though.
Everyone talked about how to make Hadoop more enterprise-ready and relevant to mainstream IT. One keynote speaker said Hadoop can go mainstream with three “ability” words: reliability, deployability and manageability.
TripAdvisor.com’s presentation of its customer backup design for its shared-nothing commodity Hadoop cluster really stood out. The presenter noted that, a few months before the Hadoop Summit, the site had a major power outage that took down its entire data center. When TripAdvisor’s IT team brought the Hadoop cluster back up, it had lost 3 percent of its commodity drives and would have lost data without the backup copies.
Solving Hadoop’s problems
There was definitely a lot of information to digest at the summit. But I really saw only two main ways to move past Hadoop’s problems: accept enterprise products into the current Hadoop commodity-only design, or use Hadoop only as an add-on to standard database and data store designs.
The first method seems to offend hardcore open-source Hadoop advocates. Their commodity-products-only stance keeps enterprise products out of designs and makes it very hard to move Hadoop forward. I come from an enterprise data center background, so it’s hard for me to comprehend the angry responses to enterprise architectures for Hadoop. The anger is real, though.
The second method reduces the value proposition of Hadoop. If you also have to store the data you are processing in Hadoop in Oracle, SQL or a separate file system, Hadoop won’t reduce costs; it will add costs. It can still add value, but the value pitch of Hadoop is that its design lets you reduce your infrastructure costs. This option relegates Hadoop to an add-on tool in the data center instead of a standalone solution. There are benefits to relegating Hadoop to just an add-on tool, such as quicker integration into the environment. That integration comes at a cost, though: it requires either custom integration work or an expensive ETL-type tool.
At SwishData, we’ve put together a Hadoop cluster that offers enterprise-class reliability and performance. We can be your go-to source to figure out which option for deploying Hadoop is right for your agency. Contact us today for more information.
Photo courtesy of flickr user erikeldridge