I was a fly on the wall for an interesting conversation on Twitter between two UK-based BI/analytics and data warehouse consultants: FlyingBinary’s Jacqui Taylor (an Actian partner) and Joe Harris. Their conversation was sparked by this interview with Actian Corp’s CEO, Steve Shine, on Computer Business Review: Q & A: Why Ingres Rebranded as Actian. It got me thinking about the big data gravitational giant that Hadoop has become, and where we fit in its orbit.
With the kind permission of Joe and Jacqui, here’s the gist of the Twitter conversation I came across after it was over, with Twitterisms removed:
After seeing the conversation in my stream, since I follow both of these folks, I decided to poke my nose in with a comment:
Jacqui got in the last word: when I asked her for permission to quote her in this post, she remarked that “Hadoop was not the only elephant in the room.”
The essential truth underlying this conversation is that big data is not just a large enterprise issue. Most of the other technology on the market, aside from Hadoop (and Actian, in my opinion), seems to assume that it is, and therefore carries a large enterprise price tag and very little flexibility of choice. If I’m the CTO of a five-person bootstrapped internet startup, I might have to process multiple terabytes of data to be successful, but I’m not likely to invest in a high-end monolithic data appliance from one of the industry giants.
On the other hand, if my little hypothetical startup takes off, I’m going to need the capacity to scale up fast, or get left behind. Hadoop makes all kinds of sense there. As demands increase, though, I’m going to need more analytic flexibility than Hadoop alone provides. And Hadoop can be a beast to get up and running, requiring some serious skill sets. Wouldn’t it be great if some vendor were smart enough to offer analytic capabilities directly on Hadoop that were easy to use? Ahem.
Since my job title got shifted recently in the merger to “Hadoop Analytics Evangelist,” I don’t think it’s going to surprise anyone that I’m a big fan of the Hadoop technology stack and its incredible potential. Hadoop is brilliant. It scales without limits. It uses standard hardware anyone can afford. Its capabilities and its adoption are growing by the minute. Companies are getting tremendous value from using it.
But it’s not a panacea. It’s easy to see the whole world as a nail when the only tool in your belt is a hammer. And it’s not an easy hammer to use, either. As I said before, you need some pretty impressive, and expensive, skills to navigate the intricacies of MapReduce. Hadoop has a powerful pull that can suck you in: a black hole of long, expensive projects with little to show for them, if you’re not careful. And as you realize the limits of this brilliant technology, you have to look at how to fill in those gaps while still making use of that power.
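To give a flavor of the programming model behind that learning curve, here is a toy sketch of MapReduce’s three phases, written in plain Python rather than real Hadoop Java code. The function names are mine, purely for illustration; a production job adds job configuration, serialization, partitioning, and cluster tuning on top of this simple shape, which is exactly where the expensive skills come in.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()

def reduce_phase(grouped):
    """Reduce: collapse each word's list of counts into a total."""
    for word, counts in grouped:
        yield (word, sum(counts))

# The classic word-count example, end to end.
docs = ["big data big ideas", "big clusters"]
counts = dict(reduce_phase(shuffle(map_phase(docs))))
# counts -> {'big': 3, 'data': 1, 'ideas': 1, 'clusters': 1}
```

The point is not that the idea is hard: it’s that expressing a real analytics problem as chains of these map and reduce steps, and debugging them across a cluster, takes practice.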
When a company needs the kind of capabilities that Hadoop isn’t the best at, wouldn’t it be nice if some vendor offered other forms of analytics, from SMP in-chip analytics on a single machine up to MPP high-speed ad hoc queries, maybe with a data warehouse or other transactional system as well, and with high-speed Hadoop ETL to link them all together? Hmm. Wonder who does that.
The most exciting development in Hadoop right now, in my opinion, is YARN. Hadoop is progressing from a one-trick pony to a true platform for building flexible, large-scale business analytics. Allowing more than one application to negotiate resource allocation on a Hadoop cluster opens up the power of the Hadoop Distributed File System to a wide variety of analytics application approaches. With YARN, Hadoop is becoming a genuine star in the data analytics solar system.
Life signs abound. Landing party to the transporter room. Ready to beam down.