Maintaining Standard Orbit Around Hadoop. Scanning for Life Signs.


I was a fly on the wall for an interesting conversation on Twitter between two UK-based BI/analytics and data warehouse consultants: FlyingBinary's Jacqui Taylor (an Actian partner) and Joe Harris. This interview with Actian Corp's CEO, Steve Shine, on Computer Business Review started their conversation: Q & A: Why Ingres Rebranded as Actian. Their exchange got me thinking about the big data gravitational giant that Hadoop has become, and where we fit in its orbit.

With the kind permission of Joe and Jacqui, here’s the gist of the Twitter conversation I came across after it was over, with Twitterisms removed:

Joe Harris: Why Actian rebranded from Ingres >> Ok, now I think they’re toast again. They don’t understand Hadoop’s gravity effect.


Jacqui Taylor: Different market. The big data space is still made up of early adopters.

Joe Harris: For how long? 2 years? 5? As data gets bigger it gets exponentially harder to move. Hadoop will be at the centre, in my opinion.

Jacqui Taylor: ActianCorp will be at the forefront with not just Hadoop, but also DSE and equivalent platforms.

Joe Harris: I don’t see that, I’m afraid. For SMEs and startups with many terabytes of active data, Hadoop is more or less the only viable choice.

Jacqui Taylor: True, but SMEs and startups become enterprises. Scaling brings different challenges and calls for different technology, not just Hadoop.

After seeing the conversation in my stream, since I follow both of these folks, I decided to poke my nose in with a comment:

Paige Roberts: Thanks. We are aware of the Hadoop gravity well. In standard orbit, but we know it’s not the only planet in the system. ;-)

Joe Harris: Ha, very good! Well it’s looking likely to be the sun around which all else orbits. Or gets consumed. :-)

Jacqui got in the last word: when I asked her for permission to quote her in this post, she remarked that “Hadoop was not the only elephant in the room.”

The essential underlying truth behind this conversation is that big data is not just a large enterprise issue. All the other technology on the market, aside from Hadoop (and Actian, in my opinion), seems to assume that it is, and therefore carries a large enterprise price tag and very little flexibility of choice. If I’m the CTO of a five-person bootstrapped internet startup, I might have to process multiple terabytes of data to be successful, but I’m not likely to invest in a high-end monolithic data appliance from one of the industry giants.

On the other hand, if my little hypothetical startup takes off, I’m going to need the capacity to scale up fast, or get left behind. Hadoop makes all kinds of sense there. As demands increase, though, I’m going to need more analytic flexibility than Hadoop alone provides. And Hadoop can be a beast to get up and running, requiring some serious skill sets. Wouldn’t it be great if some vendor were smart enough to offer analytic capabilities directly on Hadoop that were easy to use? Ahem.

Since my job title got shifted recently in the merger to “Hadoop Analytics Evangelist,” I don’t think it’s going to surprise anyone that I’m a big fan of the Hadoop technology stack and its incredible potential. Hadoop is brilliant. It scales out nearly without limit. It runs on standard hardware anyone can afford. Its capabilities and adoption are growing by the minute. Companies are getting tremendous value from using it.


But it’s not a panacea. It’s easy to see the whole world as a nail when the only tool in your belt is a hammer. And it’s not an easy hammer to use, either. As I said before, you need some pretty impressive, and expensive, skills to navigate the intricacies of MapReduce. Hadoop has a powerful pull that can suck you in; if you’re not careful, it becomes a black hole of long, expensive projects with little to show for them. And as you realize the limits of this brilliant technology, you have to look at how to fill in those gaps while still making use of that power.
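For readers who haven’t touched MapReduce, here’s a toy sketch in plain Python (not actual Hadoop code) of the programming model behind it: the classic word count, broken into the map, shuffle, and reduce phases a Hadoop job performs. Even this simple problem has to be contorted into key/value pairs, and a real Hadoop job adds Java boilerplate, job configuration, and cluster tuning on top, which is where those expensive skills come in.

```python
from collections import defaultdict

def map_phase(lines):
    """Like a Mapper: emit a (word, 1) pair for every word seen."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Like the framework's shuffle/sort step: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Like a Reducer: collapse each key's values to a single count."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data is big", "hadoop handles big data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["big"])   # 3
print(counts["data"])  # 2
```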

When a company needs the kind of capabilities that Hadoop isn’t the best at, wouldn’t it be nice if some vendor would offer other forms of analytics, from SMP in-chip analytics on a single machine up to MPP high-speed ad hoc queries, perhaps with a data warehouse or other transactional system as well, and with high-speed Hadoop ETL to link them all together? Hmm. Wonder who does that.

The most exciting development in Hadoop right now, in my opinion, is YARN. Hadoop is progressing from a one-trick pony to a true platform for building flexible, large-scale business analytics. Allowing more than one application to negotiate resource allocation on a Hadoop cluster opens up the power of the Hadoop Distributed File System to a wide variety of analytics applications. With YARN, Hadoop is becoming a genuine star in the data analytics solar system.

Life signs abound. Landing party to the transporter room. Ready to beam down.



About Paige Roberts

As Actian’s Hadoop Analytics Evangelist, Paige identifies innovative big data and analytics trends and explores technology alternatives to help organizations drive actionable business value from their data. A seasoned software industry veteran with more than 15 years’ experience, Paige has worn a variety of hats including engineering, consulting, marketing and training. Follow her at @RobertsPaige.
