Ready for Primetime: Industrializing Hadoop

I’m fresh off the road from fascinating customer/media/analyst Big Data roundtables in San Francisco, NY and London, and two themes resounded:

  1. 2014 is the year that the flood of data from the Internet of Things decisively enters mainstream businesses, and
  2. While Hadoop is our best hope for deriving business value from that data at scale, some big gaps remain

Hadoop’s boundless scale-out capabilities are cool for brute-force parallelism if you can afford legions of MapReduce programmers and data centers crammed with ever-growing server clusters, but not many companies fit that model. Some disruptive innovations need to occur for Hadoop to cross over to mainstream adoption, a shift I call the industrialization of Hadoop.

Don’t get me wrong, I’m not bashing Hadoop. It’s a foregone conclusion that legacy stacks simply cannot scale to accommodate IoT data volumes, and even if they could scale architecturally, no one could afford the 7- or 8-figure checks you’d have to write to get there. Meanwhile, the Hadoop ecosystem gets richer every day. But in his write-up of the London roundtable, V3’s Dan Robinson quoted me saying, “MapReduce is not just dragging us back to how things in IT were in the 1980s, it’s dragging us back to like it was in the 1950s.” Every line-of-business (LOB), analytics or query request is choke-pointed through a tiny stratum of highly skilled specialists, just as in the bad old mainframe days.

It’s time for Hadoop analytics to grow up. We’ve invested tens of millions of dollars and hundreds of man-years to industrialize Hadoop, and we do it through the Actian Analytics Platform™. We’re delivering breakthrough design-time productivity with visual tooling for building end-to-end big data analytics that execute natively on Hadoop, with no need to write a single line of MapReduce.

YARN is arguably the most exciting Hadoop breakthrough of our time. Just as Google leaped forward with next-gen computational frameworks like Pregel, Dremel and Spanner, it’s time for the rest of us to leapfrog MapReduce with YARN-certified computational frameworks that take their rightful place as first-class citizens in Hadoop acceleration. Actian’s contribution, DataFlow, delivers huge performance boosts along with rich big data functionality. And our joint reference architecture with Hortonworks makes it easy to get started.

The result: astounding business transformation based on data-driven insights.

At our SF roundtable, Evernote CTO Dave Engberg described how his team re-architected from the ground up using the Actian Analytics Platform on Hadoop to deploy a completely modern (and affordable) stack that executes lightning-fast queries on datasets that accumulate 200 million events per day. As Dave says, “we’re able to answer many types of questions much faster than we could with a brute-force crawl in Hive.  We get a great performance boost with the Actian analytics database.” Among other results, Evernote rapidly assesses and adjusts marketing campaigns and the user experience for its 80 million users. There’s measurable bottom-line impact as Evernote optimizes its ability to predict when users will convert from a free to a paid subscription. If you’re going to Strata, drop by and hear Evernote’s Damon Cool describe the technology and results in detail.

In my next blog I’ll talk about other ways the Actian Analytics Platform acts as a Hadoop Exoskeleton to industrialize Hadoop, delivering Big Data for the Rest of Us™.  Meanwhile I invite you to visit me and my Actian colleagues in Booth 725 at Strata to talk about Hadoop, YARN or whatever else is on your mind.

About Mike Hoskins

Actian CTO Michael Hoskins directs Actian’s technology innovation strategies and evangelizes game-changing trends in big data, analytics, Hadoop and cloud to give insight into Accelerating Big Data 2.0™. Mike, a Distinguished and Centennial Alumnus of Ohio’s Bowling Green State University, is a respected technology thought leader who has been featured in TechCrunch, Forbes.com, Datanami, The Register and Scobleizer. Mike has been a featured speaker at events worldwide, including Strata NY + Hadoop World, keynoting at DeployCon, the “Open Standards and Cloud Computing” panel at the Annual Conference on Knowledge Discovery and Data Mining, the “Scaling the Database in the Cloud” panel at Structure, and the “Many Faces of Map Reduce - Hadoop and Beyond” panel at Structure Big Data. Mike received the AITP Austin chapter's Information Technologist of the Year Award for his leadership in developing Actian DataFlow, a highly parallelized framework to leverage multicore. Follow Mike on Twitter: @MikeHSays.
