“Google I/O: Hello, Dataflow, Goodbye, MapReduce” – Who Said That?


Google, welcome to the DataFlow bandwagon!  Having led a team that has been developing, supporting and aggressively enhancing a highly parallel dataflow implementation for some 6-7 years, I was intrigued to see that Google has discovered the power of using dataflow for its own offering announced this week.  It’s great to welcome an industry giant’s validation of a strategic technology I’m hugely excited about.

Robin Bloor wrote a nice summary of the beauty and structure of a dataflow architecture as it relates to event streaming a year ago.  Like Google, Actian applies dataflow to analytics on very high-volume streams of data, to do massive ingest for data integration and preparation, and for implementing multistep pipelines to (in Google software engineer Frances Perry’s words) “extract deep insight from datasets of any size.”

Anyone who has spent any time in Hadoop recognizes the limitations of the extremely primitive MapReduce environment. With MapReduce, the design-time performance is terrible and the run-time performance is terrible (that’s a bad combination :) ).  The introduction of YARN frees developers from the shackles of MapReduce, which is why we were first out of the gate last fall with our YARN-certified DataFlow implementation running 100% natively in Hadoop, including a joint reference architecture with Hortonworks.

In many years of fast-paced DataFlow development, we’ve produced an incredibly rich offering that serves as the backbone of the Actian Analytics Platform with:

  • A comprehensive set of almost 100 highly parallel data preparation and advanced analytics dataflow operators
  • A drag-and-drop dataflow visual interface for an amazingly fast and simple design-time experience (via our multiyear collaboration with KNIME)
  • The full spectrum of horizontal, vertical, pipeline and broadcast parallelism under the covers, exploiting fine-grained thread-level parallelism within each node and Hadoop scale across nodes. Your applications gain all of the power of scaling up and out on multicore and multinode hardware without you having to understand any of the complexities of parallel programming (queueing, threading, memory management). We take care of it all for you.
  • The ability to run natively at every hardware scale – from desktop (easy to test!) to server to cluster – and automagically scaling at runtime to fully consume all cores and nodes without changing a line of code
  • Access to this full set of dataflow capabilities AND the SQL you know and love, thanks to the recent launch of the world’s highest-performing industrial-grade SQL-in-Hadoop platform
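
For readers who have not seen the pattern before, here is a minimal Python sketch of pipeline parallelism, one of the forms of parallelism named in the list above. It is purely illustrative and is not Actian DataFlow's API: the names `source`, `operator` and `run_pipeline` are hypothetical, and the code uses only the standard library's `threading` and `queue` modules. Each operator runs in its own thread and hands records downstream through a bounded queue, so later stages start working before earlier ones have finished.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the record stream


def source(items, out_q):
    """Feed every input record into the first queue, then signal completion."""
    for item in items:
        out_q.put(item)
    out_q.put(_DONE)


def operator(fn, in_q, out_q):
    """Apply fn to each record as it arrives and pass the result downstream."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)  # propagate end-of-stream to the next stage
            return
        out_q.put(fn(item))


def run_pipeline(items, fns, queue_size=4):
    """Run each function in fns as a separate pipeline stage.

    Stages are linked by bounded queues, so a fast upstream stage blocks
    instead of buffering the whole dataset in memory.
    """
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=source, args=(items, queues[0]))]
    for fn, in_q, out_q in zip(fns, queues, queues[1:]):
        threads.append(threading.Thread(target=operator, args=(fn, in_q, out_q)))
    for t in threads:
        t.start()

    # The calling thread acts as the sink and drains the last queue.
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])` returns `[4, 6, 8]`. Note that threads in CPython mainly show the structure of the idea; a production dataflow engine would also partition data across cores and nodes, which this sketch does not attempt.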

So welcome, Google, to embracing the goodness of DataFlow – the optimal platform for building a whole new generation of data and computationally intensive analytic applications.

And for those who want to leapfrog straight to a mature, rich and robust implementation of DataFlow – welcome to the wonderful post-MapReduce world of Actian DataFlow.  Best of all, we are compatible with every major Hadoop distribution so you can get started now on your Hadoop implementation of choice.

P.S.  Still wondering “who said that”?  Check out Charles Babcock’s InformationWeek article.


About Mike Hoskins

Actian CTO Michael Hoskins directs Actian’s technology innovation strategies and evangelizes game-changing trends in big data, analytics, Hadoop and cloud to give insight into Accelerating Big Data 2.0™. Mike, a Distinguished and Centennial Alumnus of Ohio’s Bowling Green State University, is a respected technology thought leader who has been featured in TechCrunch, Forbes.com, Datanami, The Register and Scobleizer. Mike has been a featured speaker at events worldwide, including Strata NY + Hadoop World, keynoting at DeployCon, the “Open Standards and Cloud Computing” panel at the Annual Conference on Knowledge Discovery and Data Mining, the “Scaling the Database in the Cloud” panel at Structure, and the “Many Faces of Map Reduce - Hadoop and Beyond” panel at Structure Big Data. Mike received the AITP Austin chapter's Information Technologist of the Year Award for his leadership in developing Actian DataFlow, a highly parallelized framework to leverage multicore. Follow Mike on Twitter: @MikeHSays.
