Three things you were guaranteed to find on me in high school: rolled-up jeans, air bangs and my Texas Instruments TI-85. I’m sure any cool quotient the first two earned me for keeping up with the latest fashion trends was outdone by my obsession with that graphing calculator.
The TI-85 was a requirement for AP calculus (though it was banned from other classes due to its ability to store notes), and it changed the way we learned. Until that point, math classes had been very two-dimensional: here’s a concept, read these pages, do these problems. But the TI-85 changed the way the teacher taught, and it accelerated the nightly homework ritual. It became an extension of the class itself, and I often wondered, “How did they teach this class before graphing calculators existed?”
Data scientists today must find themselves in a similar boat. The analytics they’re using and the algorithms they’re running are much more complicated than the problems we solved in my high school AP calculus class, but the idea of empowering and freeing the data scientist has never been more important than in this era of Big Data analytics.
I’ve been intrigued by the recent backlash against Big Data, which I suppose is only natural given how much hype it has generated over the past several years. Many are saying that Big Data has not lived up to its promise and that organizations are still struggling to deliver transformational value from data. While I would agree that the struggle is real, I would also argue the potential is, too. Big Data holds tremendous potential, but only to the extent that data scientists can extract nuggets of insight from data using advanced analytics. Even with the proliferation of machine learning and automation, human analysis remains just as important, if not more so. That’s why data scientists are such a hot commodity today. In many ways, they hold the keys that unlock the promise of Big Data.
With a limited talent pool and a shortage of skills, maximizing the efficiency of the data scientist is something every organization wants to do. Yet many data scientists find themselves spending the bulk of their time in the data preparation stage, collecting, massaging, blending and enriching data before they can get to the analytic computation. What happens when you can accelerate the entire analytic process, making the data preparation easier and also boosting the performance of analytic queries? That’s exactly what Actian has done by integrating R, the programming language popular among data scientists, into the Actian Analytics Platform.
R has never been known for its simplicity or speed, but Actian improves both. To make R easier, Actian does two things: 1) it puts a wrapper around R that lets you use a drag and drop interface when building data flows; 2) it embeds R in its high-performance analytics database and gives people access to it via SQL or R. Actian brings extreme performance to R analytics while also surrounding it with a full set of data blending and enrichment capabilities running natively on Hadoop. By integrating R with a full analytics platform, Actian also allows data scientists to bring in any data source, right at the point where they’re running their algorithms. The time saved on the data preparation side frees data scientists to be more creative in their models, while the access to any data source expands the breadth of possible queries. Data scientists can spend more time developing and testing models and analyzing results, knowing their results are more accurate because they can access all the data available to them and aren’t constrained by samples.
Boosting the analytics quotient of your organization doesn’t end with the data scientist. Much as a flight attendant reminds us to put on our own masks before helping others, empowering data scientists first allows them to help their less technical colleagues and fosters broader analytics skills. Actian addresses this by integrating R in a way that makes R analytics reusable across four distinct scenarios:
- SQL users can call upon R in-database, without any knowledge of the complicated R language, and run analytics queries.
- Data flow workers can use a drag and drop interface to create visual workflows and execute them in parallel right on Hadoop or in the data flow itself.
- Data scientists and business analysts can share workflows with other users, who can then add new data and blending.
- R users can use R packages to run R in the high-performance database, without any knowledge of SQL.
As I reflect on my TI-85 from the early ’90s, I can only imagine what the world might look like 20 years from now. Are you a data scientist? What’s your TI-85 today, and how is it helping you deliver transformational value? Are you a business executive? How are you keeping your data scientists happy while making analytics accessible and consumable to all? I’d love to hear from you in the comments.
Visit www.actian.com/R-integration for more information on this announcement.