Big Data is about analytics if it is about anything at all. There are one or two Big Data applications that are not analytics. For example, there's 3D animation rendering: it involves high volumes of data, and software developers working in that area even make use of Hadoop. But in the main, Big Data is about analytics.
The historical record suggests that the first "big data" application ever was the taking of a census. In fact the word "census" comes from the Latin verb "censere," which means "to estimate." And the Romans weren't the first to take a census. As far as we know, it was the Babylonians. They are known to have taken a census some time around 3800 BC. In fact, the records suggest that they took a census every 6 or 7 years, and they didn't just count the number of people, but also livestock and inventories of honey, wool, vegetables and other edibles.
If the only technology you have is clay tablets, then taking a census is a Big Data project. In fact, the computer industry itself has its origins in census-taking, with Herman Hollerith, the American statistician, inventing a mechanical tabulator based on punched cards for the sake of a census.
When you come to think of it, government invented Big Data. But government clearly ought to be involved in Big Data, because in pretty much every country in the world the government is the biggest organization there is, and it usually needs to gather more data than any other organization.
The Analytics-Oriented Organizations
There are other organizations: banks, insurance companies, pharmaceutical companies, telcos and big retailers, which have traditionally used analytics. We can think of them as the Big Data early adopters. Their businesses are data heavy and they employ statisticians. Their analytics activities followed some variation of the traditional data warehouse operation; indeed, such companies were instrumental in defining what the data warehouse was and how it could be used. Recent trends in Big Data have changed these organizations.
The data environments of these organizations have changed in ways that have begun to make the old data warehouse arrangement redundant. This has been caused by the availability of new data sources. There are four “new” data sources.
- Social media data (Twitter, Facebook, RSS feeds, blogs, etc.), which has mushroomed and, for some organizations at least, contains valuable data.
- Existing log files within the organization. This data was always available, but wasn’t exploited much until Splunk provided purpose-built software to get at it. Once recognized as a data source and analyzed, its importance naturally increased.
- What we can think of as "Internet of Things" data. The early data sources for this are mobile devices and RFID tags, but they will no doubt mushroom as sensors and embedded processors proliferate.
- External data – some of which is available for free (there’s a fair amount of census data you can get hold of, for example) and some of which you have to buy or rent. A market for such data is developing. It’s in its early stages and will doubtless become more sophisticated with time.
The situation, then, is that analytics-oriented organizations are being disrupted by the availability and proliferation of new data sources that happen to be valuable in one way or another. They are challenged in respect of their software architecture because a deluge of new data products (the Hadoop ecosystem, databases, streaming products, data flow and analytics products) is emerging. However, they are not challenged in respect of understanding the value of analytics. They know its value very well.
The Non-Analytics-Oriented Organizations
On the other side of this coin is what I believe will eventually become the main market for analytics capability. Right now these organizations know very little about the power of analytics. This group includes some areas of government and even some small to medium size companies in the sectors already mentioned. But in the main it comprises businesses in other sectors that don't yet understand the possibilities of the technology.
Such organizations are no strangers to business intelligence. They have their reports and dashboards and OLAP drill-down applications. Some may even have deployed the kind of sophisticated BI capability offered by the likes of Tableau and Qliktech. What they tend not to have done is traditional statistical analysis, and as a consequence they tend to have little idea of the potential of predictive analytics. I suspect that the opportunity for these companies may be very large, as long as they adopt analytics at the right time and in the right way.
The normal route to developing an analytics capability is closed to them. There are simply too few "data scientists" available for hire, and to be honest, such companies cannot afford to hire them directly. But that's OK, because many of those data scientists have joined software consultancies that want to sell their skills and capabilities to these analytically inexperienced businesses.
The point is that very few businesses, aside from those traditionally experienced in analytics, have gone beyond traditional BI, and thus even fairly simple predictive analytics can make a significant difference to such a company's performance. We are not talking here about predictive analytics based on real-time streaming data that is actioned automatically. We are simply talking about the analysis of statistical trends in buying patterns or supply chain disruption that can generate more profit or remove costs from the business.
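To give a concrete sense of what "fairly simple predictive analytics" can mean, here is a minimal sketch: fitting a linear trend to a year of monthly sales figures and projecting the next month. The data, function names and numbers are all illustrative assumptions, not anything from a real business.

```python
# A minimal sketch of simple trend-based forecasting.
# All figures below are made up for illustration.

def linear_trend(values):
    """Ordinary least-squares fit of y = slope * x + intercept,
    where x is the 0-based period index (month number)."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast_next(values):
    """Project the fitted trend one period beyond the observed data."""
    slope, intercept = linear_trend(values)
    return slope * len(values) + intercept

# Twelve months of hypothetical unit sales with a gentle upward trend.
monthly_sales = [102, 98, 110, 115, 109, 120, 125, 123, 131, 136, 134, 142]

slope, _ = linear_trend(monthly_sales)
print(f"trend: about {slope:.1f} extra units per month")
print(f"next month's forecast: about {forecast_next(monthly_sales):.0f} units")
```

Nothing here requires streaming infrastructure or a data scientist; a least-squares trend line over historical figures is exactly the kind of modest statistical step that BI-only organizations tend not to have taken.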
In this area, we expect business-specific vertical markets to develop in analytics, in sub-sectors of health care, transport, regional banking, specialist retail and so on. This will gradually become the analytics software package market. Companies served by this market will not have data analysts, but they will have a software supplier who does and who knows their industry well enough to provide the kind of capability they need.