The title “Chief Data Officer” will mean different things to different people, so this is not a simple Yes/No question.
Some CEOs might ask “Why should we make such an appointment when we already have a CIO?” If they do have one then, depending on what the CIO does, they probably shouldn’t. And if they have a CTO rather than a CIO, they probably shouldn’t either.
After a recent presentation I gave on Advanced Analytics, someone asked me: “Should companies appoint a Chief Analytics Officer?” Again, why would a company do that if it already had a CIO or even a CTO? But, in truth, this is not about the title; it is about a particular job and what it should involve…
Reading The Tea Leaves
There is an important cluster of trends that lies behind these questions. We are moving from a world of static data to a world of data in motion, a point I have made in previous blog posts. A useful way to think about what this means is to consider the way we build systems. We are accustomed to building systems roughly according to the following logic:
- We want a system to automate certain tasks for us
- We must therefore capture the data we need in a database
- We must then build the application that sits over that database so that it can capture the data and make changes to it when needed
- We may also need to analyze the data we collect – perhaps passing it to a data warehouse for BI applications
This thinking spawned the old data warehouse architecture, which served us well for more than a decade, and which suggests a simple division between OLTP data and BI data, with data flowing from one to the other. The assumption in this is that we process transactions and then later we query that data to gather insights. This architecture is now outmoded and will surely be superseded.
The Event Data World
We don’t just process transactions any more; we also process events. This first became visible with the advent of the Web and the need to process click-streams. A “click” is an event, not a transaction. The later emergence of CEP (complex event processing) software, initially in financial markets, was also based on events. In both kinds of application, the events need to be analyzed as a time series of data. It is entirely possible that some of the events are actually transactions (a purchase on a web site, for example), but most events are not.
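For readers who like to see this in code, here is a minimal Python sketch of the idea, not a description of any particular product. It treats a click-stream as a time series by counting events in fixed time windows, and it picks out the few events that are also transactions. The `Event` fields, the `is_transaction` rule and the sample data are all assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary start (hypothetical field)
    kind: str         # e.g. "click" or "purchase" (hypothetical values)
    user: str


def is_transaction(event: Event) -> bool:
    # Assumption for illustration: only purchases count as transactions.
    return event.kind == "purchase"


def events_per_window(events, window_seconds):
    """Count events in fixed-size time windows, keyed by window start."""
    counts = Counter()
    for e in events:
        window_start = int(e.timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Invented sample data: four clicks and one purchase.
stream = [
    Event(1.0, "click", "ann"),
    Event(4.5, "click", "bob"),
    Event(12.0, "click", "ann"),
    Event(13.2, "purchase", "ann"),
    Event(27.9, "click", "bob"),
]

per_window = events_per_window(stream, window_seconds=10)
transactions = [e for e in stream if is_transaction(e)]
```

Here every event contributes to the time series, while only one of the five is also a transaction.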
The importance of events became even more obvious when Splunk shot to prominence. Splunk’s software enabled the easy use of log file data from the very many log files that are generated throughout a network. A good deal of that log data, which was rarely used for anything else, turned out to be really useful in new contexts, especially when joined with data from other sources.
As soon as we think of data as being a continual flow of atomic events coming into the corporate environment, or in some cases being generated by that environment, we are obliged to think of data flows long before we think of data storage. Logically then we should probably design corporate systems by starting with the flow of data and adding the databases (the places where data is stored because it is at rest) later.
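To make the "flow first, storage later" idea a little more concrete, here is a small Python sketch under invented assumptions. Events pass through a `flow` function, an analysis observes them while they are in motion, and storage is bolted on afterwards as just one more observer. The names `flow`, `count_by_kind` and `persist_purchases` are hypothetical, and a plain list stands in for a database.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def flow(events: Iterable[dict], taps: List[Callable[[dict], None]]) -> Iterator[dict]:
    """Pass each event downstream, letting every tap observe it on the way."""
    for event in events:
        for tap in taps:
            tap(event)
        yield event


# An in-flight analysis: a running count per event kind, updated as events arrive.
running_counts: Dict[str, int] = {}


def count_by_kind(event: dict) -> None:
    running_counts[event["kind"]] = running_counts.get(event["kind"], 0) + 1


# Storage added afterwards as one more tap; the list is a stand-in for a database.
store: List[dict] = []


def persist_purchases(event: dict) -> None:
    if event["kind"] == "purchase":
        store.append(event)


# Invented sample events.
incoming = [
    {"kind": "click", "page": "/home"},
    {"kind": "click", "page": "/pricing"},
    {"kind": "purchase", "amount": 40},
]

# Consuming the generator drives the events through the flow.
delivered = list(flow(incoming, taps=[count_by_kind, persist_purchases]))
```

The design choice worth noticing is that the flow exists and is analyzed before any storage does. The store is an optional attachment to the flow, not the starting point of the design.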
If you are beginning to wonder what this has got to do with the idea of a Chief Data Officer, please bear with me.
None of this event data is of any use to anyone unless it is analyzed and value is extracted from it. There are, of course, many different kinds of analytics that can be applied, from data mining through to immediately actionable predictive analytics. The essential thing to realize here is that the systems that run the company (OLTP systems and Office Systems) and the Business Intelligence systems are starting to merge. It is data flow that makes this possible. The old Data Warehouse world expressed the reality (at that time) that the BI systems were always backward looking – but now some of those BI applications (predictive analytics in particular) are definitely forward looking. This is expressed in the idea of a data flow architecture.
The idea of dataflow is perhaps a little more revolutionary than the word on its own might suggest. When data flows, it is really difficult for anyone, whether senior executive or middle manager, to claim “ownership” of data. You can own the land on both sides of a river, and you might thus claim that you own a certain part of the river, but you do not own the water that flows through it.
Let us now consider the job of the CFO. Money flows into the company. Some of that money must flow to every part of the company to pay for staff salaries, consumables, office rent and whatever equipment is in use. There is a long-established and well-thought-out system to enable that flow of money. It’s true that various executives and middle managers have rights to some of that money, given to them by the process of budgeting, but they do not own the bank account it actually sits in. The accountancy profession is skilled in managing this money flow, and when the CFO and all the staff below him or her do their job well, the company functions far more effectively.
I’m beginning to believe that the same will soon apply to data. There needs to be a senior executive, equivalent in every way to the CFO, who is responsible for the flow of data and who is able to manage it through a system that is in every way equivalent to an accounting system, except that it flows data through the organization rather than money. Whether you call such an individual a CIO, CDO, CTO, CAO or anything else hardly matters.
The point I’m making is that there is a need for such an individual and a need for a profession which, like the accountancy profession, sits over a system that ensures the accuracy and good use of data flows.