Quantity and quality are both important to your data, but a third dimension of the data puzzle matters even more: content.
As your company evaluates new data sources and improvements to your existing data, ask yourself, “Are we collecting the right data?” Here are three tips for determining whether you are acquiring the data you actually need or just creating data clutter.
Point of view vs. perspective
Each data set you acquire gives you a unique point of view on your business operations and external environment. Although each data set can be insightful on its own, a single data set rarely (if ever) provides the complete picture. It will have gaps, blind spots, bias and other issues you will be forced to address. Similar data sets, or those from similar sources, are likely to share similar defects.
You can avoid this with data diversity. By aggregating data from different sources, you can assemble multiple points of view of your operations, which together lead to a more holistic perspective.
The best way to identify which new data sources you need is to look for gaps in your current data and for areas where your data sources always agree with each other. Sources that always agree may share the same origin, and with it the same blind spots. Some level of conflict between sources is good, because it indicates you are gathering different points of view that describe unique facets or dimensions of your company.
Duplication and redundancy
While those words may seem to mean the same thing, there is an important distinction between them when you are selecting new data sources. Duplicate data (data sets that are identical) can usually be traced back to the same source system, even if it is acquired through different channels. A good example is a list of products obtained from the marketing system vs. the manufacturing system.
If the lists are the same, then either one of the two systems is the system of record and the data has been copied into the other, or the data is sourced from somewhere else entirely. This matters because adding duplicate data doesn’t create additional value for your company: you already have that data set.
Redundant data (data sets that differ but overlap) is highly valuable, because it reflects different perspectives. In the marketing and manufacturing example, the product list from manufacturing may contain the products your company builds or is in the process of building.
The marketing product list may contain products that you resell from third parties (and don’t build yourself) but may omit new products that R&D is still developing. Some of the data in these two data sets is the same, but the pieces that differ are very insightful.
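To make the duplicate-versus-redundant distinction concrete, here is a minimal Python sketch that compares two product lists as sets. The product names and the `compare_sources` helper are hypothetical, and treating each list as a set of exact product names is a simplifying assumption; real product data would usually need cleanup and matching on shared identifiers first.

```python
# Hypothetical product lists; the names are illustrative only.
manufacturing = {"Widget A", "Widget B", "Widget C (in development)"}
marketing = {"Widget A", "Widget B", "Reseller Gadget X"}


def compare_sources(first: set[str], second: set[str]) -> dict:
    """Classify two data sets as duplicate, redundant (overlapping) or disjoint.

    Hypothetical helper: assumes records can be compared by exact value.
    """
    shared = first & second
    if first == second:
        kind = "duplicate"   # same content: likely the same source system
    elif shared:
        kind = "redundant"   # overlapping but different: a new perspective
    else:
        kind = "disjoint"    # no overlap at all
    return {
        "kind": kind,
        "shared": shared,
        "only_in_first": first - second,
        "only_in_second": second - first,
    }


result = compare_sources(manufacturing, marketing)
print(result["kind"])                      # redundant
print(sorted(result["only_in_first"]))     # ['Widget C (in development)']
print(sorted(result["only_in_second"]))    # ['Reseller Gadget X']
```

Here the overlap makes the two lists redundant rather than duplicate, and the `only_in_first` and `only_in_second` sets hold exactly the differing pieces that make the comparison insightful.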
Data that is up to date
Every piece of data you collect has a time stamp of when it was created or observed. Data starts aging from the time it is created, not when it is collected and added to a data warehouse. It is important to understand when your data was created, when it was collected, and how current the data you ingest from each source is. Digital business processes require real-time data to be effective.
To ensure you are collecting the most current data, trace where your data originated. Ideally, you want to collect data directly from the source system where it is first created, not from a downstream system that only refreshes its data periodically.
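The point that data ages from creation rather than from collection can be sketched in a few lines of Python. This is a minimal illustration, assuming each record carries a creation timestamp; the timestamps, the one-hour freshness threshold and the `data_age` and `is_stale` helpers are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def data_age(created_at: datetime, now: datetime) -> timedelta:
    """Age of a record, measured from when it was created or observed."""
    return now - created_at


def is_stale(created_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """True if the record is older than the (hypothetical) freshness threshold."""
    return data_age(created_at, now) > max_age


# Hypothetical timeline: a record created in the source system at 09:00,
# loaded by a downstream system that refreshes periodically at 15:00,
# and examined at 15:30.
created = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
ingested = datetime(2024, 1, 15, 15, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 15, 15, 30, tzinfo=timezone.utc)

print(data_age(created, now))                         # 6:30:00
print(now - ingested)                                 # 0:30:00
print(is_stale(created, now, timedelta(hours=1)))     # True
```

Measured from ingestion, the record looks only 30 minutes old; measured from creation, it is six and a half hours old. Collecting directly from the source system shrinks that gap.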
Data time stamps are particularly important when you must perform time-series analysis to identify operational trends and quality issues or to forecast future events. The sooner you can acquire the data, the sooner you can analyze it and update your operational reports and forecasts, leading to more agile business operations.
Both your company’s operations and its environment are continuously evolving. If you want to succeed in a highly competitive business environment, you must continually refine your data sources so that they provide a holistic perspective, generate actionable insights and offer real-time visibility. It isn’t just about collecting more data or better-quality data; you must collect the right data.
You can learn more about Actian data management products here.