Managing and analyzing heterogeneous data is a challenge for most companies, and the oncoming wave of datasets generated by Edge Computing has only exacerbated it. The challenge stems from a rather large “data-type mismatch” as well as from how and where data has been incorporated into applications and business processes. How did we arrive here?
At one time, data was largely transactional and heavily structured, and Online Transaction Processing (OLTP) and Enterprise Resource Planning (ERP) systems handled it inline. Relational Database Management Systems (RDBMS) primarily served the needs of these systems and eventually evolved into data warehouses from vendors such as Teradata, IBM, SAP, and Oracle, which store and administer historical data for Online Analytical Processing (OLAP).
In parallel, and more so during the last few years with the Internet of Things (IoT) revolution, the third wave of data digitization is upon us, operating at the edge in sensors, video, and other IoT devices. These devices generate the entire range of structured and unstructured data, with two-thirds of it in a time-series format. Neither of these later datasets lends itself to the RDBMSs that underpin data warehouses, because of how the data is processed and analyzed, the data formats used, and the mushrooming dataset sizes.
Consequently, separate document store databases, such as MongoDB and Couchbase, as well as several time-series databases, including InfluxDB and a multitude of bespoke historians, emerged to handle these very distinct datasets. Each has its own Application Programming Interface (API), and all of them are lumped together as NoSQL, as in everything that is not Structured Query Language (SQL).
The aftermath of these three waves of data types and database structures is that data architects must now either implement a separate database for each type of data and use case or try to merge and aggregate all of the different data types into a single database. Until recently, the only significant, enterprise-wide aggregation point for multiple databases and data types was the traditional data warehouse. The legacy data warehouse, however, is lagging as an aggregation point for two reasons.
First, many of them are built on inflexible architectures: they have limited capability to manage JSON and time-series data, and they are costly to expand to handle larger datasets or the complexity of modern analytics, such as Artificial Intelligence (AI) and Machine Learning (ML). Second, sending all the data to a single, centralized, on-premises location can be costly and hinders decision-making at the point of action at the edge of the network.
In the era of edge computing, the majority of data is now created at, and emanates from, the edge rather than the data center or a virtualized image in the cloud, a wholesale flip from the past. In this setting, specialized applications and platforms have an essential purpose in business process enablement. Just as each business process is unique, the data requirements for the technology that supports it are also unique. It may seem that choosing a best-of-breed database for each of document store, time-series, and traditional, fully structured transactional data would remove constraints on the use of technology within a business, but you should be very careful before you go that route.
In general, the more APIs, underlying database architectures, supporting file formats, and management and monitoring systems you have, and the more you switch among them based on use case, the more complex your enterprise data architecture becomes. This is particularly the case if you offer or implement multiple products, technologies, and integration methodologies with this medley of databases. This complexity tends to have a domino effect on the support lifecycle of any software that uses these databases, and even on the procurement of the databases themselves.
Provided you can find a single database that delivers similar performance, handles all the data types, and supports SQL as well as direct manipulation of the data through a NoSQL API, it makes far more sense to merge and aggregate heterogeneous data into a common database structure, particularly in Edge Computing use cases. For example, if you are looking at video surveillance data, sensor networks, and logs for security, then combinations of these and other disparate datasets must be aggregated for cross-functional analytics.
If you need to analyze data of different types held in different source systems, or to create reports and dashboards based on it, then you will need some capability for normalizing the data so that it can be queried, either onsite or remotely, as a single dataset.
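As a rough illustration of that normalization step, the sketch below loads three differently shaped inputs (time-series sensor readings, JSON camera events, and structured access-log rows) into one table and queries them together with SQL. It is only a sketch under stated assumptions: the sample records, the `events` table layout, and the function names are all hypothetical, and Python's built-in SQLite module stands in for whatever database you actually use. It does not represent any particular vendor's API.

```python
import json
import sqlite3

# Hypothetical sample inputs, invented purely for illustration.
SENSOR_READINGS = [  # time-series tuples: (ISO timestamp, device, value)
    ("2024-01-01T00:00:00", "door-1", 21.5),
    ("2024-01-01T00:01:00", "door-1", 22.0),
]
CAMERA_EVENTS = [  # JSON documents, as a document store might hold them
    '{"ts": "2024-01-01T00:00:30", "camera": "door-1", "label": "person"}',
]
ACCESS_LOGS = [  # structured rows: (timestamp, device, user, result)
    ("2024-01-01T00:00:45", "door-1", "alice", "denied"),
]


def normalize():
    """Yield (ts, source, device, payload) rows in one common shape."""
    for ts, device, value in SENSOR_READINGS:
        yield ts, "sensor", device, json.dumps({"value": value})
    for raw in CAMERA_EVENTS:
        doc = json.loads(raw)
        yield doc["ts"], "camera", doc["camera"], json.dumps({"label": doc["label"]})
    for ts, device, user, result in ACCESS_LOGS:
        yield ts, "access_log", device, json.dumps({"user": user, "result": result})


def build_db():
    """Create an in-memory database holding every normalized row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (ts TEXT, source TEXT, device TEXT, payload TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", normalize())
    conn.commit()
    return conn


def timeline(conn, device):
    """Return (ts, source) pairs for one device, ordered by timestamp."""
    cur = conn.execute(
        "SELECT ts, source FROM events WHERE device = ? ORDER BY ts", (device,)
    )
    return cur.fetchall()


if __name__ == "__main__":
    for row in timeline(build_db(), "door-1"):
        print(row)
```

Keeping the source-specific fields in a JSON `payload` column is one simple design choice among many; it lets a single query interleave all three sources by time without forcing them into identical columns.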
The requirements have changed during the last 30 years, and Actian has built a new modular database that is purpose-built for edge computing technologies and use cases. It is capable of handling all datasets through a single NoSQL API, yet provides full SQL compliance. For both SQL and NoSQL functions, our third-party benchmark results show far better performance than any of the major document store, time-series, or traditional SQL databases capable of handling mobile and IoT.