Part One: Mobile may be IoT—but, when it comes to data, IoT is not Mobile
Three weeks ago, we looked at the raw performance—or the lack thereof—of SQLite. After that, we looked at SQLite within the broader context of Modern Edge Data Management and discovered that its performance shortcomings were in fact compounded by the demands of the environment. As a serverless database, SQLite requires integration with a server-based database—which inevitably incurs a performance hit as the SQLite data is transformed through an ETL process for compatibility with the server-based database’s architecture.
SQLite partisans might then adopt a snarky tone and say: “Yeah? Well if SQLite is so slow and integration is so burdensome, can you remind me why it is the most ubiquitous database out there?”
Well, yeah, we can. And in the same breath we can provide even partisans with ample reason to doubt that the popularity of SQLite will continue going forward. Spoiler alert: What do the overall growth curves of the IoT look like outside the realm of mobile handsets and tablets?
How the Banana Slug Won the Race
In the first blog in this series we looked at why Embedded Developers adopted SQLite over both simple file management systems on the one end of the data management spectrum and large complex RDBMS systems on the other end. The key technical reasons, just to recap, include its small footprint; its ability to be embedded in an application; its portability to almost any operating system and programming language with a simple architecture (Key-Value Store); and its ability to deliver standard data management functionality through an SQL API. The key non-technical reasons (okay, reason) is that, well, it’s free! That mattered in use cases dominated by personal applications that needed built-in data management (including developer tools), web applications that needed a data cache, and mobile applications that needed something with a very small footprint. If you combine free with these technical characteristics and consider where and how SQLite has been deployed, it’s no surprise that, in terms of raw numbers, SQLite found itself more widely deployed than any other database.
What all three of the aforementioned use cases have in common, though, is that they are single-user scenarios in which data associated with a user can be stored in a single database file (in SQLite, the whole database, tables included, lives in one file). Demand for data in these use cases generally involves serial reads and writes; there’s little likelihood of concurrent reads, let alone concurrent writes. In fact, it wasn’t until later iterations of SQLite that the product’s developers even felt the need to enable simultaneous reads with a single write.
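To make “simultaneous reads with a single write” concrete, here is a minimal sketch using Python’s standard sqlite3 module and SQLite’s write-ahead logging (WAL) journal mode. The file name, the `readings` table, and the function name are hypothetical, chosen only for illustration; the sketch assumes a local filesystem on which WAL is available.

```python
import os
import sqlite3
import tempfile


def wal_read_during_write():
    """Show a reader querying while a writer holds an open write transaction."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")  # hypothetical file name

    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT statements below control the transactions.
    writer = sqlite3.connect(path, isolation_level=None)
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]
    writer.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    writer.execute("INSERT INTO readings VALUES ('t1', 20.5)")

    reader = sqlite3.connect(path, isolation_level=None)

    # The writer opens a write transaction and leaves it uncommitted.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("INSERT INTO readings VALUES ('t1', 21.0)")

    # In WAL mode the reader is not blocked; it sees the last committed state.
    seen_during = reader.execute("SELECT COUNT(*) FROM readings").fetchall()[0][0]

    writer.execute("COMMIT")
    seen_after = reader.execute("SELECT COUNT(*) FROM readings").fetchall()[0][0]

    writer.close()
    reader.close()
    return mode, seen_during, seen_after
```

The reader runs alongside the one active writer and sees the new row only after the commit. Note that this is still one writer at a time.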
But here’s the thing: Going forward, those three use cases are not going to be the ones driving the key architectural decisions. Ironically, the characteristics of SQLite that made it so popular among developers in turn gave rise to a world in which billions of devices are acting, reacting, and interacting in real time (at the Edge, in the Cloud, and in the data center), and that’s a world for which the key characteristics of SQLite are singularly ill-suited.
SQLite has essentially worked itself out of a role in the realm of Modern Edge Data Management.
As we’ve mentioned earlier, SQLite is based on an elegant but simple architecture, Key-Value Store, that enables you to store any type of data. Implementation is done in C with a very small footprint, a few hundred KBs, making it portable to virtually any environment with minimal resourcing. And, while it’s not fully ANSI standard SQL, it’s close enough for horseshoes, hand grenades, and mobile applications.
SQLite was adopted in many early IoT applications because these early design-ins were almost mirror images of mobile applications (minus the need for much effort at the presentation layer), focused on local caching of data with the expectation that it would be moved to the cloud for data processing and analytics. Pilot projects on the cheap meant designers and developers reached reflexively for what they knew and what was free: ta-dah, SQLite!
Independent of SQLite, the IoT market and its use cases have rapidly moved off this initial trajectory. Clear proof of this is readily apparent if you’ve had the opportunity to go to IoT trade shows over the last few years. Three to five years ago, many of the sessions described proofs of concept (PoCs) and small pilots in which all data was sent up into the cloud. When we spoke to engineers and developers on the trade show floor, they were skeptical about the need for anything more than SQLite, or about whether a database was needed at all, let alone a client-server one. However, in the last three years, more of the sessions have centered on scaling up pilots to full production and infusing ML routines into local devices and gateways. Many more of the conversations have involved considering more robust local data management, including client-server options.
Intelligent IoT is Redefining Edge Data Management
For all its strengths in the single-user application space, SQLite and its serverless architecture are unequal to the demands of autonomous vehicles, smart agriculture, medical instrumentation, and other industrial IoT spaces. The same is true with regard to the horizontal spaces occupied by key industrial IoT components, such as IoT gateways, 5G networking gear, and so forth. Unlike single-user applications designed to support human-to-machine requirements, innumerable IoT applications are being built for machine-to-machine relationships occurring in highly automated environments. Modern machine-to-machine scenarios involve far fewer one-to-one relationships and a far greater number of peer-to-peer and hierarchical relationships (including one-to-many and many-to-one subscription and publication scenarios), all of which have far more complex data management requirements than those for which SQLite was built. Moreover, as CPU power has migrated out of the Data Center into the Cloud and now out to the Edge, a far wider array of systems are performing complex software-defined operations, data processing, and analytics than ever before. Processing demands are becoming both far more sophisticated and far more local.
Consider: Tomorrow’s IoT sensor grids will run the gamut from low-speed, low-resolution structured data feeds (capturing tens of thousands of pressure, volume, and temperature readings, for example) to high-speed, high-resolution video feeds from hundreds of streaming UHD cameras. In a chemical processing plant, both sensor grids could be flowing into one or more IoT gateways that, in turn, could flow into a network of Edge systems (each with the power one would only have found in a data center a few years ago) for local processing and analysis, after which some or all of the data and analytical information would be passed on to a network of servers in the Cloud.
Dive deeper: The raw data streams flowing in from these grids would need to be read and processed in parallel. These activities could involve immediately discarding spurious data points, running signal-to-noise filters, normalizing data, or fusing data from multiple sensors, to name just a few of the obvious data processing functions. Some of the data would be stored as it arrived—either temporarily or permanently, as the use case demanded—while other data might be discarded.
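As a rough illustration of the processing steps just listed, the sketch below shows one simple way each could look on a gateway. Every function name, threshold, and window size here is a hypothetical choice for illustration, and the moving average merely stands in for whatever signal-to-noise filter a real deployment would use.

```python
from statistics import mean


def drop_spurious(values, low, high):
    """Discard readings outside a plausible physical range."""
    return [v for v in values if low <= v <= high]


def moving_average(values, window=3):
    """Smooth noise with a trailing moving average (a stand-in noise filter)."""
    return [
        mean(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


def normalize(values):
    """Rescale readings to the range 0..1 (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(*streams):
    """Fuse time-aligned streams from several sensors by averaging each sample."""
    return [mean(sample) for sample in zip(*streams)]


# Hypothetical temperature readings from two sensors, one with a glitch.
sensor_a = drop_spurious([20.0, 999.0, 21.0, 22.0], low=-40.0, high=150.0)
sensor_b = [20.4, 21.2, 21.8]
fused = fuse(moving_average(sensor_a), moving_average(sensor_b))
```

In practice each of these stages would run continuously and in parallel across many streams, which is where the data management demands start to add up.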
A World of Increasing Complexity
Throughout these scenarios we see far more complex operations taking place at every level, including ML inference routines being run locally on devices, at the gateway level, or both. There may be additional operations running in parallel on these same datasets—including downstream device monitoring and management operations, which effectively create new data streams moving in the opposite direction (e.g., reads from the IoT gateway and writes down the hierarchical ladder). Or data could be extracted simultaneously for reporting and analysis by business analysts and data scientists in the Cloud or Data Center. In an environment such as the chemical plant we have envisioned, there may also be more advanced analytics and visualization activities performed at, say, a local operations center.
These scenarios are both increasingly commonplace and wholly unlike the scenarios that propelled SQLite to prominence. They are combinatorial and additive; they present a world of processing and data management demands that is as far from that of the single-user, single-application world—the sweet-spot for SQLite—as one can possibly get:
- Concurrent writes are a requirement, and not just to a single file or data table—with response times between write requests of as little as a few milliseconds.
- Multiple applications will be reading and writing data to the same data tables (or joining them) in IoT gateways and other Edge devices, requiring the same kind of sophisticated orchestration that would be required with multiple concurrent users.
- On-premises Edge systems may have local human oversight of operations, and their activities will add further complexity to the orchestration of multiple activities reading and writing to the databases and data tables.
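The concurrent-write requirement in the list above is easy to probe. The sketch below, using Python’s standard sqlite3 module, opens two connections to the same database file (standing in for two applications on a gateway) and has both try to start a write transaction. The file name, table, and function name are hypothetical, and the busy timeout is set to zero purely so the contention shows up immediately instead of after a wait.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked():
    """Two connections compete to write to the same SQLite database file."""
    path = os.path.join(tempfile.mkdtemp(), "gateway.db")  # hypothetical name

    # timeout=0: fail at once when the database is busy instead of retrying.
    app_a = sqlite3.connect(path, timeout=0, isolation_level=None)
    app_b = sqlite3.connect(path, timeout=0, isolation_level=None)
    app_a.execute("CREATE TABLE readings (sensor TEXT, value REAL)")

    # Application A starts a write transaction and holds it open.
    app_a.execute("BEGIN IMMEDIATE")
    app_a.execute("INSERT INTO readings VALUES ('pressure-1', 101.3)")

    # Application B tries to start its own write transaction at the same time.
    try:
        app_b.execute("BEGIN IMMEDIATE")
        app_b.execute("ROLLBACK")
        blocked = False
    except sqlite3.OperationalError as exc:
        blocked = "locked" in str(exc)

    # Only after A commits can B take its turn: writes are serialized.
    app_a.execute("COMMIT")
    app_b.execute("BEGIN IMMEDIATE")
    app_b.execute("INSERT INTO readings VALUES ('temp-7', 88.0)")
    app_b.execute("COMMIT")

    rows = app_b.execute("SELECT COUNT(*) FROM readings").fetchall()[0][0]
    app_a.close()
    app_b.close()
    return blocked, rows
```

With a non-zero timeout the second writer would wait and retry and not fail outright, but the writes would still happen one after the other, which matters when write requests arrive only a few milliseconds apart.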
If all of this sounds like an environment for which SQLite is inadequately prepared, you’re right. In parts two and three of this blog we’ll delve into these issues further.