The Data Challenges of Telemetry

By David Linthicum
September 7, 2022

Telemetry is the automated communications process by which measurements are made and data is collected at remote points. The data is then transmitted to receiving equipment for monitoring. The word ‘telemetry’ is derived from Greek roots: tele = remote, and metron = measure.

Telemetry is not a new concept, that’s for sure. We’ve been watching telemetry at work for decades. For example, we’ve attached transmitters to migrating animals, weather buoys, and seismic monitoring equipment. However, the use of telemetry continues to accelerate, and this technology will pose huge challenges for those of us responsible for data collection, data integration, and data analysis.

The most recent rise of telemetry centers on the new and inexpensive devices that we now employ to gather all kinds of data. These range from the Fitbits that seem to be attached to everyone these days to count the steps we take, to smart thermostats that monitor temperature and humidity, to the information our automobiles report about the health of the engine.

The rise of the “Internet of Things” is part of this as well. This is a buzzword invented by an industry looking to put a name to the rapid appearance of many devices that can produce data, as well as the ability of these devices to self-analyze and thus self-correct. MRI machines in hospitals, robots on factory floors, and motion sensors that record employee activity are just a few of the things that are now spinning off megabytes of data each day.

Typically, this type of information flows out of devices as streams of unstructured data. In some cases the data is persisted at the device, and in other cases it is not. In any event, the information needs to be collected, put into an appropriate structure for storage, perhaps combined with other data, and stored in a transactional database.
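
To make the collect, structure, combine, and store steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the raw line format, the `DEVICE_LOCATION` lookup, the `readings` table, and the function names `parse_reading` and `store_readings` are hypothetical, and an in-memory SQLite database stands in for whatever transactional database a real system would use.

```python
import sqlite3

# Hypothetical raw readings, loosely formatted text as a device might emit it.
RAW_STREAM = [
    "thermo-1 2022-09-07T10:00:00 temp=21.5 humidity=40",
    "thermo-2 2022-09-07T10:00:01 temp=19.0 humidity=55",
    "garbled line with no fields",
]

# Hypothetical reference data to combine with each reading.
DEVICE_LOCATION = {"thermo-1": "lobby", "thermo-2": "lab"}


def parse_reading(line):
    """Turn one raw line into a structured tuple, or None if it cannot be parsed."""
    parts = line.split()
    if len(parts) != 4:
        return None
    device_id, timestamp, *pairs = parts
    try:
        fields = dict(pair.split("=", 1) for pair in pairs)
        temp = float(fields["temp"])
        humidity = float(fields["humidity"])
    except (ValueError, KeyError):
        return None
    # Combine the reading with other data: here, a device-to-location lookup.
    location = DEVICE_LOCATION.get(device_id, "unknown")
    return (device_id, timestamp, location, temp, humidity)


def store_readings(conn, lines):
    """Parse raw lines and insert the parseable ones in a single transaction."""
    rows = [row for row in map(parse_reading, lines) if row is not None]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings ("
            "device_id TEXT, ts TEXT, location TEXT, temp REAL, humidity REAL)"
        )
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for a transactional database
stored = store_readings(conn, RAW_STREAM)
print(stored, "readings stored")
```

The sketch silently drops lines it cannot parse. A production pipeline would more likely log or quarantine them so that data quality problems stay visible.
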
From there, the data can be further transferred to an analytics-oriented database, or analyzed in place.

Problems arise when it comes time to deal with that information. Obviously, data integration is critical to most telemetry operations. The information must be managed from point to point, and then persisted within transactional or analytics databases. While this is certainly something we’ve done for some time, the volume of information that these remote devices spin off is new, and thus we have a rising need to effectively manage a rising volume of data.

Take the case of the new health telemetry devices that are coming onto the market. They can monitor most of our vitals, including blood pressure, respiration, oxygen saturation, and heart rate, at sub-second intervals. These sensors typically transmit the data to a smartphone, where the information is formatted for transfer to a remote database, typically in the cloud.

The value of this data is very high. By gathering this data over time, and running analytics against known data patterns, we can determine the true path of our health. Perhaps we will be able to spot a heart attack or other major health issues before they actually happen. Or this information could lead to better treatment and outcome data, considering that the symptoms, treatment, and outcomes will now be closely monitored over a span of years.

While the amount of data was relatively reasonable in the past, the number of data points and the frequency of collection are exploding. It’s imperative that we figure out the best path to data integration for the expanding use of telemetry. A few needs are certain:

The need to gather information from hundreds, perhaps thousands, of data points or devices at the same time. Thus, we have to identify the source of the data, as well as how the data should be managed in flight and when stored at a target.

The need to deal with megabytes, perhaps gigabytes, of data per hour coming off a single device, where once it was only a few kilobytes. Given the expanding number of devices (our previous point), the math is easy: the amount of data that needs to be transmitted and processed is exploding.

The need to address the data governance and data quality issues that these massive amounts of data will drive, and to address them at the data integration layer. Data is typically not validated when it’s generated by a device, but it must be checked at some point. Moreover, the complexity of these systems means that the use of data governance approaches and technology is imperative.

This is exciting stuff, if you ask me. We’re learning to gather the right data, at greater volumes, and leverage that data for more valuable outcomes. This data state has been the objective for years, but it was never really attainable. Today’s telemetry advances mean we have a great opportunity in front of us.

About David Linthicum

Dave Linthicum is the CTO of Cloud Technology Partners, and an internationally known cloud computing and SOA expert. He is a sought-after consultant, speaker, and blogger. In his career, Dave has formed or enhanced many of the ideas behind modern distributed computing, including EAI, B2B Application Integration, and SOA, approaches and technologies in wide use today. For the last 10 years, he has focused on the technology and strategies around cloud computing, including working with several cloud computing startups. His industry experience includes tenure as CTO and CEO of several successful software and cloud computing companies, and upper-level management positions in Fortune 500 companies. In addition, he was an associate professor of computer science for eight years, and continues to lecture at major technical colleges and universities, including University of Virginia and Arizona State University.
He keynotes at many leading technology conferences, and has several well-read columns and blogs. Linthicum has authored 10 books, including the ground-breaking "Enterprise Application Integration" and "B2B Application Integration."