What is Streaming Ingestion?


Streaming ingestion is the continuous loading of data from a subscribed message queue so that real-time analytics can provide insights from data in motion. Because high message volumes can overwhelm the applications consuming the data, micro-batches can collect messages and deliver them to the consuming application at regular, short intervals. If the data source is a conventional data file, traditional batch ingestion can be used instead.
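The micro-batching idea described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Actian product: the function name micro_batches and the parameters max_size and max_wait_s are hypothetical, and the message source is assumed to be any Python iterable standing in for a queue subscription.

```python
import time
from typing import Callable, Iterable, Iterator, List


def micro_batches(
    messages: Iterable,
    max_size: int = 100,
    max_wait_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[List]:
    """Group incoming messages into small batches.

    A batch is handed to the consumer when it holds max_size messages or
    when max_wait_s has passed since its first message arrived.
    Simplification: the time limit is only checked when a message arrives.
    A real queue consumer would also poll with a timeout so that a quiet
    stream still flushes its pending batch.
    """
    batch: List = []
    deadline = 0.0
    for msg in messages:
        if not batch:
            # The first message of a new batch starts the timer.
            deadline = clock() + max_wait_s
        batch.append(msg)
        if len(batch) >= max_size or clock() >= deadline:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left when the source ends.
        yield batch
```

For example, list(micro_batches(range(7), max_size=3, max_wait_s=60)) groups seven messages into batches of three, three and one. The consuming application then handles a few batches per interval instead of one call per message.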

Streaming Data Sources

Examples of streaming data include IoT sensor output, log files, clickstreams, message-based business transactions and interactions from gaming applications.

Why Use Real-Time Streaming Ingestion?

Traditional applications process data in batches, which delays the presentation and analysis of events. Even a slight delay can result in missed opportunities to capitalize on highly time-critical events. Streaming applications can process events in real time so a business can respond to them immediately.

Stream Processing Frameworks

The pioneers of message-based event systems are IBM with MQSeries and TIBCO on Open Systems. Below are some open-source and commercial examples of stream processing frameworks and services:

  • Apache Flink supports stateful computing over data streams for event streams and ETL.
  • Apache Ignite provides high-performance, in-memory computing and is used to speed up existing applications.
  • Apache Samza supports stateful applications that process data in real time, running as a stand-alone library or under YARN.
  • Apache Spark natively supports scalable, fault-tolerant streaming applications.
  • Apache Storm supports distributed, real-time parallel task computations.
  • Amazon Kinesis Data Streams is a managed service for real-time streaming applications that process data as it arrives.
  • Microsoft Azure Event Hubs provide a highly scalable streaming ingestion service that works with any real-time analytics provider.
  • Microsoft Azure IoT Hub is designed to provide bidirectional machine-to-cloud communication for IoT streams.
  • Apache Kafka on HDInsight is ideal for Hadoop-style Big Data applications.

Real-Time Streaming Application Examples

Streaming data ingested from multiple sources must be processed before users can extract meaning or insights from it. The examples below benefit from real-time data stream processing:

  • Fraud detection systems collect real-time streaming data to respond to suspicious activities.
  • Cyber threats need to be countered before they threaten the business. Security Information and Event Management (SIEM) systems analyze logs and monitor network activity to detect and shut down any potential threat.
  • Autopilot systems for controlling machines such as aircraft, drones or road vehicles collect data from multiple sensors such as GPS, Lidar, altimeters, sonar and cameras. This data must be processed using onboard processors to control the vehicle’s speed, altitude, and direction.
  • Stock trading systems must monitor changing stock prices in real time to honor pre-set buy and sell orders. For example, if you have a pre-set order to sell a stock when the price falls below $20, and the stock fluctuates between $22 and $19 for a split second, the brokerage needs to execute the trade within that sub-second window to retain your business.
  • Sentiment analysis of social media streams allows an organization to react to sudden changes in customer perceptions. Executives must be responsive to news that impacts their customers.
  • Retailers collect and process real-time feeds from in-store beacon systems that identify customers who have visited their website with interest in a certain product and who are in the vicinity of a physical store. In response to this data, an SMS or email offer can be sent in seconds to entice the prospect to become a customer.
  • Sales and marketing systems can use clickstream data to trigger an interaction with a chatbot or agent.
  • Gaming companies use in-game behavior analytics to suggest new games or offer the most relevant ads for in-game purchases.
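
The stock trading example above comes down to evaluating every price update as it arrives rather than sampling prices periodically. The sketch below is illustrative only: the function first_trigger, the tick tuple layout and the symbol ACME are assumptions made for the example, not a brokerage API.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical tick layout: (timestamp in seconds, symbol, price)
Tick = Tuple[float, str, float]


def first_trigger(ticks: Iterable[Tick], symbol: str, stop_price: float) -> Optional[Tick]:
    """Return the first tick for `symbol` priced below `stop_price`, or None."""
    for tick in ticks:
        _, tick_symbol, price = tick
        if tick_symbol == symbol and price < stop_price:
            return tick
    return None


# A price that dips below $20 for a single tick, then recovers.
ticks = [
    (0.000, "ACME", 22.0),
    (0.001, "ACME", 21.0),
    (0.002, "ACME", 19.0),  # brief dip below the $20 stop
    (0.003, "ACME", 22.0),
]

# Checking every tick catches the dip.
hit = first_trigger(ticks, "ACME", 20.0)

# Checking only every third tick (a stand-in for periodic batch checks) misses it.
sampled = first_trigger(ticks[::3], "ACME", 20.0)
```

Here hit is the tick at 0.002 seconds priced at 19.0, while sampled is None because a check that only sees every third tick never observes the dip.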

How Actian Handles Data Streaming Ingestion

Thanks to its built-in data integration technology, the Actian Data Platform can provide real-time insights based on streaming data. The Actian Data Platform runs on-premises and on cloud platforms, including AWS, Google Cloud and Microsoft Azure. DataConnect supports file-based and stream-based ingestion from sources, including JMS, Kafka, MSMQ, RabbitMQ and WebSphere MQ.

You can trial the Actian Data Platform for 30 days at no cost. Visit our website and sign up for the free trial.