When companies project data warehouse growth, they often look at a curve that mimics the growth of data in source systems. This approach works if all you are concerned with is storage growth. What is commonly overlooked is the compute growth curve for processing updates to your data. As IT systems become more interconnected, data updates in one place have a ripple effect as they are replicated elsewhere. Your data warehouse then receives updates from all impacted data sources, not just the place where the change was initiated. If the compute growth isn’t planned for and accommodated in your data warehouse architecture and infrastructure, performance is likely to suffer.
The Snowball Effect on Lag Times
Data warehouse performance is critical to stay on top of. If you start getting backlogged in processing data updates, the problem keeps getting worse. Keep in mind that updates arrive as a continuous stream. If your data warehouse can process only 9 out of every 10 units requested each second, 1 unit goes into a queue every second. That may not seem like a big deal, but if the situation continues for 2 minutes, you have 120 units in the queue and a lag time of about 13 seconds (120 units cleared at 9 units per second). If the situation persists for an hour, there will be 3,600 units in the queue and a lag time of nearly 7 minutes. Play this out over a business day, and you can see that this becomes an overwhelming problem very quickly.
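The arithmetic behind these figures can be sketched in a few lines of Python. This is a simplified illustration, not a model of any particular warehouse: the function name `backlog` and its parameters are hypothetical, and it assumes constant arrival and processing rates and a queue that starts empty.

```python
def backlog(arrival_rate, service_rate, seconds):
    """Return (queued_units, lag_seconds) after `seconds` of sustained load.

    Assumes constant rates (units per second) and an initially empty queue.
    Lag is the time needed to drain the queue at the service rate.
    """
    queued = max(0, arrival_rate - service_rate) * seconds
    lag = queued / service_rate
    return queued, lag


# The scenario from the text: 10 units arrive per second, 9 are processed.
for label, seconds in [("2 minutes", 120), ("1 hour", 3600), ("8-hour day", 8 * 3600)]:
    queued, lag = backlog(arrival_rate=10, service_rate=9, seconds=seconds)
    print(f"{label}: {queued} units queued, {lag:.1f} s lag ({lag / 60:.1f} min)")
```

Under these assumptions the lag grows linearly for as long as demand exceeds capacity, so a shortfall of only 10 percent turns into a lag of close to an hour by the end of an 8-hour day.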
Why Is This Important?
Data warehouse performance may not seem like a big issue in the context of scheduled reports and batch queries. Where it becomes problematic is in the context of modern “digitally transformed” business processes that rely on data warehouses as a point of aggregation for real-time operational metrics that span multiple source systems. Take, for example, a manufacturing facility with different production lines. This facility will have lots of sensors and smart machinery collecting data and streaming it to a data warehouse, where it is combined with materials supply information, product quality data (from testing), and outbound logistics data. The data warehouse enables data from all of the smart systems to be aggregated into a set of end-to-end datasets that power the dashboards facility operators use to keep things running smoothly. If performance issues in the data warehouse delay updates from source systems, problems cannot be identified and remediated in real time, and the business loses the agility it needs to run optimally.
Actian Avalanche Addresses the Data Warehouse Performance Challenge
The Actian Avalanche hybrid cloud data warehouse addresses the performance challenge and minimizes the risk of update lag time in three key ways.
- Dynamic cloud-scale compute. Avalanche leverages the flexible nature of cloud infrastructure to adjust compute resources to match processing needs on demand. If you see an increase in data updates due to data growth or a spike, Avalanche can adjust infrastructure resources to provide the needed capacity. You can’t do this with an on-premises infrastructure that is constrained by fixed hardware capacity.
- Vector processing. Vector processing enables large data sets to be processed more efficiently, lowering the overall compute load on the system.
- Maximizing hardware utilization. Actian Avalanche is designed to use the CPU cache for execution processing and to use every available CPU core, minimizing infrastructure waste. Most data warehouse systems fail to use all available CPU capacity and rely on RAM for execution processing; both limit the capacity for high-performance processing.
At the end of the day, your data warehouse’s ability to maintain high performance when processing updates comes down to supply and demand for compute capacity. You can’t really control the demand for updates (unless you want to unplug some source systems or slow down your business). The piece of this you can control is your data warehouse’s capacity for processing these updates. Avalanche provides the highest performing solution on the market with a combination of cloud-scale resources, efficient use of hardware resources, and vectorized array-at-a-time processing. With Avalanche, you can spend more time focusing on growing your business and less time worrying about whether your data warehouse can keep up.
To learn more, visit www.actian.com/avalanche.