Extract, Transform, and Load (ETL) is the process that has been used for decades to share data between applications, transactional systems, and data warehouses. It essentially works like this: you define an integration, pull the data out of the source system, use mapping and aggregation rules to transform the data into the format the target system needs, and then load (save) the data into the target system's database.
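To make the three steps concrete, here is a minimal sketch of a classic ETL pass. All of the names (the orders data, `extract`, `transform`, `load`) are illustrative, not a reference to any particular product.

```python
def extract(source_rows):
    """Extract: pull the relevant raw records out of the source system."""
    return [r for r in source_rows if r.get("status") == "complete"]

def transform(rows):
    """Transform: apply mapping/aggregation rules into the target shape.

    Here the rule is: aggregate order amounts into a per-customer total.
    """
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return [{"customer": c, "lifetime_value": v} for c, v in totals.items()]

def load(target, rows):
    """Load: save the transformed records in the target system's store."""
    target.extend(rows)
    return target

# Hypothetical source data in a transactional system.
source = [
    {"customer": "acme", "amount": 100, "status": "complete"},
    {"customer": "acme", "amount": 50, "status": "complete"},
    {"customer": "zen", "amount": 75, "status": "pending"},
]

# The warehouse now holds a copy shaped for reporting.
warehouse = load([], transform(extract(source)))
```

Note that every decision (which rows to extract, how to aggregate, where to load) is fixed up front in code. That pre-definition is exactly what the problems below flow from.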
While this process seems simple and intuitive, it has a few problems that are leading many companies to question the sustainability of the practice. For solution and data architects, ETL can quickly become integration hell:
- The need to pre-define what data needs to move between the systems and what transformations need to take place
- Moving more data than you need
- The complexity of tracking data through multiple systems
- The effort/cost of keeping ETL processes up to date as source and target systems change
- The security vulnerabilities exposed during the ETL process itself
ETL works great in situations where you are defining a system or a set of integrations that will be stable for a long time, but that isn't the reality for most modern business-IT ecosystems. The push for business agility has caused applications and business processes to change rapidly, increasing the cost of integration between applications. This application data integration churn is difficult for ETL solutions to support.
Significantly reduce your ETL burden
The good news for the IT industry is that there are now ways to reduce your use of ETL and help get your staff out of ETL hell. You can do this by relying on three key principles:
- If you can use the data directly from the source system, don't copy it at all. Many of the system integrations and ETL setups built over the past few decades were developed as workarounds for limited compute capacity and performance in individual applications. Transactional data was moved out of source systems and into data warehouses for reporting so that analytics processes would not slow down transactional workflows. With compute now being both fast and cheap, your transactional systems can often handle analytics and new transactions at the same time without a measurable performance impact.
- Only move the data you need when you need to use it. Transition from pushing data downstream to pulling data at the time of consumption. This not only lowers the amount of data copied among systems, it also ensures that the data your users and business processes consume is as current as possible. When you push data through a system, you create the challenge of keeping the target data up to date with changes in the source system. By pulling data when you need it, any changes have already been applied.
- Plan for change. Where ETL was designed for stability, modern IT environments are designed for agility. That means you need to move from fixed, pre-defined integrations and ETL definitions towards a solution that centralizes your connection management and makes data available across the enterprise. This may be an operational data warehouse, or it may simply be an enterprise data bus. What you are looking for is flexibility and the ability to reconfigure your flow of data whenever business needs or systems change.
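The second and third principles can be sketched together: instead of copying data downstream on a schedule, consumers resolve a named connection in a central registry and query the source at read time. Everything here is a hypothetical in-memory illustration, not any specific product's API.

```python
class ConnectionRegistry:
    """Central place to register, resolve, and reconfigure data sources."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch_fn):
        # Swapping a source later means re-registering one entry here,
        # not rewriting every downstream integration.
        self._sources[name] = fetch_fn

    def pull(self, name, **criteria):
        # Data is fetched from the source at the moment of consumption,
        # and only the rows matching the criteria are moved.
        return self._sources[name](**criteria)

# Hypothetical transactional system holding live order data.
orders_db = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]

registry = ConnectionRegistry()
registry.register(
    "orders",
    lambda region=None: [o for o in orders_db
                         if region is None or o["region"] == region],
)

# A consumer pulls only what it needs, when it needs it.
eu_orders = registry.pull("orders", region="EU")

# When the source changes, the next pull reflects it immediately:
# no refresh job, no stale downstream copy to reconcile.
orders_db.append({"id": 3, "region": "EU"})
eu_orders_now = registry.pull("orders", region="EU")
```

The design choice to illustrate is the indirection: consumers depend on the connection name, not on the source system, which is what makes the data flow reconfigurable when systems change.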
Moving out of ETL hell and finding a solution that feels more like data heaven starts with developing a more agile mindset about how data flows across your organization. Don’t assume you’ll know in advance what your business will need or assume that the systems you have today will be the systems you have in your IT environment tomorrow. Look for modern data management platforms like Actian that will enable you to manage your connections in a consistent way, aggregate your data for use across the enterprise, and provide the analytics tools to develop the insights you need today and a new set of insights tomorrow.