Data integration connects disparate data sources to support business decision making. Data integration solutions can include the following functions:
- Extract, Transform and Load (ETL) functions to connect, gather, clean and transfer data to a data mart or data warehouse for analysis.
- Extract, Load and Transform (ELT) technology to filter, transform and aggregate data sets inside a data warehouse.
- Replicating changes from operational systems to a data warehouse.
- Data pipeline orchestration.
- Data transfer scheduling.
- Deduplicating data and filling gaps using default values, interpolation and extrapolation.
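The ETL and ELT items above differ mainly in where the transformation runs. The minimal sketch below uses Python's built-in sqlite3 module as a stand-in for a data warehouse. The table names, columns and sample rows are hypothetical. Both paths produce the same per-region totals. ETL cleans and aggregates the rows before loading them, while ELT loads the raw rows first and then transforms them with SQL inside the database.

```python
import sqlite3

# Hypothetical raw source rows: (order_id, region, amount). Amounts arrive as
# text with stray whitespace and regions with inconsistent casing.
RAW_ORDERS = [
    (1, "north", " 120.50"),
    (2, "South", "80"),
    (3, "north", "45.25 "),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the integration code, then load only the result."""
    totals: dict[str, float] = {}
    for _order_id, region, amount in RAW_ORDERS:
        key = region.strip().lower()  # normalize region names
        totals[key] = totals.get(key, 0.0) + float(amount.strip())
    conn.execute("CREATE TABLE etl_sales (region TEXT, total REAL)")
    conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", sorted(totals.items()))


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows unchanged, then transform with SQL in the database."""
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.execute(
        """
        CREATE TABLE elt_sales AS
        SELECT LOWER(TRIM(region)) AS region,
               SUM(CAST(TRIM(amount) AS REAL)) AS total
        FROM raw_orders
        GROUP BY LOWER(TRIM(region))
        """
    )


conn = sqlite3.connect(":memory:")  # in-memory database standing in for a warehouse
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT region, total FROM etl_sales ORDER BY region").fetchall()
elt_rows = conn.execute("SELECT region, total FROM elt_sales ORDER BY region").fetchall()
print(etl_rows)
print(elt_rows)
```

The ELT path keeps the untouched raw_orders table. As a result, the transformation can be changed and rerun inside the warehouse without extracting the data again.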
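Pipeline orchestration, also listed above, usually means running tasks in dependency order. The sketch below models a hypothetical pipeline as a directed acyclic graph. It uses the standard library's graphlib.TopologicalSorter (Python 3.9+) to release each task only after its upstream tasks have finished. The task names and the run_task placeholder are assumptions for illustration. Scheduling, retries and actual parallel execution are left out.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks that must finish first.
PIPELINE = {
    "extract_crm": set(),
    "extract_erp": set(),
    "clean_customers": {"extract_crm"},
    "clean_orders": {"extract_erp"},
    "load_warehouse": {"clean_customers", "clean_orders"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_task(name):
    """Placeholder for real work, such as triggering an extract or load job."""
    print(f"running {name}")
    return name


def run_pipeline(pipeline):
    """Run tasks in dependency order; each ready batch could run in parallel."""
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()  # raises graphlib.CycleError if the pipeline has a cycle
    completed = []
    while sorter.is_active():
        for task in sorter.get_ready():  # every task whose dependencies are done
            completed.append(run_task(task))
            sorter.done(task)
    return completed


order = run_pipeline(PIPELINE)
```

Each call to get_ready() returns all tasks whose inputs are complete. A real orchestrator could dispatch that batch to parallel workers, and a scheduler would trigger run_pipeline on a timetable.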
Data integration strategy
Using a common data integration strategy, an organization can reduce the costs of managing ad hoc point-to-point data integrations. A unified approach offers several advantages:
- Makes data connections faster to deploy.
- Provides more robust connections.
- Reduces maintenance costs.
Departments that operate their own data silos create duplicate data and waste effort. Taking a platform approach improves the visibility of data flows within a business. Managing integrations in a single place lets an organization untangle complex interconnections into data hubs or data buses and gain a single-pane view of data flows. As the business adopts new data sources such as clickstreams and sensor feeds, an integration platform provides scalability without introducing crippling management costs.
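A quick calculation shows why point-to-point integrations become costly to manage. As a simplification, assume each pair of n systems needs one direct link. The number of links is then n(n-1)/2, which grows quadratically, whereas a hub or bus needs only one connection per system. The function names below are illustrative.

```python
def point_to_point_links(systems: int) -> int:
    """Direct links needed if every system connects to every other system once."""
    return systems * (systems - 1) // 2


def hub_links(systems: int) -> int:
    """Links needed if every system connects once to a central hub or bus."""
    return systems


for n in (5, 10, 20, 50):
    print(f"{n:>3} systems: point-to-point={point_to_point_links(n):>5}, hub={hub_links(n):>3}")
```

For 20 systems, that is 190 point-to-point links compared with 20 hub connections.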
Building in data quality
Creating reliable data for analytics involves tracking data sources and using only the most authoritative data. Data validation rules fill gaps, check the format of individual data fields for consistency and enforce referential integrity between related data elements.
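The three kinds of rules described above can be sketched as a small validation pass. Everything here is hypothetical: the order and customer records, the O-#### order ID format and the UNKNOWN default are assumptions chosen for illustration. Rows that fail a rule are set aside as errors instead of being loaded.

```python
import re
from dataclasses import dataclass, field

# Hypothetical reference data: the set of known customer IDs.
CUSTOMERS = {"C001", "C002"}

# Hypothetical incoming order rows; None marks a missing value.
ORDERS = [
    {"order_id": "O-1001", "customer_id": "C001", "country": "US"},
    {"order_id": "O-1002", "customer_id": "C002", "country": None},
    {"order_id": "1003", "customer_id": "C001", "country": "DE"},
    {"order_id": "O-1004", "customer_id": "C999", "country": "FR"},
]

ORDER_ID_FORMAT = re.compile(r"O-\d{4}")  # assumed format rule for order IDs
DEFAULT_COUNTRY = "UNKNOWN"               # assumed default used to fill gaps


@dataclass
class ValidationResult:
    clean: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def validate(orders, customers) -> ValidationResult:
    result = ValidationResult()
    for row in orders:
        row = dict(row)  # copy so the source rows are not mutated
        # Rule 1: fill gaps with a default value.
        if row["country"] is None:
            row["country"] = DEFAULT_COUNTRY
        # Rule 2: check the format of an individual field.
        if not ORDER_ID_FORMAT.fullmatch(row["order_id"]):
            result.errors.append((row["order_id"], "bad order_id format"))
            continue
        # Rule 3: enforce referential integrity against the customer table.
        if row["customer_id"] not in customers:
            result.errors.append((row["order_id"], "unknown customer_id"))
            continue
        result.clean.append(row)
    return result


result = validate(ORDERS, CUSTOMERS)
print([r["order_id"] for r in result.clean])
print(result.errors)
```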
Data profiling utilities assess data quality, and data transformation functions make data more uniform before it is loaded into a target data platform. Parallelizing large data operations can accelerate both transfer and transformation.
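A basic profile counts nulls and distinct values per column and reports each column's range and most common value. Transformation functions then fix the anomalies it surfaces, such as inconsistent casing. The sketch below is a hypothetical, minimal version of that loop in plain Python. It does not describe the behavior of any particular profiling tool.

```python
from collections import Counter

# Hypothetical rows arriving from a source system.
ROWS = [
    {"sku": "A1", "price": 9.99, "currency": "USD"},
    {"sku": "A2", "price": None, "currency": "usd"},
    {"sku": "A2", "price": 12.50, "currency": "USD"},
    {"sku": "A3", "price": 4.00, "currency": None},
]


def profile(rows):
    """Per-column statistics: null count, distinct count, min, max and most common value."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "most_common": Counter(present).most_common(1)[0][0] if present else None,
        }
    return report


def normalize_currency(rows):
    """Transformation step: upper-case currency codes, leaving missing values as None."""
    return [
        {**row, "currency": row["currency"].upper() if row["currency"] is not None else None}
        for row in rows
    ]


for column, stats in profile(ROWS).items():
    print(column, stats)
print(profile(normalize_currency(ROWS))["currency"])
```

In this sample, the profile reports two distinct currency values ("USD" and "usd"), one missing price and a repeated SKU. After normalize_currency runs, the currency column has a single distinct value.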
A robust data integration solution monitors transfers and flags any exceptions before the data is used for decision making.
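One common way to monitor a transfer is to reconcile source and target after each load by comparing row counts and a checksum over the rows. The minimal sketch below makes two assumptions. First, rows are Python tuples. Second, both sides render values identically, so types must be normalized before fingerprinting. It builds an order-independent checksum by summing truncated SHA-256 hashes modulo 2**64. It illustrates the idea only and does not describe how any particular product implements monitoring.

```python
import hashlib

MOD = 2**64


def fingerprint(rows):
    """Return (row count, order-independent checksum) for a list of tuple rows."""
    checksum = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % MOD
    return len(rows), checksum


def reconcile(source_rows, target_rows):
    """Compare source and target fingerprints and return any exceptions found."""
    src_count, src_sum = fingerprint(source_rows)
    tgt_count, tgt_sum = fingerprint(target_rows)
    if src_count != tgt_count:
        return [f"row count mismatch: source={src_count}, target={tgt_count}"]
    if src_sum != tgt_sum:
        return ["checksum mismatch: row contents differ"]
    return []


source = [(1, "north", 120.5), (2, "south", 80.0)]
good_target = [(2, "south", 80.0), (1, "north", 120.5)]  # same rows, different order
bad_target = [(1, "north", 120.5), (2, "south", 8.0)]    # one value corrupted in transit

print(reconcile(source, good_target))
print(reconcile(source, bad_target))
```

Summing the hashes, rather than XOR-ing them, keeps duplicate rows from cancelling each other out.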
Data integration in the cloud
Data integration tools have evolved to support cloud-based applications. Many solutions began life with a cloud-native or cloud-first focus, while others started on-premises and added cloud support as they evolved. Many support hybrid deployment, so developers can easily use data residing both on-premises and in public cloud platforms. Modern tools also provide a graphical user interface for designing data flows visually, which saves time.
Data integration for data lakes
In the past, big data was often synonymous with Apache Hadoop and its distributed file system (HDFS). Today Hadoop is losing its appeal because cloud providers offer scalable object storage at a more abstract level, without the need to manage a cluster of servers.
Streaming data systems such as Apache Kafka support data sources that need to share continuous streams. Change Data Capture (CDC) solutions such as high volume replication (HVR) support moving data from data lakes and transactional databases to data warehouses and data platforms. CDC technology can be configured to allow bi-directional data flows. Conflicts between changes made on both sides are detected and resolved with rules, such as keeping the value with the latest timestamp.
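The timestamp rule for resolving conflicts in bi-directional replication can be sketched as a last-writer-wins apply loop. The ChangeEvent record, its fields and apply_changes are hypothetical. They do not reflect the API of HVR, Kafka or any other product. Each key remembers the (timestamp, origin) pair of the change that produced its current state, and an incoming change is applied only if its pair is newer. Comparing the origin name breaks timestamp ties, so both sides converge on the same result regardless of the order in which events arrive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC change record captured from one side of a bi-directional link."""
    key: str
    op: str               # "upsert" or "delete"
    value: Optional[str]
    ts: int               # commit timestamp, e.g. epoch milliseconds
    origin: str           # system that produced the change; breaks timestamp ties


def apply_changes(target, versions, events):
    """Apply change events to a key/value target with last-writer-wins.

    versions maps key -> (ts, origin) of the change behind the current state.
    Entries are kept for deleted keys so an older update cannot resurrect them.
    """
    for ev in events:
        incoming = (ev.ts, ev.origin)
        current = versions.get(ev.key)
        if current is not None and incoming <= current:
            continue  # conflict: an equal or newer change has already been applied
        versions[ev.key] = incoming
        if ev.op == "delete":
            target.pop(ev.key, None)
        else:
            target[ev.key] = ev.value
    return target


EVENTS = [
    ChangeEvent("A1", "upsert", "9.99", ts=100, origin="store"),
    ChangeEvent("A1", "upsert", "8.49", ts=120, origin="hq"),
    ChangeEvent("A1", "upsert", "9.49", ts=110, origin="store"),  # arrives late, older
    ChangeEvent("B2", "upsert", "4.00", ts=100, origin="hq"),
    ChangeEvent("B2", "delete", None, ts=130, origin="store"),
    ChangeEvent("B2", "upsert", "4.50", ts=125, origin="hq"),     # older than the delete
]

state = apply_changes({}, {}, EVENTS)
print(state)
```

Keeping the version of a deleted key stops a late, older update from bringing the row back. The rule also assumes that clocks on the participating systems are reasonably synchronized.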
Extended data integration
Some premium data integration platforms include data governance capabilities. Data provenance functions trace data back to its raw sources, and catalog functions track how users and applications consume data. These extended functions allow a business to retire less-used integrations and to consolidate or reuse existing ones.
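Provenance and usage tracking can be pictured as two queries over catalog metadata. The first walks a lineage graph upstream to find the raw sources behind a dataset. The second lists integrations whose usage falls below a threshold. The lineage graph, dataset names, read counts and threshold below are all hypothetical.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is derived from.
LINEAGE = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["erp.orders"],
    "customers_clean": ["crm.accounts", "erp.customers"],
    "legacy_extract": ["erp.orders"],
}

# Hypothetical catalog statistics: reads per integration over some recent period.
USAGE = {"sales_mart": 1200, "orders_clean": 900, "customers_clean": 850, "legacy_extract": 2}


def raw_sources(dataset, lineage):
    """Walk the lineage graph upstream (breadth-first) and return nodes with no parents."""
    sources, seen = set(), set()
    queue = deque([dataset])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        parents = lineage.get(node, [])
        if not parents:
            sources.add(node)
        queue.extend(parents)
    return sorted(sources)


def retirement_candidates(usage, min_reads):
    """Integrations read fewer than min_reads times are candidates for retirement."""
    return sorted(name for name, reads in usage.items() if reads < min_reads)


print(raw_sources("sales_dashboard", LINEAGE))
print(retirement_candidates(USAGE, min_reads=10))
```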
Many databases, such as Ingres, SQL Server and Oracle, provide their own integration services and work with specialist data integration tools.
Below are some use cases for data integration and replication:
- Retailers use data replication to publish updated product pricing to stores and, conversely, receive store sales data for analysis in data warehouses/data platforms.
- Global financial reporting systems use CDC technology to extract data from country-level accounting systems for regional and regulatory filings.
- Mobile phone network operators use local call logs from cell towers to manage the quality of service (QoS) across their networks.
- Transportation companies fit their vehicles with GPS sensors to collect live locations for route optimization.
- Insurance companies use multi-step data integration to provide local reporting at branches using a uniform format. Consolidating this information at HQ provides sales teams with industry benchmarks that differentiate policy management services.
- Medical researchers use data integration to collect clinical trial data, which is aggregated and published centrally. This enables collaboration across the globe to fight disease.
Data integration using Actian solutions
The Actian Data Platform supports many of the above use cases. It has built-in connectors to hundreds of data sources, including cloud-based applications such as Salesforce and NetSuite. A universal adapter makes it easy to create custom interfaces for legacy applications, so existing integration jobs can be managed alongside new ones.
Try the Actian Data Platform with a free 30-day trial at https://www.actian.com/avalanche-try-now-start-free/