A data pipeline is a multistep process that moves and refines data from source systems, which can include a data lake, to a target data platform. The steps run in series, and each step can be parallelized to accelerate data movement through the pipeline. Data pipelines automate the movement, transformation and cleansing of data on its journey from a data source to the destination data repository.
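As a rough illustration of steps that run in series while each step is parallelized, here is a minimal Python sketch. The step functions (`cleanse`, `transform`), the `run_pipeline` helper and the sample records are hypothetical and are not taken from any particular product.

```python
from concurrent.futures import ThreadPoolExecutor


def cleanse(record):
    """Trim whitespace from string fields (hypothetical cleansing rule)."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def transform(record):
    """Add a total derived from quantity and unit price (hypothetical rule)."""
    return {**record, "total": record["qty"] * record["unit_price"]}


def run_pipeline(records, steps, max_workers=4):
    """Run the steps in series; within each step, process records in parallel."""
    data = list(records)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for step in steps:
            # pool.map returns results in input order, so record order is kept.
            data = list(pool.map(step, data))
    return data


# Hypothetical source records standing in for a real source system.
source = [
    {"customer": " acme ", "qty": 2, "unit_price": 5.0},
    {"customer": "globex", "qty": 1, "unit_price": 12.5},
]
target = run_pipeline(source, [cleanse, transform])
```

A thread pool is only one way to parallelize a step; for CPU-heavy transformations a process pool or a distributed engine may be a better fit.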
Why use data pipelines?
Data pipelines provide end-to-end visibility and control over the flow of data. This visibility creates opportunities to build reusable automation elements and strengthens data provenance. Using standardized processes and tools also aids data governance efforts.
Data pipeline steps
The data pipeline consists of multiple steps that commonly include the following functions:
Difference between ETL pipelines and data pipelines
A data pipeline with a broad scope can contain extract, transform, and load (ETL) steps. An ETL process typically ends by loading data into a database, whereas a data pipeline can also end in an intermediate refinement stage, such as a data lake.
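The distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `people` table, the `people.jsonl` file name, and the use of an in-memory SQLite database and a temporary directory as stand-ins for a target database and a data lake are all hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def extract():
    """Return hypothetical source rows."""
    return [{"id": 1, "name": " Ada "}, {"id": 2, "name": "Grace"}]


def transform(rows):
    """Trim whitespace from the name field."""
    return [{"id": r["id"], "name": r["name"].strip()} for r in rows]


def load_to_database(rows, conn):
    """ETL-style final step: load the rows into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO people (id, name) VALUES (:id, :name)", rows)
    conn.commit()


def land_in_lake(rows, lake_dir):
    """Alternative final step: write a JSON Lines file to an intermediate stage."""
    path = Path(lake_dir) / "people.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path


# ETL: extract, transform, then load into a database.
conn = sqlite3.connect(":memory:")
load_to_database(transform(extract()), conn)
db_names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY id")]

# Broader pipeline: the same steps, ending in a file-based refinement stage.
lake_dir = tempfile.mkdtemp()
lake_file = land_in_lake(transform(extract()), lake_dir)
lake_rows = [json.loads(line) for line in lake_file.read_text(encoding="utf-8").splitlines()]
```

Both flows reuse the same `extract` and `transform` functions; only the final step differs.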
Benefits of using data pipelines
Below are some of the benefits of using data pipelines:
- They support a systematic approach that can be automated.
- Components of the data flow can be reused to lower ongoing development costs.
- Data sources can be traced to support data provenance.
- End-to-end visibility of a data flow helps to catalog data sources and consumers.
- Automated processes run consistently, whereas manual and ad hoc workflows are more error-prone.
- Data pipelines can be nested for complex use cases.
- They improve data quality as processes mature.
- Decision confidence increases when using data sourced from robust pipelines.
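Several of these benefits (reuse, nesting and traceability) can be shown together in a small Python sketch. The `Pipeline` class and the step functions below are hypothetical and written only for illustration; they do not represent any product's API.

```python
class Pipeline:
    """A named series of steps; a step may be a function or another Pipeline."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)

    def run(self, data, lineage=None):
        """Run every step in order and record which steps touched the data."""
        lineage = [] if lineage is None else lineage
        for step in self.steps:
            if isinstance(step, Pipeline):
                # Nested pipeline: it appends its own steps to the shared lineage.
                data, _ = step.run(data, lineage)
            else:
                data = step(data)
                lineage.append(f"{self.name}.{step.__name__}")
        return data, lineage


def drop_nulls(rows):
    """Remove missing values."""
    return [r for r in rows if r is not None]


def dedupe(rows):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(rows))


def to_upper(rows):
    """Normalize values to upper case."""
    return [r.upper() for r in rows]


# A reusable cleansing pipeline nested inside a larger pipeline.
cleansing = Pipeline("cleansing", [drop_nulls, dedupe])
main = Pipeline("main", [cleansing, to_upper])

result, lineage = main.run(["a", None, "b", "a"])
# result  -> ["A", "B"]
# lineage -> ["cleansing.drop_nulls", "cleansing.dedupe", "main.to_upper"]
```

Because `cleansing` is an ordinary object, it can be reused in other pipelines, and the recorded lineage gives a simple trace of how the output was produced.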
Data pipelines in the Actian Data Platform
The Actian Data Platform can build and schedule data pipelines and has hundreds of prebuilt connectors to sources, including Marketo, Salesforce and ServiceNow. It uses a vectorized columnar database that, according to Actian, outperforms alternatives by 7.9x. Built-in data integration technology supports data pipelines and includes a graphical designer that lets you lay out pipelines to connect, profile, transform and load data. Pipeline steps can be scheduled and run in parallel.
Try the Actian Data Platform for 30 days using the free trial at: https://www.actian.com/avalanche-try-now-start-free/