Résumé

  • Data lineage is the documented path data follows from its source through every transformation, storage layer, and final use.
  • It helps organizations understand where data came from, what happened to it, where it is used, and who depends on it.
  • In the example, an e-commerce revenue metric flows from the transaction database to ETL extraction, staging tables, analytics models, daily aggregates, metric definitions, and finally an executive dashboard.
  • Lineage matters because it improves trust, speeds up troubleshooting, supports compliance, and makes change impact easier to manage.
  • The example also shows the importance of both technical lineage, which tracks systems and transformations, and business lineage, which explains how metrics like revenue are defined and used.

Data moves constantly throughout an organization’s information ecosystem. It is collected from customers, generated by systems, transformed by analytics tools, and ultimately used to make decisions that affect revenue, compliance, and strategy. Yet many teams struggle to answer a deceptively simple question: Where did this data come from, and how did it get here? This is where data lineage becomes essential.

Data lineage provides visibility into the life cycle of data from its origin through every transformation and handoff to its final destination. It turns the data pipeline into a transparent, traceable asset. To understand why this matters and how it works in practice, it helps to walk through a concrete, end-to-end example.

Qu'est-ce que la lignée de données ?

Data lineage is the documented path that data follows as it flows through systems, processes, and transformations. It describes:

  • Source systems where data originates.
  • Transformations applied to the data (such as cleansing, aggregation, or enrichment).
  • Intermediate storage or processing layers.
  • Final destinations like reports, dashboards, or machine learning models.

Lineage can be represented visually (as flow diagrams), technically (as metadata captured by tools), or narratively (as documentation). At its core, data lineage answers four fundamental questions:

  1. Where did the data come from?
  2. What happened to it along the way?
  3. Where is it used now?
  4. Who or what depends on it?

L'importance de l'historique des données

Before diving into the example, it is worth grounding the discussion in why data lineage is essential.

  • Trust and accuracy: Analysts and executives are more likely to trust reports when they can trace numbers back to authoritative sources.
  • Debugging and impact analysis: When a metric looks wrong, lineage helps teams quickly identify where errors may have been introduced.
  • Compliance and auditing: Regulations often require organizations to prove how sensitive or regulated data is handled.
  • Change management: When a source system or transformation changes, lineage reveals which downstream assets will be affected.

With these benefits in mind, let us look at a practical example.

Lineage Example: An E-Commerce Company

Imagine an e-commerce company that sells products online. The leadership team reviews a daily dashboard showing Total Daily Revenue, Number of Orders, and Average Order Value. These metrics influence marketing spend, inventory planning, and executive reporting.

Behind this dashboard lies a complex data pipeline. We will trace the lineage of one metric, Total Daily Revenue, from its raw source to the executive dashboard.

Step 1: Data Origin at the Source System

The lineage begins at the point where data is first created.

Source System: Online Transaction Database

When a customer places an order on the website, the transaction is recorded in a relational database that supports the e-commerce application. This database includes a table called orders with fields such as:

  • order_id
  • customer_id
  • order_timestamp
  • order_total
  • currency
  • order_status

At this stage, the data is raw and operational. It reflects real-time business activity and is optimized for transaction processing, not analytics.

Lineage Note

The authoritative source for revenue data is the orders.order_total field in the transaction database.

Step 2: Data Extraction via ETL or ELT

Operational systems are rarely queried directly for analytics. Instead, data is extracted and moved into an analytics environment.

Extraction Process

Every hour, an automated ETL (Extract, Transform, Load) job pulls new and updated records from the orders table. The extraction logic includes:

  • Selecting orders created or updated since the last run.
  • Filtering out test transactions.
  • Capturing metadata such as extraction time.

The extracted data is written to a staging area in a cloud data warehouse.

Lineage Note

This step introduces the first transformation: filtering out test orders. Lineage records should capture what logic was applied and when.

Step 3: Staging Layer in the Data Warehouse

In the data warehouse, the extracted data lands in a staging table, often named something like stg_orders.

Purpose of the Staging Layer

The staging layer serves as a near-raw copy of source data, with minimal transformation. It provides:

  • A stable snapshot of source data.
  • A buffer between operational systems and analytics logic.
  • A place to standardize basic formats.

In this example, the staging process may:

  • Convert timestamps to UTC.
  • Normalize currency codes to uppercase.
  • Ensure numeric fields use consistent data types.

Lineage Note

Lineage at this stage connects orders in the source system to stg_orders in the warehouse, including details about standardization steps.

Step 4: Transformation into a Clean Analytics Model

Next, the data is transformed into a model designed for analytics and reporting.

Business Logic Applied

A transformation job creates a table called fact_orders. This process applies more substantial business rules, such as:

  • Including only orders with a status of COMPLETED or SHIPPED.
  • Converting all order totals into a single reporting currency (for example, USD) using exchange rates.
  • Rounding monetary values to two decimal places.

At this stage, order_total becomes order_total_usd.

Lineage Note

This is a critical point in lineage. The revenue number now depends not only on the original order total, but also on exchange rate tables and status filters. Lineage should document these dependencies clearly.

Step 5: Aggregation for Daily Metrics

Executives do not look at individual orders; they look at summaries.

Aggregation Logic

Another transformation aggregates the fact_orders table into a daily summary table called daily_revenue.

This process:

  • Groups orders by date (derived from order_timestamp)
  • Sums order_total_usd for each day
  • Counts distinct order_id values

The resulting table includes fields like:

  • order_date
  • total_daily_revenue
  • total_orders

Lineage Note

Lineage now shows that total_daily_revenue is derived from a sum of order_total_usd, which itself came from the original order_total field with multiple transformations applied.

Step 6: Semantic Layer and Business Definitions

To make data accessible to non-technical users, many organizations introduce a semantic or metrics layer.

Metric Definition

In the business intelligence tool, Total Daily Revenue is defined as:

The sum of total_daily_revenue from the daily_revenue table, filtered by the selected date range.

This definition is stored alongside metadata such as:

  • Metric owner
  • Description
  • Refresh frequency

Lineage Note

Lineage extends beyond tables and columns into metrics and definitions. This ensures everyone understands how a KPI is calculated.

Step 7: Visualization in an Executive Dashboard

Finally, the metric appears in an executive dashboard.

Dashboard Usage

The dashboard displays:

  • A time series chart of Total Daily Revenue.
  • Comparisons to previous periods.
  • Alerts when revenue drops below a threshold.

Executives may make decisions based on these numbers within minutes of viewing the dashboard.

Lineage Note

The final step in lineage connects the dashboard visualization back to the metric definition, the aggregated table, and ultimately to the original transaction records.

Visualizing the Complete Data Lineage

If we summarize the example visually, the lineage might look like this:

  1. orders table (transaction database)
  2. ETL extraction job
  3. stg_orders (staging layer)
  4. fact_orders (cleaned and enriched analytics model)
  5. daily_revenue (aggregated table)
  6. “Total Daily Revenue” metric definition
  7. Executive dashboard visualization

Each arrow between these steps represents a transformation or dependency that should be documented and, ideally, automatically captured.

How Lineage Helps When Something Goes Wrong

Consider a scenario where executives notice that Total Daily Revenue suddenly drops by 20% overnight.

Without lineage, teams may waste hours guessing. With lineage:

  • Analysts trace the metric back to the daily_revenue table and see fewer records for the affected date.
  • They follow the lineage upstream and discover that the ETL job filtering order statuses were updated.
  • The update accidentally excluded a valid status value introduced by the e-commerce application.

Because lineage clearly shows dependencies, the root cause is identified quickly and fixed with confidence.

Technical vs. Business Lineage

This example highlights two complementary types of lineage:

  • Technical lineage: Table-to-table, column-to-column mappings and transformations.
  • Business lineage: How business concepts like “revenue” or “orders” are defined and used in reports.

Both are necessary. Technical lineage helps engineers maintain pipelines, while business lineage helps stakeholders trust and interpret the data.

Tools and Automation

In practice, lineage can be captured:

  • Manually, through documentation (error-prone and hard to maintain).
  • Semi-automatically, using metadata from ETL and BI tools.
  • Fully automatically, using modern data lineage and data catalog platforms.

The more automated the lineage, the more likely it is to stay accurate as systems evolve.

How the Actian Data Intelligence Platform Supports Data Lineage

This example of data lineage shows that even a seemingly simple metric like Total Daily Revenue depends on a long chain of systems, transformations, and definitions. Data lineage makes this chain visible. By tracing data from its origin in a transaction database to its final presentation in an executive dashboard, lineage builds trust, speeds up troubleshooting, supports compliance, and enables smarter decision-making.

To see how the Actian Data Intelligence Platform helps maintain data governance policies and supports data lineage, request a personalized demonstration today.