An Example of Data Lineage
Résumé
- Data lineage is the documented path data follows from its source through every transformation, storage layer, and final use.
- It helps organizations understand where data came from, what happened to it, where it is used, and who depends on it.
- In the example, an e-commerce revenue metric flows from the transaction database to ETL extraction, staging tables, analytics models, daily aggregates, metric definitions, and finally an executive dashboard.
- Lineage matters because it improves trust, speeds up troubleshooting, supports compliance, and makes change impact easier to manage.
- The example also shows the importance of both technical lineage, which tracks systems and transformations, and business lineage, which explains how metrics like revenue are defined and used.
Data moves constantly throughout an organization’s information ecosystem. It is collected from customers, generated by systems, transformed by analytics tools, and ultimately used to make decisions that affect revenue, compliance, and strategy. Yet many teams struggle to answer a deceptively simple question: Where did this data come from, and how did it get here? This is where data lineage becomes essential.
Data lineage provides visibility into the life cycle of data from its origin through every transformation and handoff to its final destination. It turns the data pipeline into a transparent, traceable asset. To understand why this matters and how it works in practice, it helps to walk through a concrete, end-to-end example.
Qu'est-ce que la lignée de données ?
Data lineage is the documented path that data follows as it flows through systems, processes, and transformations. It describes:
- Source systems where data originates.
- Transformations applied to the data (such as cleansing, aggregation, or enrichment).
- Intermediate storage or processing layers.
- Final destinations like reports, dashboards, or machine learning models.
Lineage can be represented visually (as flow diagrams), technically (as metadata captured by tools), or narratively (as documentation). At its core, data lineage answers four fundamental questions:
- Where did the data come from?
- What happened to it along the way?
- Where is it used now?
- Who or what depends on it?
L'importance de l'historique des données
Before diving into the example, it is worth grounding the discussion in why data lineage is essential.
- Trust and accuracy: Analysts and executives are more likely to trust reports when they can trace numbers back to authoritative sources.
- Debugging and impact analysis: When a metric looks wrong, lineage helps teams quickly identify where errors may have been introduced.
- Compliance and auditing: Regulations often require organizations to prove how sensitive or regulated data is handled.
- Change management: When a source system or transformation changes, lineage reveals which downstream assets will be affected.
With these benefits in mind, let us look at a practical example.
Lineage Example: An E-Commerce Company
Imagine an e-commerce company that sells products online. The leadership team reviews a daily dashboard showing Total Daily Revenue, Number of Orders, and Average Order Value. These metrics influence marketing spend, inventory planning, and executive reporting.
Behind this dashboard lies a complex data pipeline. We will trace the lineage of one metric, Total Daily Revenue, from its raw source to the executive dashboard.
Step 1: Data Origin at the Source System
The lineage begins at the point where data is first created.
Source System: Online Transaction Database
When a customer places an order on the website, the transaction is recorded in a relational database that supports the e-commerce application. This database includes a table called orders with fields such as:
- order_id
- customer_id
- order_timestamp
- order_total
- currency
- order_status
At this stage, the data is raw and operational. It reflects real-time business activity and is optimized for transaction processing, not analytics.
Lineage Note
The authoritative source for revenue data is the orders.order_total field in the transaction database.
Step 2: Data Extraction via ETL or ELT
Operational systems are rarely queried directly for analytics. Instead, data is extracted and moved into an analytics environment.
Extraction Process
Every hour, an automated ETL (Extract, Transform, Load) job pulls new and updated records from the orders table. The extraction logic includes:
- Selecting orders created or updated since the last run.
- Filtering out test transactions.
- Capturing metadata such as extraction time.
The extracted data is written to a staging area in a cloud data warehouse.
Lineage Note
This step introduces the first transformation: filtering out test orders. Lineage records should capture what logic was applied and when.
Step 3: Staging Layer in the Data Warehouse
In the data warehouse, the extracted data lands in a staging table, often named something like stg_orders.
Purpose of the Staging Layer
The staging layer serves as a near-raw copy of source data, with minimal transformation. It provides:
- A stable snapshot of source data.
- A buffer between operational systems and analytics logic.
- A place to standardize basic formats.
In this example, the staging process may:
- Convert timestamps to UTC.
- Normalize currency codes to uppercase.
- Ensure numeric fields use consistent data types.
Lineage Note
Lineage at this stage connects orders in the source system to stg_orders in the warehouse, including details about standardization steps.
Step 4: Transformation into a Clean Analytics Model
Next, the data is transformed into a model designed for analytics and reporting.
Business Logic Applied
A transformation job creates a table called fact_orders. This process applies more substantial business rules, such as:
- Including only orders with a status of COMPLETED or SHIPPED.
- Converting all order totals into a single reporting currency (for example, USD) using exchange rates.
- Rounding monetary values to two decimal places.
At this stage, order_total becomes order_total_usd.
Lineage Note
This is a critical point in lineage. The revenue number now depends not only on the original order total, but also on exchange rate tables and status filters. Lineage should document these dependencies clearly.
Step 5: Aggregation for Daily Metrics
Executives do not look at individual orders; they look at summaries.
Aggregation Logic
Another transformation aggregates the fact_orders table into a daily summary table called daily_revenue.
This process:
- Groups orders by date (derived from order_timestamp)
- Sums order_total_usd for each day
- Counts distinct order_id values
The resulting table includes fields like:
- order_date
- total_daily_revenue
- total_orders
Lineage Note
Lineage now shows that total_daily_revenue is derived from a sum of order_total_usd, which itself came from the original order_total field with multiple transformations applied.
Step 6: Semantic Layer and Business Definitions
To make data accessible to non-technical users, many organizations introduce a semantic or metrics layer.
Metric Definition
In the business intelligence tool, Total Daily Revenue is defined as:
The sum of total_daily_revenue from the daily_revenue table, filtered by the selected date range.
This definition is stored alongside metadata such as:
- Metric owner
- Description
- Refresh frequency
Lineage Note
Lineage extends beyond tables and columns into metrics and definitions. This ensures everyone understands how a KPI is calculated.
Step 7: Visualization in an Executive Dashboard
Finally, the metric appears in an executive dashboard.
Dashboard Usage
The dashboard displays:
- A time series chart of Total Daily Revenue.
- Comparisons to previous periods.
- Alerts when revenue drops below a threshold.
Executives may make decisions based on these numbers within minutes of viewing the dashboard.
Lineage Note
The final step in lineage connects the dashboard visualization back to the metric definition, the aggregated table, and ultimately to the original transaction records.
Visualizing the Complete Data Lineage
If we summarize the example visually, the lineage might look like this:
- orders table (transaction database)
- ETL extraction job
- stg_orders (staging layer)
- fact_orders (cleaned and enriched analytics model)
- daily_revenue (aggregated table)
- “Total Daily Revenue” metric definition
- Executive dashboard visualization
Each arrow between these steps represents a transformation or dependency that should be documented and, ideally, automatically captured.
How Lineage Helps When Something Goes Wrong
Consider a scenario where executives notice that Total Daily Revenue suddenly drops by 20% overnight.
Without lineage, teams may waste hours guessing. With lineage:
- Analysts trace the metric back to the daily_revenue table and see fewer records for the affected date.
- They follow the lineage upstream and discover that the ETL job filtering order statuses were updated.
- The update accidentally excluded a valid status value introduced by the e-commerce application.
Because lineage clearly shows dependencies, the root cause is identified quickly and fixed with confidence.
Technical vs. Business Lineage
This example highlights two complementary types of lineage:
- Technical lineage: Table-to-table, column-to-column mappings and transformations.
- Business lineage: How business concepts like “revenue” or “orders” are defined and used in reports.
Both are necessary. Technical lineage helps engineers maintain pipelines, while business lineage helps stakeholders trust and interpret the data.
Tools and Automation
In practice, lineage can be captured:
- Manually, through documentation (error-prone and hard to maintain).
- Semi-automatically, using metadata from ETL and BI tools.
- Fully automatically, using modern data lineage and data catalog platforms.
The more automated the lineage, the more likely it is to stay accurate as systems evolve.
How the Actian Data Intelligence Platform Supports Data Lineage
This example of data lineage shows that even a seemingly simple metric like Total Daily Revenue depends on a long chain of systems, transformations, and definitions. Data lineage makes this chain visible. By tracing data from its origin in a transaction database to its final presentation in an executive dashboard, lineage builds trust, speeds up troubleshooting, supports compliance, and enables smarter decision-making.
To see how the Actian Data Intelligence Platform helps maintain data governance policies and supports data lineage, request a personalized demonstration today.