What Is a Data Contract?
A data contract is a formal agreement between a data producer and a data consumer that defines the schema, quality standards, ownership, delivery SLAs, and terms of use for a specific dataset — so that downstream teams know exactly what they will receive and producers know exactly what they are accountable for delivering.
Think of it as an API contract, but for data. Where an API contract governs how software systems communicate, a data contract governs how data flows between teams, pipelines, and systems.
Data Contract Definition
A data contract is a machine-readable document, commonly written in YAML following a specification such as the Open Data Contract Standard (ODCS), that specifies what a data producer commits to delivering and what a data consumer can depend on receiving. It makes expectations explicit, versioned, and enforceable before problems reach production.
When a data contract is embedded in a pipeline, schema changes trigger alerts rather than silent failures, quality violations are caught at the source rather than discovered in a quarterly report, and ownership is documented rather than held as institutional memory.
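To illustrate the first point, here is a minimal sketch that compares the columns a producer actually delivered against the schema section of a contract. The `Column` dataclass and `check_schema` function are hypothetical names, not part of ODCS or any particular tool. Real validators also check nullability, formats, and constraints.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column promised by the contract (hypothetical, simplified)."""
    name: str
    dtype: str            # logical type, e.g. "string", "decimal"
    required: bool = True


def check_schema(contract: list[Column], delivered: dict[str, str]) -> list[str]:
    """Compare delivered columns (name -> type) with the contract.

    Returns human-readable violations; an empty list means the batch conforms.
    Extra columns not mentioned in the contract are allowed in this sketch.
    """
    violations = []
    for col in contract:
        if col.name not in delivered:
            if col.required:
                violations.append(f"missing required column '{col.name}'")
        elif delivered[col.name] != col.dtype:
            violations.append(
                f"column '{col.name}' has type {delivered[col.name]}, "
                f"contract says {col.dtype}"
            )
    return violations


orders_schema = [
    Column("order_id", "string"),
    Column("customer_id", "string"),
    Column("amount", "decimal"),
]

# The producer renamed "amount" to "order_amount" without updating the contract.
delivered = {"order_id": "string", "customer_id": "string", "order_amount": "decimal"}

problems = check_schema(orders_schema, delivered)
for p in problems:
    # A real pipeline would fail the job and alert the contract owner here.
    print("Contract violation:", p)
```

The rename surfaces as a missing required column at the moment the batch is produced. Without this check, it would only show up later as nulls in a downstream report.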
What a Data Contract Contains
| Component | What it defines |
|---|---|
| Fundamentals | Contract ID, name, version number, and status |
| Schema | Table structure, column names, data types, primary keys, and field-level business semantics |
| Data quality rules | Validation checks that must pass before data is considered deliverable |
| Team | Who owns and maintains the contract and how consumers can reach them |
| SLAs | Delivery schedule, freshness guarantee, and availability commitments |
| Terms of use | What consumers can and cannot do with the data |
| Servers | Where the data lives and how to connect to it |
A minimal contract can start with fundamentals, schema, and quality rules. SLAs, terms of use, and server details are added as the program matures.
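A minimal contract containing only fundamentals, schema, and quality rules can be modeled as plain structured data. The Python sketch below uses hypothetical, simplified field names that loosely follow the components in the table. It is not the ODCS schema, which defines its own YAML keys. Optional components such as the owning team and a freshness SLA default to `None`, so they can be filled in later.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Column:
    name: str
    logical_type: str
    primary_key: bool = False
    description: str = ""        # field-level business meaning


@dataclass
class QualityRule:
    column: str
    rule: str                    # hypothetical rule names, e.g. "not_null", "unique"


@dataclass
class DataContract:
    # Fundamentals
    contract_id: str
    name: str
    version: str
    status: str
    # Schema and quality rules: enough for a minimal contract
    schema: list[Column]
    quality: list[QualityRule] = field(default_factory=list)
    # Components usually added as the program matures
    owner_team: Optional[str] = None
    sla_freshness_hours: Optional[int] = None


orders_contract = DataContract(
    contract_id="orders-contract",
    name="orders",
    version="1.0.0",
    status="active",
    schema=[
        Column("order_id", "string", primary_key=True, description="Unique order identifier"),
        Column("amount", "number", description="Order total"),
    ],
    quality=[QualityRule("order_id", "not_null"), QualityRule("order_id", "unique")],
)

# Serialize to a machine-readable document (JSON keeps this example stdlib-only;
# ODCS documents themselves are written in YAML).
print(json.dumps(asdict(orders_contract), indent=2))
```

Keeping the contract as structured data rather than prose is what makes automation possible: validators, CI checks, and catalogs can all read the same document.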
Why Data Contracts Matter
Without a data contract, a schema change in a source system silently breaks downstream reports. A field that analysts depend on disappears without warning. A machine learning pipeline receives zeros for a feature it uses as a key predictor and nobody notices until model performance degrades in production.
Data contracts prevent these failures by making the interface between producers and consumers explicit and enforceable. They shift data reliability from a reactive problem — discovered after something breaks — to a proactive commitment defined before data moves.
Organizations use data contracts to:
- Prevent silent schema failures. Contract validation catches breaking changes before they reach downstream consumers.
- Enforce data quality at the source. Quality rules defined in the contract run automatically at pipeline execution.
- Document ownership. Every contract has a named producer and steward accountable for its contents.
- Enable reliable self-service. Consumers can discover contracts in the data catalog and build on data without asking the producing team for assurances.
- Support governance and compliance. Contracts provide the documentation that regulatory audits require for data lineage, quality, and handling standards.
Data Contract vs. Related Concepts
Data contract vs. data quality rule: A data quality rule is a single validation check. A data contract is the broader agreement that contains quality rules alongside schema, ownership, SLAs, and terms of use. Quality rules tell you whether data is good. A data contract tells you everything about the data and what the producer commits to.
Data contract vs. SLA: An SLA covers delivery commitments: when data arrives, how fresh it is, and how available it is. A data contract includes the SLA plus schema, quality standards, ownership, and terms of use. An SLA is one component of a data contract.
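The freshness commitment in an SLA is one of the simplest parts of a contract to check automatically. The following is a minimal sketch that assumes the pipeline can read the timestamp of its last successful load. `is_fresh` is a hypothetical helper, and the 24-hour window is an example value, not a recommendation.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(last_loaded: datetime, max_age: timedelta,
             now: Optional[datetime] = None) -> bool:
    """True if the last successful load falls inside the SLA freshness window.

    Both timestamps must be timezone-aware so the subtraction is unambiguous.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded <= max_age


# Example SLA value: data may be at most 24 hours old.
max_age = timedelta(hours=24)
check_time = datetime(2024, 5, 2, 9, 0, tzinfo=timezone.utc)

print(is_fresh(datetime(2024, 5, 2, 6, 0, tzinfo=timezone.utc), max_age, now=check_time))  # True (3 h old)
print(is_fresh(datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc), max_age, now=check_time))  # False (27 h old)
```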
Data contract vs. data catalog: A data catalog documents what data assets exist and their metadata. A data contract formalizes the commitments a producer makes about a specific asset. A catalog entry describes a dataset; a data contract governs its delivery. Data contracts are often published to the catalog so consumers can discover them.
Data contract vs. data governance policy: A data governance policy sets organization-wide standards for how data is managed. A data contract applies those standards to a specific dataset exchanged between a specific producer and consumer. Policies are organizational; contracts are dataset-specific.
Data Contracts in Practice
The problem they solve: A data engineering team renames a column in the orders table. Three downstream pipelines fail. Two dashboards show null values. A machine learning feature pipeline has been silently receiving zeros for 48 hours. The problem is discovered when a VP asks why revenue numbers dropped.
With a data contract in place: The same rename triggers a contract validation failure before the change ships. The contract specifies that the original column name is required. The system identifies every downstream consumer that depends on it. The team coordinates the change, updates the contract version, and deploys with a migration path. No pipelines break.
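The detection step in this scenario can be sketched as a comparison between two contract versions. The sketch assumes that contract versions follow semantic versioning and that consumers declare the columns they read in some registry. `consumers` below is a hypothetical stand-in for whatever lineage or catalog metadata an organization actually has. A rename appears as the removal of the old column name, which is why it is treated as a breaking change.

```python
from __future__ import annotations


def breaking_changes(old_columns: set[str], new_columns: set[str]) -> set[str]:
    """Columns promised by the old contract version but absent from the new one.
    A rename shows up here as the removal of the old name."""
    return old_columns - new_columns


def impacted_consumers(removed: set[str], consumer_deps: dict[str, set[str]]) -> list[str]:
    """Consumers whose declared column dependencies include a removed column."""
    return sorted(c for c, cols in consumer_deps.items() if cols & removed)


def next_version(version: str, breaking: bool) -> str:
    """Semantic-versioning bump: major for breaking changes, minor otherwise."""
    major, minor, _patch = (int(p) for p in version.split("."))
    return f"{major + 1}.0.0" if breaking else f"{major}.{minor + 1}.0"


v1 = {"order_id", "customer_id", "amount"}
v2 = {"order_id", "customer_id", "order_amount"}   # "amount" renamed

# Hypothetical registry of which consumer reads which columns.
consumers = {
    "revenue_dashboard": {"order_id", "amount"},
    "churn_features": {"customer_id", "amount"},
    "shipping_pipeline": {"order_id"},
}

removed = breaking_changes(v1, v2)
print(removed)                                         # {'amount'}
print(impacted_consumers(removed, consumers))          # ['churn_features', 'revenue_dashboard']
print(next_version("1.4.2", breaking=bool(removed)))   # 2.0.0
```

Running this in CI against the proposed contract change makes the rename fail before deployment. It also names the teams to coordinate with, and the major-version bump signals the break to everyone else.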
FAQ
What is a data contract?
A formal agreement between the team that produces a dataset and the teams that consume it, specifying what the data will contain, what quality it will meet, when it will arrive, and who is responsible for it.
Are data contracts legally binding?
Not in most cases. Data contracts are internal governance documents used to coordinate and enforce standards between teams. They are not commercial legal contracts. Their value is operational: making expectations explicit and enabling automated enforcement.
Who writes a data contract?
The data producer writes the initial draft in collaboration with data consumers. The producer defines what they can commit to delivering; the consumer defines what they need. A data steward or governance lead may review and approve the contract before it is published.
What is the Open Data Contract Standard (ODCS)?
The Open Data Contract Standard (ODCS) is a Linux Foundation project that defines a machine-readable YAML format for data contracts. It covers fundamentals, schema, quality rules, team ownership, SLAs, terms of use, and server configuration. It is emerging as a common standard for data contracts across modern data stacks.
How are data contracts enforced?
By embedding schema and quality validation into the pipeline that produces data. When a contract specifies a field must have a null rate below 2% and the pipeline produces data with 15% nulls, the validation fails and alerts the owner before bad data reaches downstream consumers. Schema changes that would break consumer pipelines are caught before deployment.
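The null-rate rule in this answer reduces to a simple calculation. This is a minimal sketch in plain Python, assuming a column's values are available as a list with `None` for missing values. In practice the same check usually runs as SQL or inside a data quality tool. The function names and the `email` column are hypothetical.

```python
from __future__ import annotations


def null_rate(values: list[object]) -> float:
    """Fraction of values that are None. An empty column counts as 0% null here."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def check_null_rate(column: str, values: list[object], max_rate: float) -> str | None:
    """Return a violation message if the null rate exceeds the contract limit, else None."""
    rate = null_rate(values)
    if rate > max_rate:
        return f"{column}: null rate {rate:.1%} exceeds contract limit {max_rate:.1%}"
    return None


# 100 rows, 15 of them null: the 15% case described above, against a 2% limit.
emails = ["a@example.com"] * 85 + [None] * 15
violation = check_null_rate("email", emails, max_rate=0.02)
print(violation)  # email: null rate 15.0% exceeds contract limit 2.0%
```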
How do data contracts relate to data mesh?
In a data mesh, domain teams own and publish data products. A data contract is the formal interface that makes a data product consumable: it defines what the product delivers, to what standard, and under what terms. Without data contracts, data mesh products often lack a documented interface. With them, each product has a versioned, enforceable interface.