Qu'est-ce que la qualité des données ?

what is data quality

Data quality is the measure of how well data meets the standards required for its intended use. Data that is accurate enough for one purpose may be insufficient for another. Customer records precise enough for billing may not meet the completeness requirements for a predictive churn model. Financial data sufficient for internal reporting may not meet the lineage and accuracy requirements for a regulatory submission.

Quality is not a single attribute. It is a composite of six measurable dimensions, each of which matters differently depending on the use case and the domain.


Data Quality Definition

Data quality is the degree to which data is fit for its intended use across six dimensions: accuracy, completeness, consistency, timeliness, validity, and uniqueness.

High-quality data supports reliable analytics, trustworthy AI, and defensible regulatory reporting. Poor-quality data produces wrong decisions, broken pipelines, compliance penalties, and AI models that learn incorrect patterns from flawed training data.


The Six Dimensions of Data Quality

Dimension Definition Example of a failure
Précision Data correctly represents the real-world value it describes A customer’s address reflects a previous address after they moved two years ago
Complétude All required fields are populated with meaningful, non-null values 18% of customer records are missing an email address
cohérence The same value is represented identically across all systems where it appears “Active customer” means 90 days in the CRM and 180 days in finance — cross-system reports disagree
Rapidité d'exécution Data is current and available when it is needed for its intended use Yesterday’s inventory data is used to fulfil today’s orders, causing overselling
Validité Data conforms to defined formats, ranges, and business rules A date field contains “13/2026” — not a valid date in any standard format
Unicité Each real-world entity is represented exactly once with no duplicate records The same customer appears 23 times in the CRM with slightly different name spellings

Every data quality program should define acceptable thresholds for each dimension by domain. A financial transactions dataset might require 99.9% accuracy and 100% completeness on required fields. A marketing contact list might tolerate higher null rates on optional fields. Thresholds defined by domain make quality measurable and certifiable.


Pourquoi la qualité des données est-elle importante ?

Decisions: Every business decision is as reliable as the data behind it. A sales forecast built on inaccurate pipeline data produces a wrong projection. A supply chain model built on stale inventory data causes fulfillment failures. Data quality is a business performance issue, not a technical one.

Compliance: GDPR, HIPAA, SOX, and BCBS 239 each carry accuracy and completeness requirements. BCBS 239 requires banks to demonstrate that risk data is accurate and traceable. HIPAA requires patient records to be accurate and current. SOX requires that financial reporting data is reliable. Compliance failures caused by poor data quality carry financial penalties and regulatory exposure.

AI and machine learning: AI models learn from training data. A fraud detection model trained on transaction records with 15% inaccurate merchant category codes learns wrong patterns. A recommendation model trained on stale product data surfaces out-of-stock items. Data quality is the foundation that determines whether AI outputs are trustworthy or unreliable.

Operational efficiency: Poor data quality generates rework: engineers fixing pipelines that fail because of unexpected nulls, analysts rerunning reports after discovering source errors, customer service teams correcting billing mistakes. Gartner estimates that poor data quality costs organizations an average of $15 million per year — most of it invisible, embedded in time spent working around data nobody trusts.


Data Quality vs. Related Concepts

Data quality vs. data quality management: Data quality is a state: how well data meets defined standards at a point in time. Data quality management (DQM) is the program: the ongoing processes, roles, tools, and governance structures that measure, monitor, and improve that state over time. Measuring quality once is not managing it.

Data quality vs. data governance: Data governance defines the quality standards — the thresholds, dimensions, and certification criteria that apply to each domain — and assigns the accountability for enforcing them. Data quality management executes those standards through profiling, monitoring, and remediation. Governance without quality management produces standards that are never measured. Quality management without governance produces metrics that nobody acts on.

Data quality vs. data observability: Data observability monitors the health of data pipelines in real time: detecting anomalies in row counts, schema changes, and distribution shifts. Data quality measures the accuracy, completeness, consistency, timeliness, validity, and uniqueness of the data itself. Observability detects pipeline health problems. Quality measurement assesses the fitness of the data the pipeline produces. The two are complementary: observability catches the signal, quality measurement explains what it means.

Data quality vs. data cleansing: Data cleansing is a reactive process: finding and fixing quality problems in existing data. Data quality management is a proactive program: preventing problems through validation rules, monitoring quality continuously, and addressing root causes at the source. Cleansing treats symptoms. Quality management treats causes.


What Good Data Quality Looks Like in Practice

A data analyst searching for a revenue dataset: Finds three candidate assets in the data catalog. Each displays a quality score, certification status, last-updated timestamp, and a validation history. Two are certified above the defined threshold. The analyst selects one and uses it without escalating to the data team. The quality evidence removes the need for independent validation.

A data engineer preparing a schema change: Checks the data catalog for the quality score and lineage of the table being modified. Sees that three downstream reports and one ML feature pipeline depend on the column being changed. Coordinating fixes before shipping prevents production failures.

A compliance officer answering a GDPR right-to-erasure request: Pulls the lineage record for the subject’s customer record from the catalog. Lineage identifies every downstream system holding data derived from that record. Quality records confirm that each system’s copy of the data was accurate and current at the time of last use. Deletion is confirmed across all six systems in under an hour.


How Data Quality is Measured

Data profiling: Automated scanning of data assets to assess current quality characteristics: null rates, value distributions, format patterns, duplicate rates, and referential integrity. Profiling establishes the baseline and identifies where gaps exist.

Validation rules: Business rules that define what valid data looks like for each field: acceptable value ranges, required formats, referential constraints, and cross-field dependencies. Validation rules run continuously at ingestion and on stored data.

Quality scoring: Aggregation of profiling and validation results into a single score per asset. Scores make quality comparable across assets and over time, and they provide the basis for certification decisions.

Continuous monitoring: Automated monitoring of quality scores with alerts when scores fall below defined thresholds or anomalies appear. Monitoring catches quality issues before they reach production reports or AI pipelines.

FAQ

Data quality is how well data does its job. If you ask data a question and the answer is wrong because the data is inaccurate, incomplete, outdated, or inconsistent, you have a data quality problem. High-quality data gives you correct answers reliably.

Accuracy, completeness, consistency, timeliness, validity, and uniqueness. Most data quality frameworks use these six dimensions. Some add integrity (referential consistency between related datasets) and conformity (adherence to a defined standard or schema) as additional dimensions depending on the domain.

The most common causes are: manual data entry without validation; system integrations that load records without deduplication or format normalization, schema changes that break existing validation rules; batch pipelines with latency that exceeds the freshness requirements of downstream use cases; and the absence of a governed business glossary that causes the same concept to be defined differently across systems.

A numerical representation of how well a data asset meets its defined quality standards, typically expressed as a percentage. A score of 96% means the asset passes 96% of its defined quality checks. Scores make quality comparable across assets and over time and provide the basis for certification decisions.

The formal process of marking a data asset as approved for use after it has met defined quality thresholds and been reviewed by a data steward. Certified assets appear with a verified badge in the data catalog. Users trust certified assets without independent validation — which is what makes data quality programs operationally valuable.

AI models inherit every quality problem in their training data. Inaccurate data teaches models wrong patterns. Incomplete data causes models to miss predictive signals. Inconsistent data causes models to learn contradictory patterns from different source definitions. Data quality certification for AI training datasets is the infrastructure requirement that makes AI outputs trustworthy and AI governance programs defensible.

Accuracy is one of the six dimensions of data quality. It measures whether data correctly represents the real-world value it describes. Data quality is the broader measure that includes accuracy plus completeness, consistency, timeliness, validity, and uniqueness. A dataset can be accurate but incomplete, or accurate and complete but stale — data quality assesses all six dimensions together.

Responsibility is shared. Data owners hold ultimate business accountability for the quality of their domain’s data. Data stewards manage quality operationally day-to-day: monitoring scores, resolving incidents, and certifying assets. Data engineers build and maintain the technical infrastructure that measures and enforces quality standards. The CDO or governance lead sets the organization-wide quality standards and tracks program health.