Cardinality in Databases: Types and Their Significance
In its broadest sense, cardinality is the number of elements in a set.
What is Cardinality in a Database?
In a database context, cardinality refers to the number of distinct values in a relational table column relative to the total number of rows in the table. When the database administrator (DBA) collects statistics, the database measures each column's cardinality and stores it in system tables, where the query optimizer can use it.
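The ratio described above can be measured directly with `COUNT(DISTINCT ...)` and `COUNT(*)`. The following is a minimal sketch using Python's built-in `sqlite3` module. The `customers` table, its data, and the `column_cardinality` helper are hypothetical and serve only as an illustration; a real DBMS gathers comparable figures through its own statistics commands. Note that `COUNT(DISTINCT ...)` ignores NULLs.

```python
import sqlite3

def column_cardinality(conn: sqlite3.Connection, table: str, column: str) -> tuple[int, int, float]:
    """Return (distinct values, total rows, distinct/total ratio) for one column."""
    # Table and column names cannot be bound as parameters, so only pass trusted identifiers.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    ratio = distinct / total if total else 0.0
    return distinct, total, ratio

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT)")
# Hypothetical data: 1,000 customers spread evenly over 4 countries.
countries = ["US", "DE", "FR", "JP"]
conn.executemany(
    "INSERT INTO customers (customer_id, country) VALUES (?, ?)",
    [(i, countries[i % 4]) for i in range(1000)],
)

print(column_cardinality(conn, "customers", "customer_id"))  # (1000, 1000, 1.0)
print(column_cardinality(conn, "customers", "country"))      # (4, 1000, 0.004)
```

A ratio near 1.0 marks a high cardinality column. A ratio near 0 marks a low cardinality one.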
Why is it Important?
The cardinality of a column matters to both database designers and the database query optimizer. For the designer or DBA, knowing that a column contains mostly repeated values indicates that it is a poor candidate for an index, because the index will not be very selective. For a cost-based query optimizer, the estimated selectivity of a potential index largely determines whether the index is used or ignored. Creating and maintaining indexes consumes CPU and I/O resources, so designers and developers need to make sure that the indexes they create will actually be used.
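To see why cardinality drives index selection, consider a deliberately simplified model. It assumes that every distinct value occurs equally often, so an equality predicate matches roughly `1 / distinct_values` of the rows. It also applies a hypothetical 5% cutoff for preferring an index. Real optimizers refine this with histograms, frequency statistics, and full cost models. The function names and the threshold below are illustrative assumptions, not any product's actual rules.

```python
def equality_selectivity(distinct_values: int) -> float:
    # Fraction of rows expected to match `column = constant`,
    # assuming each distinct value occurs equally often.
    return 1.0 / distinct_values

def estimated_rows(total_rows: int, distinct_values: int) -> float:
    # The same estimate expressed as a row count.
    return total_rows / distinct_values

def choose_access_path(distinct_values: int, threshold: float = 0.05) -> str:
    # Toy rule with an arbitrary cutoff: use the index only when the
    # predicate is expected to return a small fraction of the table.
    return "index" if equality_selectivity(distinct_values) <= threshold else "full scan"

rows = 1_000_000
for name, ndv in [("order_id", 1_000_000), ("status", 4)]:
    print(name, estimated_rows(rows, ndv), choose_access_path(ndv))
# order_id 1.0 index
# status 250000.0 full scan
```

A lookup on the four-valued `status` column is expected to touch a quarter of the table. At that point, reading the index and then fetching each matching row costs more than simply scanning the table.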
Types of Cardinality in Databases
Database designers map the degree of relationship between entities. An entity can have a one-to-one, one-to-many, or many-to-many relationship with another entity. For example, one storage container may have one lid, which is a one-to-one relationship. One doctor might have many patients, which is a one-to-many relationship. A student can enroll in many courses, and each course can have many students, which is a many-to-many relationship. This classification is known as relationship cardinality.
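In a relational schema, relationship cardinality is usually enforced through foreign keys. The following minimal sketch uses SQLite through Python's `sqlite3` module, with hypothetical `container`, `lid`, `doctor`, and `patient` tables mirroring the examples above. A `UNIQUE` constraint on the foreign key limits each container to one lid (one-to-one). A plain foreign key lets many patients reference the same doctor (one-to-many). The `try_insert` helper is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE container (id INTEGER PRIMARY KEY);
-- One-to-one: UNIQUE on the foreign key allows at most one lid per container.
CREATE TABLE lid (
    id INTEGER PRIMARY KEY,
    container_id INTEGER UNIQUE REFERENCES container(id)
);
CREATE TABLE doctor (id INTEGER PRIMARY KEY);
-- One-to-many: a plain foreign key lets many patients reference one doctor.
CREATE TABLE patient (
    id INTEGER PRIMARY KEY,
    doctor_id INTEGER REFERENCES doctor(id)
);
""")

conn.execute("INSERT INTO container (id) VALUES (1)")
conn.execute("INSERT INTO doctor (id) VALUES (1)")
conn.execute("INSERT INTO lid (id, container_id) VALUES (1, 1)")
conn.executemany("INSERT INTO patient (id, doctor_id) VALUES (?, 1)", [(1,), (2,), (3,)])

def try_insert(sql: str) -> bool:
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("INSERT INTO lid (id, container_id) VALUES (2, 1)"))   # False: container 1 already has a lid
print(try_insert("INSERT INTO patient (id, doctor_id) VALUES (4, 1)"))  # True: a doctor can have many patients
```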
Data cardinality refers to the uniqueness of the values contained in a database column. If most of the values are distinct, the column is considered to have high cardinality. If the column contains mostly repeated values, it is a low cardinality column.
When a table is partitioned on ranges of a column's values, low cardinality can lead to data skew, where rows are distributed unevenly across partitions. This is undesirable because parallel processing works best when the workload is balanced across all available processors rather than concentrated on a subset of them.
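The skew effect can be simulated by assigning values to range partitions and counting the rows in each partition. The following minimal sketch uses `bisect_right` to find each value's range. The order IDs, the status codes with their frequencies, and the partition boundaries are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

def range_partition_sizes(values, boundaries):
    """Count rows per partition under range partitioning.

    Partition 0 holds v < boundaries[0], partition i holds
    boundaries[i-1] <= v < boundaries[i], and the last partition holds
    v >= boundaries[-1], giving len(boundaries) + 1 partitions in total.
    """
    counts = Counter(bisect_right(boundaries, v) for v in values)
    return [counts[i] for i in range(len(boundaries) + 1)]

# Hypothetical data: 1,000 rows each.
order_ids = list(range(1000))                                # high cardinality: every value distinct
status_codes = [0] * 850 + [1] * 100 + [2] * 40 + [3] * 10   # low cardinality: 4 skewed values

print(range_partition_sizes(order_ids, [250, 500, 750]))  # [250, 250, 250, 250]
print(range_partition_sizes(status_codes, [1, 2, 3]))     # [850, 100, 40, 10]
```

Rows that share a value always fall into the same range. As a result, no choice of boundaries can spread the 850 rows with status 0 across more than one partition, and the processor handling that partition does most of the work.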
High and Low Cardinality
A column populated mostly with distinct values is known as a high cardinality column, while a column with few distinct values is a low cardinality column. When selecting a column to index or to use as a partitioning key, you are looking for high cardinality candidates. Similarly, a query optimizer is more likely to use an available index on a column whose values are mostly distinct, because a lookup on that column returns relatively few rows.
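In terms of database performance tuning, filtering on a low cardinality column can result in a full table scan, because an index on that column would not narrow the search enough to be worthwhile. A full table scan reads every row and is generally the most resource-intensive way to query a large table.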
Cardinality and Modality
In relationship modeling, cardinality describes the number of associations between rows in two or more tables, with the focus on the maximum number of associations. Modality, by contrast, describes the minimum number of relationships between entities or table rows. The modality of a relationship is 0 if the relationship is optional and 1 if participation in the relationship is mandatory.
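In a relational schema, modality on the referencing side often maps to whether the foreign key column allows NULL. The following minimal sketch uses SQLite through Python's `sqlite3` module, with hypothetical tables and a hypothetical `try_insert` helper. A `NOT NULL` foreign key makes every patient reference a doctor (modality 1). A nullable foreign key lets a lid exist without a container (modality 0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE doctor (id INTEGER PRIMARY KEY);
CREATE TABLE container (id INTEGER PRIMARY KEY);
-- Mandatory (modality 1): NOT NULL means every patient must reference a doctor.
CREATE TABLE patient (
    id INTEGER PRIMARY KEY,
    doctor_id INTEGER NOT NULL REFERENCES doctor(id)
);
-- Optional (modality 0): a NULL foreign key lets a lid exist without a container.
CREATE TABLE lid (
    id INTEGER PRIMARY KEY,
    container_id INTEGER REFERENCES container(id)
);
INSERT INTO doctor (id) VALUES (1);
""")

def try_insert(sql: str) -> bool:
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("INSERT INTO patient (id, doctor_id) VALUES (1, NULL)"))  # False: participation is mandatory
print(try_insert("INSERT INTO patient (id, doctor_id) VALUES (1, 1)"))     # True
print(try_insert("INSERT INTO lid (id, container_id) VALUES (1, NULL)"))   # True: participation is optional
```

This schema leaves the other side optional: a doctor with zero patients is still allowed. Enforcing a minimum of one patient per doctor would need something beyond a foreign key, such as triggers or application logic.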
Actian and the Data Intelligence Platform
Actian Data Intelligence Platform is purpose-built to help organizations unify, manage, and understand their data across hybrid environments. It brings together metadata management, governance, lineage, quality monitoring, and automation in a single platform. This enables teams to see where data comes from, how it’s used, and whether it meets internal and external requirements.
Through its centralized interface, Actian supports real-time insight into data structures and flows, making it easier to apply policies, resolve issues, and collaborate across departments. The platform also helps connect data to business context, enabling teams to use data more effectively and responsibly. Actian’s platform is designed to scale with evolving data ecosystems, supporting consistent, intelligent, and secure data use across the enterprise.