What is Cardinality?
In its broadest sense, cardinality is the number of elements in a set.
What is cardinality in a database?
In a database context, cardinality refers to the number of distinct values in a relational table column relative to the total number of rows in the table. When the database administrator (DBA) gathers statistics, the database measures each column's cardinality and stores it in system tables, where the query optimizer can use it.
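As a minimal sketch, assuming SQLite accessed through Python's built-in sqlite3 module, the following measures a column's cardinality directly with COUNT(DISTINCT ...) and then runs ANALYZE, which is SQLite's way of gathering statistics and writes per-index figures to the sqlite_stat1 system table. The orders table, its columns, the index name, and the column_cardinality helper are hypothetical; other database systems gather and store statistics with their own commands and catalog tables.

```python
import sqlite3


def column_cardinality(conn, table, column):
    """Return (distinct_values, total_rows, distinct_ratio) for one column.

    Identifiers are interpolated into the SQL text because they cannot be
    bound as parameters, so pass only trusted table and column names.
    COUNT(DISTINCT ...) ignores NULLs.
    """
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct, total, (distinct / total if total else 0.0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, status TEXT)")
statuses = ["new", "paid", "shipped"]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, statuses[i % 3]) for i in range(1, 1001)],
)
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")

print(column_cardinality(conn, "orders", "order_id"))  # (1000, 1000, 1.0)
print(column_cardinality(conn, "orders", "status"))    # (3, 1000, 0.003)

# ANALYZE gathers statistics and stores them in the sqlite_stat1 system
# table, which SQLite's query planner reads when choosing indexes.
conn.execute("ANALYZE")
for tbl, idx, stat in conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(tbl, idx, stat)
```

The unique order_id column has a distinct ratio of 1.0, while the three-value status column is close to 0, even though both columns have the same number of rows.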
Why is cardinality important in databases?
The cardinality of a column matters to both database designers and the query optimizer. For a designer or DBA, knowing that a column contains mostly repeated values signals that it is a poor candidate for an index, because an index on it would not be very selective. For a cost-based query optimizer, the estimated selectivity of an index determines whether the optimizer uses the index or ignores it. Creating and maintaining indexes consumes CPU and I/O resources, so designers and developers should create only indexes that queries will actually use.
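The sketch below illustrates how selectivity can drive a cost-based choice between an index and a full scan. It assumes a toy cost model, not any real optimizer's. An equality predicate is assumed to match rows / distinct rows, which is the common uniform-distribution estimate. A full scan costs one sequential read per page. Each matching row fetched through the index costs one random page read, and B-tree traversal and caching are ignored. ColumnStats, choose_access_path, and all cost parameters are hypothetical. The 4:1 random-to-sequential cost ratio mirrors PostgreSQL's default random_page_cost and seq_page_cost settings.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    rows: int      # total rows in the table
    distinct: int  # number of distinct values in the indexed column


def equality_selectivity(stats: ColumnStats) -> float:
    # Uniform-distribution assumption: each value matches 1/distinct of the rows.
    return 1.0 / stats.distinct


def choose_access_path(stats: ColumnStats, rows_per_page: int = 100,
                       seq_page_cost: float = 1.0,
                       random_page_cost: float = 4.0) -> str:
    """Choose an access path for `WHERE col = ?` under a toy cost model."""
    matching_rows = stats.rows * equality_selectivity(stats)
    # A full scan reads every page of the table sequentially.
    full_scan_cost = (stats.rows / rows_per_page) * seq_page_cost
    # Pessimistic index cost: one random page read per matching row.
    index_cost = matching_rows * random_page_cost
    return "index" if index_cost < full_scan_cost else "full scan"


print(choose_access_path(ColumnStats(rows=1_000_000, distinct=1_000_000)))  # index
print(choose_access_path(ColumnStats(rows=1_000_000, distinct=3)))          # full scan
```

With a unique column, about one row matches and the index is far cheaper. With only three distinct values, roughly a third of the table matches, and reading every page sequentially wins.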
Types of cardinality in databases
Database designers map the degree of relationship between entities. An entity can have a one-to-one or one-to-many relationship with another entity. For example, a storage container might have exactly one lid, forming a one-to-one relationship, while a doctor might have many patients, forming a one-to-many relationship. This is known as relationship cardinality.
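The sketch below shows one common way to express these two relationship cardinalities in a schema. It assumes SQLite accessed through Python's sqlite3 module. A UNIQUE foreign key limits each container to at most one lid, which gives one-to-one. A plain foreign key lets many patients reference the same doctor, which gives one-to-many. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE containers (id INTEGER PRIMARY KEY);
    -- One-to-one: UNIQUE on the foreign key allows at most one lid per container.
    CREATE TABLE lids (
        id INTEGER PRIMARY KEY,
        container_id INTEGER UNIQUE REFERENCES containers(id)
    );

    CREATE TABLE doctors (id INTEGER PRIMARY KEY);
    -- One-to-many: a plain foreign key lets many patients share one doctor.
    CREATE TABLE patients (
        id INTEGER PRIMARY KEY,
        doctor_id INTEGER REFERENCES doctors(id)
    );

    INSERT INTO containers (id) VALUES (1);
    INSERT INTO lids (container_id) VALUES (1);
    INSERT INTO doctors (id) VALUES (1);
    INSERT INTO patients (doctor_id) VALUES (1), (1), (1);
""")

# A second lid for container 1 violates the one-to-one rule.
try:
    conn.execute("INSERT INTO lids (container_id) VALUES (1)")
    second_lid_rejected = False
except sqlite3.IntegrityError as exc:
    second_lid_rejected = True
    print("rejected:", exc)

patient_count = conn.execute(
    "SELECT COUNT(*) FROM patients WHERE doctor_id = 1"
).fetchone()[0]
print("patients for doctor 1:", patient_count)  # 3
```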
Data cardinality refers to the uniqueness of the values in a database column. If most of the values are distinct, the column has high cardinality; if it contains mostly repeated values, it has low cardinality.
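A minimal Python sketch of this classification follows. It treats "most values are distinct" as a ratio of distinct values to rows above 0.5. That cutoff is an assumption chosen for illustration, not a standard threshold. The helper names and sample data are hypothetical.

```python
def distinct_ratio(values: list) -> float:
    """Ratio of distinct values to total rows (0.0 for an empty column)."""
    return len(set(values)) / len(values) if values else 0.0


def classify_cardinality(values: list, cutoff: float = 0.5) -> str:
    # Assumed rule: "mostly distinct" means the distinct ratio exceeds `cutoff`.
    return "high" if distinct_ratio(values) > cutoff else "low"


emails = [f"user{i}@example.com" for i in range(1000)]  # every value distinct
countries = ["US", "DE", "FR", "JP"] * 250               # 4 values repeated

print(classify_cardinality(emails))     # high
print(classify_cardinality(countries))  # low
```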
When a table is partitioned by ranges of values in a low cardinality column, every row with the same value lands in the same partition. This can cause data skew, meaning an uneven distribution of rows across partitions. Skew is undesirable because you want to spread work across all the available processors, not concentrate it on a subset of them.
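The effect can be simulated in a few lines of Python. This sketch assumes range partitioning on a single column, with hypothetical boundary values, and counts the rows in each partition. Evenly spaced boundaries split a unique ID column into equal partitions. For a skewed three-value status column (also hypothetical), no choice of boundaries can separate rows that share a value. As a result, one partition holds most of the data and another stays empty.

```python
import bisect


def range_partition_sizes(values, boundaries):
    """Count rows per partition under range partitioning.

    With sorted boundaries b[0] < b[1] < ..., partition 0 holds v < b[0],
    partition i holds b[i-1] <= v < b[i], and the last partition holds
    v >= b[-1], giving len(boundaries) + 1 partitions.
    """
    sizes = [0] * (len(boundaries) + 1)
    for v in values:
        sizes[bisect.bisect_right(boundaries, v)] += 1
    return sizes


order_ids = range(1000)                       # high cardinality: all distinct
statuses = [0] * 700 + [1] * 200 + [2] * 100  # low cardinality: 3 values

print(range_partition_sizes(order_ids, [250, 500, 750]))  # [250, 250, 250, 250]
print(range_partition_sizes(statuses, [1, 2, 3]))         # [700, 200, 100, 0]
```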
High and low cardinality
A column populated mostly with distinct values is a high cardinality column; a column with few distinct values is a low cardinality column. When selecting a column to index or to use as a partitioning key, look for high cardinality candidates. Similarly, a cost-based optimizer is more likely to use an available index when the indexed column contains mostly distinct values.
For database performance tuning, note that a query filtering on a low cardinality column often results in a full table scan, which reads every row and is typically the most resource-intensive way to query a large table.
Cardinality and modality
Cardinality describes the number of associations between entities, or between rows in related tables, and focuses on the maximum number of associations. Modality focuses on the minimum number of associations. A relationship's modality is 0 if the relationship is optional and 1 if it is mandatory.
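The sketch below reuses the doctor and patient example from the relationship cardinality discussion to show how data can hint at both properties. It assumes SQLite accessed through Python's sqlite3 module, and the tables and rows are hypothetical. A LEFT JOIN counts patients per doctor. The maximum count reflects the "many" cardinality on the doctor side. A minimum of 0 shows that a doctor may have no patients, which means modality 0 (optional). On the patient side, a NOT NULL foreign key is one common way to make the relationship mandatory (modality 1). The counts describe only the current rows; the design, not the data, defines the true minimum and maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doctors (id INTEGER PRIMARY KEY);
    -- NOT NULL makes the patient side mandatory: every patient needs a doctor.
    CREATE TABLE patients (
        id INTEGER PRIMARY KEY,
        doctor_id INTEGER NOT NULL REFERENCES doctors(id)
    );
    INSERT INTO doctors (id) VALUES (1), (2), (3);
    INSERT INTO patients (doctor_id) VALUES (1), (1), (1), (2);
""")

# Patients per doctor; the LEFT JOIN keeps doctors with no patients (count 0).
min_patients, max_patients = conn.execute("""
    SELECT MIN(n), MAX(n) FROM (
        SELECT d.id, COUNT(p.id) AS n
        FROM doctors AS d LEFT JOIN patients AS p ON p.doctor_id = d.id
        GROUP BY d.id
    )
""").fetchone()
print("min:", min_patients, "max:", max_patients)  # min: 0 max: 3

# A patient without a doctor violates the mandatory (modality 1) side.
try:
    conn.execute("INSERT INTO patients (doctor_id) VALUES (NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```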