Star Schema

Two colleagues discussing data on a tablet in an office, illustrating the use of a Star Schema in database design.

A Star Schema is a database designed for analytics-oriented applications such as data warehouses and dimensional data marts. The design features one or more fact tables containing measurable data related to several dimension tables that hold descriptive data.

Why is Star Schema Design Important?

Decision-support database applications require a flexible design that supports the most common queries. Traditional normalized schemas require more table joins, which use more resources and are slower. By storing all the measurable facts about an entity in a single table, most of the attributes required by an analytic query can come from a single table with optional joins to dimensional tables. Efficiency is gained by applying filtering predicates on the fact table so only a subset of data is involved in any joins, consequently consuming less CPU and memory resources.

If the business uses Online analytics processing (OLAP) cubes, the star schema maps effectively to queries that populate the cube.

The Origins of the Star Schema

Early data warehouses were simply copies of transactional database schemas with additional indexes to support analytic queries. As data warehouse architects such as Ralph Kimball developed the idea of the dimensional schema in the 1990s to simplify analytic queries.

Star Schema Example

As a retailer, you would be interested in what products are selling, who is buying them, and when. To answer these questions, a fact table that links to the dimensions of products, orders, and customers is required. The following describes some typical attributes of these tables:

  • FACT Tabe:
    • Product ID – links to the Product Dimension table
    • Order ID – links to the Dimension table
    • Customer ID – links to Customer table
    • Total Sales – Measure
    • Units Sold – Measure
  • DIMENSION Table Orders contains:
    • Product ID
    • Order details
  • DIMENSION Table Customers contains:
    • Customer ID
    • Customer details

The Disadvantage of the Star Schema vs. Third Normal Form

A fully normalized database schema design has fewer duplicated fields than a Star Schema but tends to require more complex queries because of an increased number of tables. A Third Normal Form database schema requires less storage space because it has fewer redundancies. The Star Schema often takes longer to load with data use due to more data duplication and additional index columns required for foreign keys.

The Advantage of Using a Columnar Database With a Star Schema

A columnar database stores the records of a relational table as columns, unlike a traditional relational database system, which stores a table row as a single record. In the case of a Star Schema, the fact table can get pretty broad as the number and length of attributes about the entity grow. Most queries only need a handful of attribute fields from the fact table. The columnar approach only accesses those few columns, unlike a row-based database that requires the whole row to be loaded into memory before it can be scanned for necessary column values. Because the entire row is usually considerably longer than the selected columns, the memory footprint is far larger, typically resulting in slower queries. The columnar database has a built-in primary key index, which is commonly the sort order of the table, saving more precious cache and RAM storage. The bottom line is that the columnar database structure nicely complements the Star Schema design.

Actian Vector Provides Schema Flexibility

The Vector columnar database can be used to deploy a star schema model for benchmark workloads such as the TPC-D. However, this is not a requirement, as you can even choose to use a denormalized schema with all columns in a single wide table. Combining column-based access and min-max indexes can make queries against a single large denormalized table highly efficient for some workloads. Vector will still run queries faster than traditional databases that use a row store, whether you choose a star schema, snowflake schema, a normalized, or a denormalized schema.

FAQ

A star schema is a database design for analytics applications featuring one or more fact tables containing measurable data surrounded by dimension tables that hold descriptive data, creating a star-like structure.

The main components are fact tables that store measurable data (like total sales and units sold) and dimension tables that provide descriptive context (like product details, customer information, and order data), linked together through foreign keys.

A star schema has more duplicated fields and requires more storage space but enables simpler, faster queries, while third normal form is fully normalized with fewer redundancies but requires more complex queries due to increased table joins.

Star schema design supports flexible, efficient analytic queries by storing all measurable facts in a single table, reducing the number of table joins required and consuming less CPU and memory resources compared to traditional normalized schemas.

A retail star schema might include a fact table tracking total sales and units sold, linked to dimension tables for products (product details), orders (order details), and customers (customer details) to analyze what products are selling, who is buying them, and when.

Data warehouse architect Ralph Kimball developed the dimensional schema concept in the 1990s to simplify analytic queries.

Columnar databases complement star schemas by accessing only the needed columns rather than loading entire rows into memory, resulting in a smaller memory footprint, faster queries, and more efficient use of cache and RAM storage.

No, Actian Vector provides schema flexibility and can efficiently handle star schema, snowflake schema, normalized, or even fully denormalized single-table designs while still delivering faster query performance than traditional row-store databases.