Star Schema

A Star Schema is a database designed for analytics-oriented applications such as data warehouses and dimensional data marts. The design features one or more fact tables containing measurable data related to several dimension tables that hold descriptive data.
Why is Star Schema Design Important?
Decision-support database applications require a flexible design that supports the most common queries. Traditional normalized schemas require more table joins, which use more resources and are slower. By storing all the measurable facts about an entity in a single table, most of the attributes required by an analytic query can come from a single table with optional joins to dimensional tables. Efficiency is gained by applying filtering predicates on the fact table so only a subset of data is involved in any joins, consequently consuming less CPU and memory resources.
If the business uses Online analytics processing (OLAP) cubes, the star schema maps effectively to queries that populate the cube.
The Origins of the Star Schema
Early data warehouses were simply copies of transactional database schemas with additional indexes to support analytic queries. As data warehouse architects such as Ralph Kimball developed the idea of dimensional schema in the 1990s to simplify analytic queries.
Star Schema Example
As a retailer, you would be interested in what products are selling, who is buying them, and when. To answer these questions, a fact table that links to the dimensions of products, orders, and customers is required. The following describes some typical attributes of these tables:
- FACT Tabe:
- Product ID – links to the Product Dimension table
- Order ID – links to the Dimension table
- Customer ID – links to Customer table
- Total Sales – Measure
- Units Sold – Measure
- DIMENSION Table Orders contains:
- Product ID
- Order details
- DIMENSION Table Customers contains:
- Customer ID
- Customer details
The Disadvantage of the Star Schema vs. Third Normal Form
A fully normalized database schema design has fewer duplicated fields than a Star Schema but tends to require more complex queries because of an increased number of tables. A Third Normal Form database schema requires less storage space because it has fewer redundancies. The Star Schema often takes longer to load with data use due to more data duplication and additional index columns required for foreign keys.
The Advantage of Using a Columnar Database With a Star Schema
A columnar database stores the records of a relational table as columns, unlike a traditional relational database system, which stores a table row as a single record. In the case of a Star Schema, the fact table can get pretty broad as the number and length of attributes about the entity grow. Most queries only need a handful of attribute fields from the fact table. The columnar approach only accesses those few columns, unlike a row-based database that requires the whole row to be loaded into memory before it can be scanned for necessary column values. Because the entire row is usually considerably longer than the selected columns, the memory footprint is far larger, typically resulting in slower queries. The columnar database has a built-in primary key index, which is commonly the sort order of the table, saving more precious cache and RAM storage. The bottom line is that the columnar database structure nicely complements the Star Schema design.
Actian Vector Provides Schema Flexibility
The Vector columnar database can be used to deploy a star schema model for benchmark workloads such as the TPC-D. However, this is not a requirement, as you can even choose to use a denormalized schema with all columns in a single wide table. Combining column-based access and min-max indexes can make queries against a single large denormalized table highly efficient for some workloads. Vector will still run queries faster than traditional databases that use a row store, whether you choose star schema, snowflake schema, normalized or denormalized schema.