A star schema is a special form of data model whose goal is not normalization but optimization for efficient read operations. Its main fields of application are data warehouses and OLAP applications. The term star schema comes from the fact that the tables are arranged in a star shape: at the centre is a fact table, around which several dimension tables are grouped. A star schema is usually denormalized; possible anomalies and an increased storage requirement are accepted for performance reasons. The related snowflake schema reduces this redundancy by normalizing the dimension tables, but the resulting multi-level dimension tables must then be linked via additional joins.
As a logical database schema for data warehouse applications, the star schema has prevailed. It consists of one fact table and several dimension tables, which are arranged around the fact table in a star shape in a query-friendly manner, hence the name; in this schema, each dimension table refers to exactly one fact table. The fact table holds the information-carrying attributes, such as sales, time periods, or costs, and has as its primary key a composite key made up of the primary keys of the participating dimension tables.
Each dimension table is in a 1:n relationship with the fact table. This relationship is established by the primary key of the dimension table and the corresponding foreign key in the fact table. The fact table thus implicitly folds the m:n relationships between the dimensions into a single table and therefore contains a lot of redundancy. The key of the fact table is composed of the primary keys of the dimension tables, each of which appears in the fact table as a foreign key.
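The key structure described above can be sketched with Python's built-in `sqlite3` module. This is only a minimal illustration: the tables `fact_sales`, `dim_product`, `dim_region`, and `dim_date`, their columns, and the helper `try_insert` are hypothetical names chosen for the example and do not come from any particular warehouse.

```python
import sqlite3

# Hypothetical example schema: one fact table surrounded by three dimension
# tables. All table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE dim_product (
    product_id    INTEGER PRIMARY KEY,
    product_name  TEXT NOT NULL,
    product_group TEXT NOT NULL
);
CREATE TABLE dim_region (
    region_id INTEGER PRIMARY KEY,
    city      TEXT NOT NULL,
    country   TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    day     TEXT NOT NULL,
    month   TEXT NOT NULL,
    year    INTEGER NOT NULL
);
-- Each foreign key is the "n" side of a 1:n relationship to one dimension;
-- together the foreign keys form the composite primary key of the fact table.
CREATE TABLE fact_sales (
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    region_id  INTEGER NOT NULL REFERENCES dim_region(region_id),
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    revenue    REAL NOT NULL,
    PRIMARY KEY (product_id, region_id, date_id)
);
""")

conn.execute("INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers')")
conn.execute("INSERT INTO dim_region VALUES (1, 'Berlin', 'Germany')")
conn.execute("INSERT INTO dim_date VALUES (1, '2024-01-15', '2024-01', 2024)")
conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 1200.0)")


def try_insert(row):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert((1, 1, 1, 300.0))  # same dimension combination again
orphan_accepted = try_insert((99, 1, 1, 50.0))     # product_id 99 is not in dim_product
fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

In this sketch both extra inserts are rejected: the first because the composite primary key already contains that combination of dimension keys, the second because the foreign key has no matching dimension row.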
---
The star schema supports the selection, summarization, and navigation of the measured values, or facts. The dimension tables are usually denormalized: there are functional dependencies between non-key attributes, so the third normal form (3NF) is violated. This violation is accepted because the data structure allows better processing speed at the expense of data integrity and storage space.
The data to be managed is referred to as facts; they are typically written continuously to the fact table. Other names for facts are measures, metrics, or key figures. Fact tables can become very large, forcing a data warehouse to gradually condense (aggregate) the data and finally delete or offload it after a retention period (archiving). The tables contain key figures or result figures that can be derived from ongoing business operations and reflect economic performance, such as profitability, costs, revenue, expenses, and income. However, these figures only become meaningful when they are put into context. For example, sales in a certain region can be compared for defined products over a defined period of time; region, product, and time are then the dimensions along which economic performance is evaluated and analyzed.
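The condensing step mentioned above could look roughly like the following sketch, which rolls daily facts older than a cutoff date up to monthly totals and then removes the daily rows. The tables `fact_sales_daily` and `fact_sales_monthly`, the function `condense_before`, and the cutoff date are assumptions made for this example; a real warehouse would typically offload the detailed rows to an archive rather than simply delete them.

```python
import sqlite3

# Hypothetical tables: a detailed (daily) fact table and a condensed (monthly) one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales_daily (
    product_id INTEGER NOT NULL,
    day        TEXT NOT NULL,     -- ISO date such as '2023-01-15'
    revenue    REAL NOT NULL,
    PRIMARY KEY (product_id, day)
);
CREATE TABLE fact_sales_monthly (
    product_id INTEGER NOT NULL,
    month      TEXT NOT NULL,     -- 'YYYY-MM'
    revenue    REAL NOT NULL,
    PRIMARY KEY (product_id, month)
);
""")
conn.executemany(
    "INSERT INTO fact_sales_daily VALUES (?, ?, ?)",
    [
        (1, "2023-01-10", 100.0),
        (1, "2023-01-20", 150.0),
        (1, "2023-02-05", 80.0),
        (1, "2024-03-01", 200.0),  # recent row, stays at daily grain
    ],
)


def condense_before(conn, cutoff_day):
    """Aggregate daily rows before cutoff_day into monthly totals, then delete them.

    Assumption: cutoff_day is the first day of a month, so no month is
    condensed only partially.
    """
    with conn:  # aggregate and delete in a single transaction
        conn.execute(
            """
            INSERT INTO fact_sales_monthly (product_id, month, revenue)
            SELECT product_id, substr(day, 1, 7), SUM(revenue)
            FROM fact_sales_daily
            WHERE day < ?
            GROUP BY product_id, substr(day, 1, 7)
            """,
            (cutoff_day,),
        )
        conn.execute("DELETE FROM fact_sales_daily WHERE day < ?", (cutoff_day,))


condense_before(conn, "2024-01-01")
monthly = conn.execute(
    "SELECT product_id, month, revenue FROM fact_sales_monthly ORDER BY month"
).fetchall()
remaining_daily = conn.execute("SELECT COUNT(*) FROM fact_sales_daily").fetchone()[0]
```

After the call, the three 2023 rows are represented by two monthly totals, and only the recent 2024 row remains at daily granularity.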
In contrast, the dimension tables contain the “descriptive” data. The fact table contains foreign keys to the dimension entries, which give each fact its meaning. Typically, the full set of foreign keys to the dimension tables also forms the primary key of the fact table. This implies that there can be only one entry for each combination of dimension values. Dimension tables are comparatively static and usually much smaller than fact tables. The term “dimension” comes from the fact that each dimension table represents one dimension of a multidimensional OLAP cube.
Due to the existence of functional dependencies between non-key attributes, the third normal form is deliberately violated in the dimension tables. In order to meet the 3NF, the dimension table in question would have to be broken down into individual hierarchical tables, but for reasons of performance, the star schema refrains from normalizing the dimension tables and accepts the redundancy that occurs as a result.
The advantage of separating facts and dimensions is that the facts can be analyzed generically and independently along each dimension. An OLAP application does not require any “knowledge” of what a dimension means; the interpretation is left solely to the user.
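This generic style of analysis can be illustrated with one aggregation routine that works for any dimension without knowing what the dimension means. The sketch below uses `sqlite3`; the schema, the `DIMENSIONS` whitelist, and the function `total_by` are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical miniature star schema with two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, product_group TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    revenue    REAL,
    PRIMARY KEY (product_id, region_id)
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'), (2, 'Mouse', 'Accessories');
INSERT INTO dim_region  VALUES (1, 'Berlin', 'Germany'), (2, 'Paris', 'France');
INSERT INTO fact_sales  VALUES (1, 1, 1000.0), (1, 2, 800.0), (2, 1, 50.0), (2, 2, 70.0);
""")

# Whitelist: dimension table -> (join key, attributes that may be grouped by).
# Identifiers cannot be bound as SQL parameters, so they are validated here
# before being placed into the query text.
DIMENSIONS = {
    "dim_product": ("product_id", {"product_name", "product_group"}),
    "dim_region": ("region_id", {"city", "country"}),
}


def total_by(conn, dim_table, attribute):
    """Sum revenue grouped by one attribute of one dimension table."""
    key, attributes = DIMENSIONS[dim_table]
    if attribute not in attributes:
        raise ValueError(f"unknown attribute {attribute!r} for {dim_table}")
    sql = (
        f"SELECT d.{attribute}, SUM(f.revenue) "
        f"FROM fact_sales AS f JOIN {dim_table} AS d ON f.{key} = d.{key} "
        f"GROUP BY d.{attribute} ORDER BY d.{attribute}"
    )
    return conn.execute(sql).fetchall()


by_group = total_by(conn, "dim_product", "product_group")
by_country = total_by(conn, "dim_region", "country")
```

The same function serves both calls; only the dimension table and attribute name change, while the join pattern between fact and dimension stays identical.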
However, the size of the dimension tables should be taken into account. Fact tables in a star schema often contain more than 10 million records. Although dimension tables are smaller, individual dimensions can still reach a significant size. To reduce such large data volumes and thereby shorten access times, individual, very large dimension tables can be converted into a snowflake schema through normalization.
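As a rough sketch of such a conversion, the following example splits a denormalized product dimension into a product table and a separate product-group table, and then shows that the original rows can be rebuilt with one additional join. All table names, column names, and sample values are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized star dimension: group_manager depends on group_name, a
-- non-key attribute, so the group data repeats on every product row.
CREATE TABLE dim_product_star (
    product_id    INTEGER PRIMARY KEY,
    product_name  TEXT NOT NULL,
    group_name    TEXT NOT NULL,
    group_manager TEXT NOT NULL
);
INSERT INTO dim_product_star VALUES
    (1, 'Laptop',  'Computers',   'Alice'),
    (2, 'Desktop', 'Computers',   'Alice'),
    (3, 'Mouse',   'Accessories', 'Bob');

-- Snowflake variant: the group attributes move into their own table.
CREATE TABLE dim_product_group (
    group_id      INTEGER PRIMARY KEY,
    group_name    TEXT NOT NULL UNIQUE,
    group_manager TEXT NOT NULL
);
CREATE TABLE dim_product_snow (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    group_id     INTEGER NOT NULL REFERENCES dim_product_group(group_id)
);
INSERT INTO dim_product_group (group_name, group_manager)
    SELECT DISTINCT group_name, group_manager FROM dim_product_star;
INSERT INTO dim_product_snow (product_id, product_name, group_id)
    SELECT s.product_id, s.product_name, g.group_id
    FROM dim_product_star AS s
    JOIN dim_product_group AS g ON g.group_name = s.group_name;
""")

group_rows = conn.execute("SELECT COUNT(*) FROM dim_product_group").fetchone()[0]

# Reading the full product description now costs one extra join.
rebuilt = conn.execute("""
    SELECT p.product_id, p.product_name, g.group_name, g.group_manager
    FROM dim_product_snow AS p
    JOIN dim_product_group AS g ON g.group_id = p.group_id
    ORDER BY p.product_id
""").fetchall()
original = conn.execute(
    "SELECT product_id, product_name, group_name, group_manager "
    "FROM dim_product_star ORDER BY product_id"
).fetchall()
```

The group attributes are stored once per group instead of once per product, which is where the space saving comes from; the price is the extra join whenever those attributes are needed.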
One problem with the star schema is that data in the dimension tables remains related to data in the fact table over a long period of time. Over time, changes to the dimension data may become necessary, but these changes must generally not affect facts recorded before the change. For example, if the salesperson responsible for a product group changes, the respective entry in the dimension table must not simply be overwritten. Instead, a new entry must be created; otherwise, the sales figures of the previous salesperson could no longer be traced. One concept for avoiding such conflicts is Slowly Changing Dimensions, which groups together data warehousing methods for recording changes in dimension tables and, where necessary, documenting them historically.
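The "new entry instead of overwrite" approach from the salesperson example corresponds to what is commonly called a Type 2 slowly changing dimension. A minimal sketch of it might look as follows; the surrogate key `group_key`, the validity columns `valid_from` and `valid_to`, the function `change_salesperson`, and all sample values are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product_group (
    group_key   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
    group_code  TEXT NOT NULL,                      -- business key of the product group
    salesperson TEXT NOT NULL,
    valid_from  TEXT NOT NULL,
    valid_to    TEXT                                -- NULL marks the current version
);
CREATE TABLE fact_sales (
    group_key INTEGER NOT NULL REFERENCES dim_product_group(group_key),
    day       TEXT NOT NULL,
    revenue   REAL NOT NULL
);
""")


def change_salesperson(conn, group_code, new_salesperson, change_day):
    """Close the current dimension row (if any) and add a new version.

    Returns the surrogate key of the new version. Existing rows are never
    overwritten apart from setting their valid_to date.
    """
    with conn:
        conn.execute(
            "UPDATE dim_product_group SET valid_to = ? "
            "WHERE group_code = ? AND valid_to IS NULL",
            (change_day, group_code),
        )
        cur = conn.execute(
            "INSERT INTO dim_product_group (group_code, salesperson, valid_from) "
            "VALUES (?, ?, ?)",
            (group_code, new_salesperson, change_day),
        )
    return cur.lastrowid


# Facts always reference the dimension version that was valid when they occurred.
first_key = change_salesperson(conn, "HW", "Alice", "2023-01-01")
conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (first_key, "2023-06-01", 500.0))
second_key = change_salesperson(conn, "HW", "Bob", "2024-01-01")
conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (second_key, "2024-02-01", 300.0))

revenue_by_salesperson = conn.execute("""
    SELECT d.salesperson, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product_group AS d ON d.group_key = f.group_key
    GROUP BY d.salesperson
    ORDER BY d.salesperson
""").fetchall()
versions = conn.execute("SELECT COUNT(*) FROM dim_product_group").fetchone()[0]
```

Because the earlier fact row still points at the first version of the dimension entry, the previous salesperson's figures remain attributable after the change.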