Normalization of a relational data schema (table structure) is the division of attributes (table columns) among several relations (tables) according to the normalization rules (see below), so that the resulting schema no longer contains redundancies.
In a conceptual schema that contains data redundancies, a change to the database may update data that is stored in several places only partially and incompletely, leaving some copies outdated or contradictory. Such problems are called anomalies. In addition, storing the same data several times takes up unnecessary storage space. To prevent redundancy, such tables are normalized.
There are several degrees to which a database schema can be immune to anomalies. Accordingly, a schema is said to be in first, second, third, etc. normal form. These normal forms are defined by formal requirements on the schema. A relational data schema is brought into a normal form by progressively decomposing its relations into simpler ones on the basis of their functional dependencies, until no further decomposition is possible. No data may be lost in the process. With Delobel's theorem, one can formally prove for a decomposition step that it does not entail any data loss.
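The check that a single decomposition step loses no data can be carried out mechanically. The following Python sketch assumes the common functional-dependency formulation of that check: splitting a relation into two parts is lossless if the attributes shared by both parts functionally determine all attributes of at least one part. The function names `closure` and `is_lossless_binary` and the order/customer attributes are illustrative assumptions.

```python
def closure(attrs, fds):
    """Return all attributes determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets, read as lhs -> rhs.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless_binary(r1, r2, fds):
    """True if the attributes shared by r1 and r2 determine all of r1 or all of r2."""
    common = frozenset(r1) & frozenset(r2)
    determined = closure(common, fds)
    return frozenset(r1) <= determined or frozenset(r2) <= determined


# Hypothetical example: order_id -> customer_no, customer_no -> customer_name
FDS = [
    (frozenset({"order_id"}), frozenset({"customer_no"})),
    (frozenset({"customer_no"}), frozenset({"customer_name"})),
]

good = is_lossless_binary(
    {"order_id", "customer_no"}, {"customer_no", "customer_name"}, FDS
)
bad = is_lossless_binary(
    {"order_id", "customer_name"}, {"customer_no", "customer_name"}, FDS
)
```

In this example the first split shares `customer_no`, which determines the whole customer part, so `good` is true. The second split shares only `customer_name`, which determines nothing else, so `bad` is false.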
---
The normal forms are nested: satisfying a higher normal form implies satisfying all lower ones. In the relational model as used for OLTP, eight normal forms are distinguished, of which the first three are the best known and most widely used:
- the first normal form denoted 1NF
- the second normal form denoted 2NF
- the third normal form denoted 3NF
- the Boyce-Codd normal form denoted BCNF
- the fourth normal form denoted 4NF
- the fifth normal form denoted 5NF
- the domain-key normal form denoted DKNF
- the sixth normal form denoted 6NF
A normal form presupposes a valid relational model, that is, one in which the values of the attributes are functionally dependent on the primary key (completely determined by it). Normalization is carried out mainly during the design phase of a relational database. Algorithms exist for it (the synthesis algorithm for 3NF, the decomposition algorithm for BCNF, and others), and they can be automated. The decomposition method follows relational design theory.
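As an illustration of how such a procedure can be automated, the following Python sketch follows the textbook form of the BCNF decomposition algorithm. It is a minimal sketch, not an optimized implementation: it searches all attribute subsets for a violation, which is only practical for small schemas, and the names `closure` and `bcnf_decompose` as well as the sample attributes are assumptions made for the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes determined by `attrs` under `fds` (pairs lhs -> rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_decompose(relation, fds):
    """Split `relation` until no attribute set violates BCNF.

    A set X violates BCNF in a relation if it determines attributes
    outside X without determining the whole relation (X is not a superkey).
    """
    relation = frozenset(relation)
    attrs = sorted(relation)  # sorted for a deterministic search order
    for size in range(1, len(attrs)):
        for lhs in combinations(attrs, size):
            lhs = frozenset(lhs)
            determined = closure(lhs, fds) & relation
            if determined != lhs and determined != relation:
                # Split into "everything lhs determines" and "lhs plus the rest".
                rest = lhs | (relation - determined)
                return bcnf_decompose(determined, fds) + bcnf_decompose(rest, fds)
    return [relation]


# Hypothetical schema: order_id -> customer_no, order_date; customer_no -> customer_name
FDS = [
    (frozenset({"order_id"}), frozenset({"customer_no", "order_date"})),
    (frozenset({"customer_no"}), frozenset({"customer_name"})),
]
parts = bcnf_decompose(
    {"order_id", "customer_no", "customer_name", "order_date"}, FDS
)
```

For this input the sketch yields two relations, one holding the customer attributes and one holding the order attributes with `customer_no` as the link. Each split is made on a functional dependency, so every step is lossless.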
An example: a database contains the names and addresses of customers as well as the orders assigned to those customers. Since the same customer can place several orders, recording the customer data (possibly including the address) in the order table would cause it to appear there several times, although the customer only ever has one valid set of data (redundancy). For example, an incorrect address may be entered for the customer in one order and the correct address in the next. This can lead to contradictory data, either within this table or in comparison with other tables. The data would then be inconsistent, and it would not be possible to tell which entry is correct. Possibly neither address is correct, because the customer has since moved.
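The anomaly described above can be reproduced in a few lines. The following Python sketch uses the standard `sqlite3` module with an in-memory database; the table name `orders_flat`, its columns, and the sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Non-normalized design: customer data is repeated in every order row.
conn.execute(
    """CREATE TABLE orders_flat (
           order_id         INTEGER PRIMARY KEY,
           customer_no      INTEGER,
           customer_name    TEXT,
           customer_address TEXT
       )"""
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [
        (1, 100, "A. Example", "1 Old Street"),
        (2, 100, "A. Example", "1 Old Street"),
    ],
)

# The address is corrected in one order only: an update anomaly.
conn.execute(
    "UPDATE orders_flat SET customer_address = ? WHERE order_id = ?",
    ("2 New Road", 2),
)

# The same customer now appears with two different addresses.
addresses = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT customer_address FROM orders_flat "
        "WHERE customer_no = 100 ORDER BY customer_address"
    )
]
```

After the partial update, `addresses` contains both "1 Old Street" and "2 New Road", and the table itself gives no indication of which one is valid.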
In a normalized database, the customer data is stored only once, in the customer table, and each order of this customer is linked to that entry (usually via the customer number). If a customer relocates (another example is a change in the VAT rate), the corresponding table does contain several entries, but these are distinguished by a validity period. In the customer example above, the right entry can then be found via the combination of order date and customer number.
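A normalized layout with validity periods might look like the following Python sketch, again using `sqlite3` in memory. The table and column names (`customers`, `customer_addresses`, `orders`, `valid_from`, `valid_to`) and the sample rows are assumptions for the example; dates are stored as ISO strings so that plain string comparison orders them correctly, and an empty `valid_to` marks the currently valid address.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_no INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_addresses (
        customer_no INTEGER NOT NULL REFERENCES customers(customer_no),
        valid_from  TEXT NOT NULL,
        valid_to    TEXT,              -- NULL means still valid
        address     TEXT NOT NULL,
        PRIMARY KEY (customer_no, valid_from)
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_no INTEGER NOT NULL REFERENCES customers(customer_no),
        order_date  TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO customers VALUES (100, 'A. Example')")
conn.executemany(
    "INSERT INTO customer_addresses VALUES (?, ?, ?, ?)",
    [
        (100, "2020-01-01", "2023-06-30", "1 Old Street"),
        (100, "2023-07-01", None, "2 New Road"),
    ],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100, "2022-03-15"), (2, 100, "2024-02-01")],
)

# Look up the address that was valid on each order date.
rows = conn.execute(
    """
    SELECT o.order_id, a.address
    FROM orders AS o
    JOIN customer_addresses AS a
      ON a.customer_no = o.customer_no
     AND o.order_date >= a.valid_from
     AND (a.valid_to IS NULL OR o.order_date <= a.valid_to)
    ORDER BY o.order_id
    """
).fetchall()
```

Each order resolves to exactly one address: the first order to the old address and the second to the new one. A relocation adds one row to `customer_addresses` and leaves the order rows untouched.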
Weaknesses in the data model caused by a lack of normalization can, in addition to the typical anomalies, increase the effort required for later development. On the other hand, normalization steps can be deliberately omitted in the database design for performance reasons (denormalization). A typical example of this is the star schema in a data warehouse.
The creation of a normalized schema is supported by automatic derivation from a conceptual data model; in practice, an extended entity-relationship model (ERM) or a class diagram of the Unified Modeling Language (UML) serves as the starting point. The relational schema derived from the conceptual design can then be checked against the normal forms; some formalisms and algorithms, however, already guarantee this property by construction.