Normalization is a series of steps followed to obtain a database design that allows for efficient access and storage of data. These steps reduce data redundancy and the chances of data becoming inconsistent.
First Normal Form
First Normal Form eliminates repeating groups by putting each into a separate table and connecting them with a one-to-many relationship.
Two rules follow this definition:
- Each table has a primary key made of one or several fields and uniquely identifying each record
- Each field is atomic, it does not contain more than one value
For instance, assuming a table WAGON to follow each wagon in every station.
wagon_id integer (ex. 101) description string (ex. 'wagon_type, empty_weight, capacity, designer, design_date') state string (ex. 'under repair') timestamp datetime (ex. '22/12/2008 17:37') station string (ex. 'New York Grand Central')
The primary key is (wagon_id, timestamp).
This table is not in 1NF because "description" is not atomic. To move it in 1NF we have to split "description" field in its components:
wagon_id integer wagon_type string empty_weight number capacity number designer string design_date datetime state string timestamp datetime station string
Second Normal Form
Second Normal Form eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key.
In our example, "wagon_type", "empty_weight", "capacity"... only depends on "wagon_id" but not on "timestamp" field of the primary key, so this table is not in 2NF. In order to reach 2NF, we have to split the table in two in the way that each field of each table depends on all the fields of each primary key:
wagon_id integer wagon_type string empty_weight number capacity number designer string design_date datetime
wagon_id integer timestamp datetime state string station string
Third Normal Form
Third Normal Form eliminates functional dependencies on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent on the key, the whole key and nothing but the key.
In our example, in the first table it is most likely that "empty_weight", "capacity", "designer" and "design_date" depend on "wagon_type", so we have to split this table in two:
wagon_id integer wagon_type string
wagon_type string empty_weight number capacity number designer string design_date datetime
Now our example with its 3 tables is in 3NF.
Fourth Normal Form
Fourth Normal Form separates independent multi-valued facts stored in one table into separate tables.
In the last table of our example, it is clear that "empty_weight" and "capacity" are interesting for every day wagon management whereas "designer" and "design_date" for historical purpose. So we can think they will never be used at the same time and have to split the table in two to fit the 4NF.
wagon_type string empty_weight number capacity number
wagon_type string designer string design_date datetime
Fifth Normal Form
Fifth Normal Form breaks out data redundancy that is not covered by any of the previous normal forms.
- Denormalize - the reverse process
- Examples in an article by Barry Wise: http://www.barrywise.com/2008/01/database-normalization-and-design-techniques/
|Glossary of Terms|