Once we have identified that we are going to use a relational database it is important to understand how to organise the data in order for the database to be efficient and contain high fidelity data.
Data Fidelity refers to how correct the data is within the database. For example, if you were to contact a company and explain that they had stored your surname with incorrect spelling and their database saved your surname in multiple locations, they would need to find all places where your surname was saved and update them.
By saving the same data in multiple locations it is easy to miss out an update, therefore, leaving a record incorrect. Having a database with high data fidelity means that it is difficult to have incorrect data in this way – by updating one record the update cascades to all linked records.
The way in which we ensure that our databases have as little duplicate data as possible is by using normalization which is identified using 3 levels.
1NF – First Normal Form
Databases in first normal form must follow certain rules . Firstly comma the data must be atomic. Atomic data is data that has been divided down into its smallest meaningful parts. In the example below, our database which is a normalised or in 0NF holds the name of our students in one field. Putting this into first normal form means breaking down the data into 3 separate fields.
The second rule of first normal form is that each record needs to be unique. Often, we create a unique record by adding a primary key. A primary key is described as a unique identifier for a record within a table.
Primary keys can be made in a number of different ways including a simple integer that increments for each record, a composite key it is made by combining two fields within the record, or a GUID or generated unique identifier which combines data about the record and randomly generated characters.
2NF – Second Normal Form
Before attempting to put a database into second normal form, the database must meet all of the rules of 1st normal form.
The second rule of second normal form is not that we don’t talk about second normal form, but that the primary key must be a single field. This means that composite keys cannot be part of second normal form as they are made from two separate fields.
By having the primary key as a single field within the record this allows us to implement links between the tables by adding foreign keys to other tables. A foreign key is a primary key from one table that appears in another in order to link the two records together.
This often means that we try to remove the duplication of data creating other tables that hold foreign keys to link the data together. In the example below, we have divided the student and their classes into 2 separate tables and linked them together using a foreign key.
3NF – Third Normal Form
As was the case with second normal form, in order to put a database into third normal form it must already meet the rules of second normal form.
A useful phrase to remember for third normal form is “everything must depend upon the key and nothing but the key“.
What this means is that there must be no functional dependencies within our database. A functional dependency is when non-key column might cause another column to change.
Often, in the exam, we will be presented with an entity-relationship diagram that shows a many to many relationship and we are asked to place this into third normal form. Understanding that in a one to many relationship the primary key is in the table where the one relation starts, and the foreign key is in the table where the many relation starts, helps us to visualise how the data may be laid out to prevent unnecessary duplication of data. In fact, in third normal form the only duplication of data should be foreign keys.