13.10. Normalization Old#
What is database normalization?
The purpose of database normalization is to:
eliminate redundant data
reduce complexity of data, making it easier to manage the data and make change
ensure logical data dependencies
How is database normalization achieved?
By fulfilling five normal forms. Each normal form represents an increasingly stringent set of rules. Usually fulfilling the first three normal forms is sufficient.
Ref: https://www.1keydata.com/database-normalization/first-normal-form-1nf.php
First Normal Form (1NF):
if there are no repeating groups.
all values are atomic, meaning they are the smallest meaningful value
Second Normal Form (2NF):
the table is in first normal form
each non-key field is functionally dependent on the entire primary key
Third Normal Form (3NF):
the table is in second normal form
there are no transitive dependencies
Ref: https://arctype.com/blog/2nf-3nf-normalization-example/
Summary
All values must be atomic
No redundancy
No implicit relationship/dependency
No transitive relationship/dependency
13.10.1. Bad Design Examples#
13.10.1.1. Example 1#
Show code cell source
import sqlite3
import pandas as pd
conn = sqlite3.connect('example1.db')
sql_statement = "SELECT * FROM EMPLOYEES_PROJECTS_TIME"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
EmployeeID | Name | Project | Time | |
---|---|---|---|---|
0 | EN1-26 | Sean O'Brien | 30-452-T3, 30-457-T3, 32-244-T3 | 0.25, 0.40, 0.30 |
1 | EN1-33 | Amy Guya | 30-452-T3, 30-382-TC, 32-244-T3 | 0.05, 0.35, 0.60 |
2 | EN1-35 | Steven Baranco | 30-452-T3, 31-238-TC | 0.15, 0.80 |
3 | EN1-36 | Elizabeth Roslyn | 35-152-TC | 0.90 |
4 | EN1-38 | Carol Schaaf | 36-272-TC | 0.75 |
5 | EN1-40 | Alexandra Wing | 31-238-TC, 31-241-TC | 0.20, 0.70 |
Problems with example1
Repeating group of fields
The project and time fields are not made up of atomic values
Can’t sort by last name
Can’t sort by time because field is type text
Assumed relationship between project and time
13.10.1.2. Example 2#
Show code cell source
import sqlite3
import pandas as pd
conn = sqlite3.connect('example2.db')
sql_statement = "SELECT * FROM EMPLOYEES_PROJECTS_TIME"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
EmployeeID | Last_Name | First_Name | Project1 | Time1 | Project2 | Time2 | Project3 | Time3 | |
---|---|---|---|---|---|---|---|---|---|
0 | EN1-26 | O'Brien | Sean | 30-452-T3 | 0.250000 | 30-457-T3 | 0.400000 | 32-244-T3 | 0.300000 |
1 | EN1-33 | Guya | Amy | 30-452-T3 | 0.050000 | 30-382-TC | 0.350000 | 32-244-T3 | 0.600000 |
2 | EN1-35 | Baranco | Steven | 30-452-T3 | 0.150000 | 31-238-TC | 0.800000 | None | nan |
3 | EN1-36 | Roslyn | Elizabeth | 35-152-TC | 0.900000 | None | nan | None | nan |
4 | EN1-38 | Schaaf | Carol | 36-272-TC | 0.750000 | None | nan | None | nan |
5 | EN1-40 | Wing | Alexandra | 31-238-TC | 0.200000 | 31-241-TC | 0.700000 | None | nan |
Analysis of example2
Can sort now!
How can you add another project?
13.10.1.3. Example 3#
Show code cell source
import sqlite3
import pandas as pd
conn = sqlite3.connect('example3.db')
sql_statement = "SELECT * FROM EMPLOYEES"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
display(df)
sql_statement = "SELECT * FROM PROJECTS_EMPLOYEES_TIME"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
EmployeeID | Last_Name | First_Name | |
---|---|---|---|
0 | EN1-26 | O'Brien | Sean |
1 | EN1-33 | Guya | Amy |
2 | EN1-35 | Baranco | Steven |
3 | EN1-36 | Roslyn | Elizabeth |
4 | EN1-38 | Schaaf | Carol |
5 | EN1-40 | Wing | Alexandra |
ProjectNum | EmployeeID | Time | |
---|---|---|---|
0 | 30-328-TC | EN1-33 | 0.350000 |
1 | 30-452-T3 | EN1-26 | 0.250000 |
2 | 30-452-T3 | EN1-33 | 0.050000 |
3 | 30-452-T3 | EN1-35 | 0.150000 |
4 | 31-238-TC | EN1-35 | 0.800000 |
5 | 30-457-T3 | EN1-26 | 0.400000 |
6 | 31-238-TC | EN1-40 | 0.200000 |
7 | 31-241-TC | EN1-40 | 0.700000 |
8 | 32-244-T3 | EN1-33 | 0.600000 |
9 | 35-152-TC | EN1-36 | 0.900000 |
10 | 36-272-TC | EN1-38 | 0.750000 |
Analysis of example3 – first normal form
Can do groups by employeeid or projectnum
Can sort by time
Can sort by name
13.10.1.4. Example 4#
Show code cell source
import sqlite3
import pandas as pd
conn = sqlite3.connect('example4.db')
sql_statement = "SELECT * FROM EMPLOYEES_PROJECTS"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
EmployeeID | Last_Name | First_Name | ProjectNumber | ProjectTitle | |
---|---|---|---|---|---|
0 | EN1-26 | O'Brien | Sean | 30-452-T3 | STAR manual |
1 | EN1-26 | O'Brien | Sean | 30-457-T3 | ISO procedures |
2 | EN1-26 | O'Brien | Sean | 31-124-T3 | Employee handbook |
3 | EN1-33 | Guya | Amy | 30-452-T3 | STAR manual |
4 | EN1-33 | Guya | Amy | 30-482-TC | Web Site |
5 | EN1-33 | Guya | Amy | 31-241-TC | New catalog |
6 | EN1-35 | Baranco | Steven | 30-452-T3 | STAR manual |
7 | EN1-35 | Baranco | Steven | 31-238-TC | STAR prototype |
8 | EN1-36 | Roslyn | Elizabeth | 35-152-TC | STAR pricing |
9 | EN1-38 | Schaaf | Carol | 36-272-TC | Order system |
10 | EN1-40 | Wing | Alexandra | 31-238-TC | STAR prototype |
11 | EN1-40 | Wing | Alexandra | 31-241-TC | New catalog |
Analysis of example4
How would you update the project title for a given project? Have to edit in many places
Can you add a project without an employeeid?
How can you delete a project?
13.10.1.5. Example 5#
Show code cell source
import sqlite3
import pandas as pd
conn = sqlite3.connect('example5.db')
sql_statement = "SELECT * FROM EMPLOYEES"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
display(df)
sql_statement = "SELECT * FROM EMPLOYEES_PROJECTS"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
display(df)
sql_statement = "SELECT * FROM PROJECTS"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
EmployeeID | Last_Name | First_Name | |
---|---|---|---|
0 | EN1-26 | O'Brien | Sean |
1 | EN1-33 | Guya | Amy |
2 | EN1-35 | Baranco | Steven |
3 | EN1-36 | Roslyn | Elizabeth |
4 | EN1-38 | Schaaf | Carol |
5 | EN1-40 | Wing | Alexandra |
EmployeeID | ProjectNum | |
---|---|---|
0 | EN1-26 | 30-452-T3 |
1 | EN1-26 | 30-457-T3 |
2 | EN1-26 | 31-124-T3 |
3 | EN1-33 | 30-328-TC |
4 | EN1-33 | 30-452-T3 |
5 | EN1-33 | 32-244-T3 |
6 | EN1-35 | 30-452-T3 |
7 | EN1-35 | 31-238-TC |
8 | EN1-36 | 35-152-TC |
9 | EN1-38 | 36-272-TC |
10 | EN1-40 | 31-238-TC2 |
11 | EN1-40 | 31-241-TC |
ProjectNum | ProjectTitle | |
---|---|---|
0 | 30-452-T3 | STAR manual |
1 | 30-457-T3 | ISO procedures |
2 | 30-482-TC | Web site |
3 | 31-124-T3 | Employee handbook |
4 | 31-238-TC | STAR prototype |
5 | 31-238-TC2 | New catalog |
6 | 35-152-TC | STAR pricing |
7 | 36-272-TC | Order system |
Analysis of example5
second normal form
13.10.1.6. Example 6#
Show code cell source
import sqlite3
import pandas as pd
conn = sqlite3.connect('example6.db')
sql_statement = "SELECT * FROM PROJECTS"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
ProjectNum | ProjectTitle | ProjectMgr | Phone | |
---|---|---|---|---|
0 | 30-452-T3 | STAR manual | Garrison | 2756 |
1 | 30-457-T3 | ISO procedures | Jacanda | 2954 |
2 | 30-482-TC | Web site | Friedman | 2846 |
3 | 31-124-T3 | Employee handbook | Jones | 3102 |
4 | 31-238-TC | STAR prototype | Garrison | 2756 |
5 | 31-241-TC | New catalog | Jones | 3102 |
6 | 35-152-TC | STAR pricing | Vance | 3022 |
7 | 36-272-TC | Order system | Jacanda | 2954 |
Analysis of example 6
Phone number, which is a non-key field, has transitive dependency on another non-key field.
13.10.1.7. Example 6#
Show code cell source
import sqlite3
import pandas as pd
conn = sqlite3.connect('example7.db')
sql_statement = "SELECT * FROM MANAGERS"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
display(df)
conn = sqlite3.connect('example7.db')
sql_statement = "SELECT * FROM PROJECTS"
df = pd.read_sql_query(sql_statement, conn)
df.style.set_table_attributes('style="font-size: 12px"')
ProjectMgr | Phone | |
---|---|---|
0 | Friedman | 2846 |
1 | Garrison | 2756 |
2 | Jacanda | 2954 |
3 | Jones | 3102 |
4 | Vance | 3022 |
ProjectNum | ProjectTitle | ProjectMgr | |
---|---|---|---|
0 | 30-452-T3 | STAR manual | Garrison |
1 | 30-457-T3 | ISO procedures | Jacanda |
2 | 30-482-TC | Web site | Friedman |
3 | 31-124-T3 | Employee handbook | Jones |
4 | 31-238-TC | STAR prototype | Garrison |
5 | 31-241-TC | New catalog | Jones |
6 | 35-152-TC | STAR pricing | Vance |
7 | 36-272-TC | Order system | Jacanda |
Analysis of example7
Removed transitive dependency