Normalization

13.9. Normalization#

Database normalization serves several important purposes. The key objectives and benefits are:

Eliminate Data Redundancy
- Without normalization: The same doctor’s information is repeated for every patient
- With normalization: Doctor information is stored once in a Doctors table
- Benefits: Saves storage space and ensures consistency
Prevent Update Anomalies
- Without normalization: Updating a doctor’s phone number requires changing multiple patient records
- With normalization: Update one record in the Doctors table
- Benefits: Reduces errors and maintains data integrity
Prevent Insert Anomalies
- Without normalization: Can’t add a new doctor without a patient
- With normalization: Can add doctors independently in their own table
- Benefits: Data can be added more flexibly and logically
Prevent Delete Anomalies
- Without normalization: Deleting the last patient of a doctor would lose the doctor’s information
- With normalization: Doctor information remains even if all their patients are deleted
- Benefits: Preserves important data
Ensure Data Consistency
- Without normalization: Same data might be entered differently in different places
- With normalization: Each piece of data has one authoritative source
- Benefits: Maintains data quality and reliability
Better Data Organization
- Without normalization: Data is mixed together (like medical history with insurance info)
- With normalization: Data is organized by logical entities (patients, doctors, insurance, etc.)
- Benefits: Easier to understand and maintain
More Flexible Querying
- Without normalization: Complex queries might be difficult or impossible
- With normalization: Can join tables in various ways to get needed information
- Benefits: Better reporting and analysis capabilities
Easier Maintenance
- Without normalization: Changes to data structure affect many parts of the database
- With normalization: Changes are localized to specific tables
- Benefits: Reduces maintenance effort and risk
Better Data Integrity
- Without normalization: Difficult to enforce relationships and constraints
- With normalization: Can use foreign keys and constraints effectively
- Benefits: Ensures data accuracy and reliability
Reduced Data Modification Issues
- Without normalization: Changes could have unintended consequences
- With normalization: Changes are isolated and controlled
- Benefits: Safer database updates

In our patient database example:

# Before normalization:
Patients (
    PatientID,
    PatientName,
    DoctorName,      # Repeated for many patients
    DoctorPhone,     # Repeated for many patients
    DoctorSpecialty  # Repeated for many patients
)

# After normalization:
Patients (
    PatientID,
    PatientName,
    DoctorID         # Reference to Doctors table
)

Doctors (
    DoctorID,
    DoctorName,      # Stored once
    DoctorPhone,     # Stored once
    DoctorSpecialty  # Stored once
)

The trade-offs of normalization include:

More complex queries requiring joins
Potentially slower query performance for some operations
More tables to manage
More complex relationships to maintain

However, these drawbacks are usually outweighed by the benefits in most business applications where data integrity is crucial, like healthcare systems.

13.9.1. The Five Normal Forms#

The five normal forms of database normalization, progressing from 1NF to 5NF, with clear examples from our patient database context.

First Normal Form (1NF)

Rules:
- Each cell contains a single value (atomic)
- No repeating groups or arrays
- Each record is unique

Example:

# Before 1NF
Patient {
    PatientID: 1,
    Medications: "Aspirin, Ibuprofen, Paracetamol",
    Allergies: ["Penicillin", "Peanuts"]
}

# After 1NF
PatientMedications {
    PatientID: 1,
    MedicationName: "Aspirin"
}
PatientMedications {
    PatientID: 1,
    MedicationName: "Ibuprofen"
}

Second Normal Form (2NF)

Rules:
- Must be in 1NF
- No partial dependencies (non-key attributes must depend on entire primary key)

Example:

# Before 2NF
PatientMedication {
    PatientID + MedicationID (composite key),
    MedicationName,        # Depends only on MedicationID
    PatientName,           # Depends only on PatientID
    Dosage                 # Depends on both (OK)
}

# After 2NF
Medications {
    MedicationID,
    MedicationName
}
PatientMedications {
    PatientID,
    MedicationID,
    Dosage
}

Third Normal Form (3NF)

Rules:
- Must be in 2NF
- No transitive dependencies (non-key attributes can’t depend on other non-key attributes)

Example:

# Before 3NF
Patient {
    PatientID,
    Address,
    City,            # Depends on Address, not PatientID
    State,           # Depends on Address, not PatientID
    ZipCode          # Depends on Address, not PatientID
}

# After 3NF
Patient {
    PatientID,
    AddressID
}
Address {
    AddressID,
    Address,
    City,
    State,
    ZipCode
}

Boyce-Codd Normal Form (BCNF)

Rules:
- Must be in 3NF
- For every dependency A → B, A must be a superkey

Example:

# Before BCNF
DoctorSchedule {
    Doctor,
    Specialty,
    TimeSlot,
    Room        # Room depends on TimeSlot, but TimeSlot isn't a key
}

# After BCNF
DoctorSpecialty {
    Doctor,
    Specialty
}
RoomSchedule {
    TimeSlot,
    Room
}
DoctorAssignment {
    Doctor,
    TimeSlot
}

Fifth Normal Form (5NF)

Rules:
- Must be in BCNF
- No join dependencies that don’t follow from the key constraints

Example:

# Before 5NF
DoctorPatientInsurance {
    DoctorID,
    PatientID,
    InsuranceID
}

# After 5NF
DoctorPatient {
    DoctorID,
    PatientID
}
PatientInsurance {
    PatientID,
    InsuranceID
}
DoctorInsurance {
    DoctorID,
    InsuranceID
}

Key Points:

Each form builds on the previous one
Higher normal forms reduce redundancy but increase complexity
Most real-world applications aim for 3NF
BCNF and 5NF are rarely implemented fully
Sometimes denormalization is done for performance

Common Stopping Points:

Most systems stop at 3NF
Healthcare systems often aim for BCNF due to data integrity requirements
5NF is rarely implemented in practice
Sometimes partial denormalization is done for performance reasons

Practical Considerations:

Each higher form:
- Reduces redundancy
- Increases number of tables
- Makes queries more complex
- Requires more joins
Trade-offs between:
- Data integrity
- Query performance
- System complexity
- Maintenance effort

13.9.2. Patient Database Normalization Example#

13.9.2.1. Non-Normalized Form#

CREATE TABLE Patients (
    PatientID INT,
    PatientName TEXT,
    DateOfBirth DATE,
    Address TEXT,
    Phone TEXT,
    PrimaryDoctor TEXT,
    DoctorPhone TEXT,
    DoctorSpecialty TEXT,
    Appointments TEXT,  -- Contains multiple appointment dates and times
    Medications TEXT,   -- Contains multiple medication names and dosages
    AllergiesList TEXT, -- Contains multiple allergies
    EmergencyContact TEXT,
    EmergencyPhone TEXT,
    EmergencyRelation TEXT,
    InsuranceProvider TEXT,
    InsurancePolicyNumber TEXT,
    InsuranceGroupNumber TEXT
);

13.9.2.1.1. Issues with Non-Normalized Form:#

Data redundancy: Doctor information repeated for each patient with the same doctor
Multiple values in single columns (Appointments, Medications, Allergies)
Update anomalies: Changing a doctor’s phone requires updating multiple rows
Delete anomalies: Deleting a patient could lose doctor information
Insert anomalies: Cannot add a new doctor without a patient

13.9.2.2. First Normal Form (1NF)#

CREATE TABLE Patients (
    PatientID INT PRIMARY KEY,
    PatientName TEXT,
    DateOfBirth DATE,
    Address TEXT,
    Phone TEXT,
    PrimaryDoctorID INT,
    EmergencyContactID INT,
    InsuranceID INT
);

CREATE TABLE Doctors (
    DoctorID INT PRIMARY KEY,
    DoctorName TEXT,
    Phone TEXT,
    Specialty TEXT
);

CREATE TABLE EmergencyContacts (
    ContactID INT PRIMARY KEY,
    ContactName TEXT,
    Phone TEXT,
    Relation TEXT
);

CREATE TABLE Insurance (
    InsuranceID INT PRIMARY KEY,
    Provider TEXT,
    PolicyNumber TEXT,
    GroupNumber TEXT
);

CREATE TABLE PatientAppointments (
    AppointmentID INT PRIMARY KEY,
    PatientID INT,
    AppointmentDateTime DATETIME,
    Purpose TEXT
);

CREATE TABLE PatientMedications (
    MedicationID INT PRIMARY KEY,
    PatientID INT,
    MedicationName TEXT,
    Dosage TEXT,
    Frequency TEXT
);

CREATE TABLE PatientAllergies (
    AllergyID INT PRIMARY KEY,
    PatientID INT,
    Allergy TEXT,
    Severity TEXT
);

13.9.2.2.1. Benefits of 1NF:#

No repeating groups or arrays
Each cell contains a single value
Each record has a unique identifier
Data is atomic (cannot be broken down further)

13.9.2.3. Second Normal Form (2NF)#

-- Previous tables remain the same, but add:

CREATE TABLE Medications (
    MedicationID INT PRIMARY KEY,
    MedicationName TEXT,
    Description TEXT,
    StandardDosage TEXT
);

CREATE TABLE PatientMedications (
    PrescriptionID INT PRIMARY KEY,
    PatientID INT,
    MedicationID INT,
    Dosage TEXT,
    Frequency TEXT,
    FOREIGN KEY (PatientID) REFERENCES Patients(PatientID),
    FOREIGN KEY (MedicationID) REFERENCES Medications(MedicationID)
);

CREATE TABLE Allergies (
    AllergyID INT PRIMARY KEY,
    AllergyName TEXT,
    Description TEXT
);

CREATE TABLE PatientAllergies (
    PatientAllergyID INT PRIMARY KEY,
    PatientID INT,
    AllergyID INT,
    Severity TEXT,
    FOREIGN KEY (PatientID) REFERENCES Patients(PatientID),
    FOREIGN KEY (AllergyID) REFERENCES Allergies(AllergyID)
);

13.9.2.3.1. Benefits of 2NF#

Removes partial dependencies
Reduces data redundancy
Medication and allergy information stored once, referenced many times
Consistent data across all patient records
Easier maintenance of standard medical information

13.9.2.4. Third Normal Form (3NF)#

-- Previous tables remain, but add:

CREATE TABLE Addresses (
    AddressID INT PRIMARY KEY,
    Street TEXT,
    City TEXT,
    State TEXT,
    ZipCode TEXT,
    Country TEXT
);

-- Modify Patients table:
CREATE TABLE Patients (
    PatientID INT PRIMARY KEY,
    PatientName TEXT,
    DateOfBirth DATE,
    AddressID INT,
    Phone TEXT,
    PrimaryDoctorID INT,
    EmergencyContactID INT,
    InsuranceID INT,
    FOREIGN KEY (AddressID) REFERENCES Addresses(AddressID),
    FOREIGN KEY (PrimaryDoctorID) REFERENCES Doctors(DoctorID),
    FOREIGN KEY (EmergencyContactID) REFERENCES EmergencyContacts(ContactID),
    FOREIGN KEY (InsuranceID) REFERENCES Insurance(InsuranceID)
);

CREATE TABLE InsurancePlans (
    PlanID INT PRIMARY KEY,
    InsuranceID INT,
    PlanName TEXT,
    Coverage TEXT,
    FOREIGN KEY (InsuranceID) REFERENCES Insurance(InsuranceID)
);

13.9.2.4.1. Benefits of 3NF#

Eliminates transitive dependencies
Address information stored once and referenced
Insurance plan details separated from basic insurance information
Reduced redundancy in address storage
Easier to update address information for multiple patients
Facilitates address validation and standardization

13.9.2.4.2. More on Transitive Dependency#

Looking at the patient database example, we can identify the transitive dependencies that existed and how we resolved them in 3NF.

A clear example of transitive dependency in our patient database was:

PatientID → DoctorID → DoctorSpecialty
- A patient is assigned to a doctor (DoctorID)
- The doctor’s specialty depends on the DoctorID, not the PatientID
- In the non-normalized form, we stored specialty with each patient, creating redundancy

Here’s another example: 2. PatientID → AddressID → City/State/Zip

A patient has an address (AddressID)
The city, state, and zip depend on the address, not directly on the patient

Let’s see this in the actual tables:

Non-normalized version (with transitive dependency):

CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY,
    PatientName TEXT,
    Address TEXT,          -- Contains full address including city/state/zip
    PrimaryDoctor TEXT,    -- Contains doctor name
    DoctorSpecialty TEXT,  -- This is transitively dependent on DoctorID!
    -- other fields...
)

3NF version (resolved transitive dependencies):

CREATE TABLE Patients (
    PatientID INTEGER PRIMARY KEY,
    PatientName TEXT,
    AddressID INTEGER,     -- Reference to Addresses table
    PrimaryDoctorID INTEGER,  -- Reference to Doctors table
    -- other fields...
)

CREATE TABLE Doctors (
    DoctorID INTEGER PRIMARY KEY,
    DoctorName TEXT,
    Specialty TEXT         -- Now directly dependent on DoctorID
)

CREATE TABLE Addresses (
    AddressID INTEGER PRIMARY KEY,
    Street TEXT,
    City TEXT,            -- Now directly dependent on AddressID
    State TEXT,           -- Now directly dependent on AddressID
    ZipCode TEXT          -- Now directly dependent on AddressID
)

Other transitive dependencies we resolved included: 3. PatientID → InsuranceID → PlanDetails

Plan details depend on the insurance ID, not the patient ID
Resolved by creating the InsurancePlans table

PatientID → MedicationID → StandardDosage
- Standard dosage depends on the medication, not the patient
- Resolved by separating standard medication information into the Medications table

The benefits of removing these dependencies in our patient database include:

If a doctor changes their specialty, we only update one record in the Doctors table
If a zip code changes for an address, we only update one record in the Addresses table
Standard medication information is stored once, not repeated for each prescription
Insurance plan details are maintained separately from individual patient policies

13.9.2.5. Summary of Benefits#

Non-Normalized to 1NF:
- Eliminates repeating groups
- Makes data atomic
- Establishes unique identifiers
- Enables basic data integrity
1NF to 2NF:
- Reduces redundancy in medical reference data
- Improves data consistency
- Makes updating standard medical information easier
- Enables better medication and allergy tracking
2NF to 3NF:
- Further reduces data redundancy
- Improves data integrity
- Makes address and insurance management more efficient
- Reduces storage requirements
- Simplifies data updates

Normalization

Contents

13.9. Normalization#

13.9.1. The Five Normal Forms#

13.9.2. Patient Database Normalization Example#

13.9.2.1. Non-Normalized Form#

13.9.2.1.1. Issues with Non-Normalized Form:#

13.9.2.2. First Normal Form (1NF)#

13.9.2.2.1. Benefits of 1NF:#

13.9.2.3. Second Normal Form (2NF)#

13.9.2.3.1. Benefits of 2NF#

13.9.2.4. Third Normal Form (3NF)#

13.9.2.4.1. Benefits of 3NF#

13.9.2.4.2. More on Transitive Dependency#

13.9.2.5. Summary of Benefits#