Post

DBS101_Unit5

DBS101_Unit5

Unit 5: Normalization and Functional Dependencies

Introduction to Good Relational Designs

A good relational database design should have these key features:

  1. Relation for every entity
    • Each tuple represents one entity or relationship
    • Attributes of different entities should not be mixed
    • Foreign keys should reference other entities
    • Entity and relationship attributes should be separate
  2. Minimal NULL values
    • Relations should be designed to minimize NULL values
    • Attributes frequently NULL can be moved to separate relations
  3. No spurious tuples
    • Avoid designs that produce erroneous JOIN results
    • Ensure “lossless join” property for meaningful results
  4. No redundancy
    • Avoid storing the same data in multiple places
    • Redundancy creates problems:
      • Wasted storage space
      • Multiple data entry points
      • Multiple deletion points
      • Multiple modification points
  5. No modification anomalies
    • Update anomaly: Changing data in one place but not others creates inconsistency
    • Deletion anomaly: Deleting data about one entity accidentally removes data about another
    • Insertion anomaly: Cannot add certain data because other required data is unavailable

Anomalies Example

Normalization Theory

Normalization is a database design technique that:

  • Reduces data redundancy
  • Makes information retrieval easier
  • Converts database design into standard formats (normal forms)

The approach is to decompose relational schemas that are not in “good form” into appropriate normal forms with lossless decomposition.

Functional Dependencies

A functional dependency (FD) is a relationship between attributes where one attribute’s value determines another’s value.

Notation: X → Y

  • X is the determinant
  • Y is the dependent Anomalies Example

Closure of Functional Dependencies (F+)

  • F+ contains all functional dependencies logically implied by a set F
  • Armstrong’s axioms provide rules for reasoning about dependencies:
    • Reflexivity: If β ⊆ α, then α → β
    • Augmentation: If α → β holds, then γα → γβ holds
    • Transitivity: If α → β and β → γ hold, then α → γ holds

Closure of Attribute Sets

Example:

F = { A → B, B → C, A → D } A+ = { A } (start with attribute A) Apply FDs:

  1. A → B: Add B to A+, now A+ = { A, B }
  2. B → C: Add C to A+, now A+ = { A, B, C }
  3. A → D: Add D to A+, now A+ = { A, B, C, D } Therefore A+ = { A, B, C, D }

Canonical Cover

A canonical cover (Fc) is a simplified version of F with the same meaning but no redundancy.

An attribute is extraneous if we can remove it without changing the closure of the set of FDs.

Normal Forms

Anomalies Example

First Normal Form (1NF)

  • All attributes must have atomic (indivisible) values
  • No repeating groups or arrays

Second Normal Form (2NF)

  • Must be in 1NF
  • All non-key attributes fully depend on the primary key

Third Normal Form (3NF)

A relation R is in 3NF if for all FDs α → β:

  • α → β is trivial (β ⊆ α), or
  • α is a superkey, or
  • Each attribute in β-α is contained in a candidate key

Boyce-Codd Normal Form (BCNF)

A relation R is in BCNF if for all FDs α → β:

  • α → β is trivial (β ⊆ α), or
  • α is a superkey

BCNF is stricter than 3NF and eliminates more anomalies, but may not preserve all dependencies.

Decomposition Algorithms

BCNF Decomposition Algorithm

  1. Start with a single relation R
  2. Find a functional dependency α → β that violates BCNF
  3. Decompose R into:
    • (α ∪ β)
    • (R - (β - α))
  4. Repeat until all relations are in BCNF

3NF Decomposition Algorithm

  1. Find a canonical cover Fc for F
  2. Create a relation for each FD in Fc
  3. If no schema contains a candidate key, add one
  4. Remove redundant schemas

Multivalued Dependencies and Higher Normal Forms

Multivalued Dependencies

A multivalued dependency X →→ Y holds if X values are independent of combining Y values with other values.

Example: In inst(ID, dept_name, name, street, city), ID →→ street, city means an instructor’s address is independent of their department.

Fourth Normal Form (4NF)

A schema is in 4NF if whenever a non-trivial multivalued dependency X →→ Y holds, X is a superkey.

Higher Normal Forms

  • Fifth Normal Form (5NF/PJNF): Eliminates redundancies from join dependencies
  • Domain-Key Normal Form (DKNF): Based on even more general constraints

Modeling Temporal Data

Temporal data is valid only over specific time periods.

To model temporal data:

  1. Add start_date and end_date columns to relations
  2. Primary keys must include time period to prevent overlaps
  3. Functional dependencies only hold at specific points in time

Example:

course (course_id, title, dept_name, credits, start_date, end_date)

Foreign keys become more complex with temporal data, requiring validity across time periods.

SQL: 2011 added some temporal support with PERIOD FOR and temporal constraints.

Key Takeaways

  • Normalization improves database design by reducing redundancy and anomalies
  • Each normal form addresses specific types of dependencies and constraints
  • BCNF provides better normalization but may sacrifice dependency preservation
  • 3NF offers a balance between normalization and dependency preservation
  • Higher normal forms address more complex dependencies but are rarely used in practice
  • Temporal data modeling requires special consideration for time-varying attributes
This post is licensed under CC BY 4.0 by the author.