Learn Database Online Tutorial

SCHEMA REFINEMENT AND NORMAL FORMS
Chapter outline

  • Normalization
  • Functional Dependency
  • Redundancy and update anomalies
  • Decomposition
  • Closure of a set of FDs
  • Closure of a set of Attributes
  • Normal forms
  • Process of normalization

Normalization

  • A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
  • 2 main approaches for using normalization
    • As a bottom-up technique to create set of relations
    • As a validation technique to check structure of relations
  • Characteristics of a suitable set of relations :
    • Minimal number of attributes necessary to support the data requirements of the enterprise
    • Attributes with a close logical relationship (described as functional dependency) are found in the same relation
    • Minimal redundancy with each attribute represented only once with the important exception of attributes that form a foreign key
  • Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF)
  • A relation can be normalized to a specific form to prevent possible occurrence of update anomalies

Functional Dependency (FD)

  • Main concept associated with normalization.
  • Functional Dependency
    • Is a constraint on the set of legal relation
    • Describes relationship between attributes in a relation
    • If A and B are attributes of relation R, B is functionally dependent on A (denoted A -> B), if each value of A in R is associated with exactly one value of B in R (A and B may each consist of one or more attributes)
  • X -> Y holds over relation R is, for every allowable instance r of R : if t1.X = t2.X, then t1.Y = t2.Y (Given 2 tuples in r, if the X values agree, then the Y values must also agree)

AB -> Ca

  • Main characteristics of functional dependencies used in normalization:
    • have a 1:1 relationship between attribute(s) on left and right-hand side of a dependency;
    • hold for all time;
    • are nontrivial
  • Some functional dependencies are said to be trivial because they are satisfied by all relations
  • For example, A -> A is satisfied by all relations involving attribute A.  Or AB -> A
  • In general, a functional dependency of the form a -> b is trivial if b ? a

Redundancy and update anomalies

  • Major aim of relational database design is to group attributes into relations to minimize data redundancy and reduce file storage space required by base relations
  • Relations that contain redundant information may potentially suffer from update anomalies
  • Types of update anomalies include:
    • Insertion
    • Deletion
    • Modification

Decomposition

  • Decomposition should be used with awareness :

                - Is there reason to decompose a relation?
- What problems (if any) does the decomposition cause?

  • Answer to the first question : Check normal forms
  • Answer to the second question : Check 2 properties of decompositions : lossless-join and dependency-preservation
  • Must consider to performance as well. Some may stay with the problems of redundancy by adding some checks to application code (Some queries become more expensive if relation is decomposed)

Lossless-join and Dependency Preservation Properties

  • Two important properties of decomposition:

                - Lossless-join property enables us to find any instance of original relation from corresponding instances in the smaller relations
(instance change?)
                - Dependency preservation property enables us to enforce a constraint on original relation by enforcing some constraint on each of the smaller relations
(constraints change?)
Properties of decompositions

  • Dependency preservation

                Consider CSJDPQV, JP -> C and SD -> P
- BCNF decomposition : CSJDQV and SDP
- Problem : Checking JP -> C requires a join!!
Closure of a set of FDs

  • Database designer identify useful FDs that are semantically obvious. However, there are usually numerous other FDs
  • Given a set F of FDs, we can prove that certain other FDs hold (FDs are logically implied by F)
  • However, Finding closure set of FDs is impractical

Example
Given a relation schema
R(A, B, C, G, H, I)
and the set of FDs
A -> B
A -> C
CG -> H
CG -> I
B -> H
The functional dependency A -> H is logically implied

  • A -> B : if t1.A = t2.A, then t1.B = t2.B
  • B -> H : if t1.B = t2.B, then t1.H = t2.H
  • Therefore, A -> H : if t1.A = t2.A, then t1.H = t2.H
  • Let F be a set of functional dependencies, the closure of F, denoted by F+, is the set of all functional dependencies logically implied by F

Closure of a set of FDs

  • Set of inference rules, called Armstrong’s axioms, specifies how new functional dependencies can be inferred from given ones
  • Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because for a given set F of functional dependencies, they allow us to generate all F+
  • Let A, B, and C be subsets of the attributes of relation R. Armstrong’s axioms are as follows:

                 1. Reflexivity
If B is a subset of A, then A -> B
2. Augmentation
If A -> B, then AC -> BC
3. Transitivity
If A -> B and B -> C, then A -> C

  • Reflexivity and Self-determination state that a set of attributes always determines any of its subsets or itself. These rules generate FDs that are always true, such FDs are trivial and are generally not interesting or useful

Review some vocabularies

  • Keys
  •  Superkey - K is a superkey for relation schema R if and only if K ? R
  •  Candidate Key - K is a candidate key for R if and only if
    • K ? R (i.e., K is a super key), and
    • for no a ? K, a ? R
    • (i.e., K is minimal with respect to the number of attributes of which it is composed)
  •  Prime attributes - attributes that belong to any candidate key
  •  Primary Key – Pick one from the candidate keys

Example
Consider the universal relation
R(A, B, C, D, E)
And the set of functional dependencies
F = {A->BC, CD->E, B->D, E->A}

  1. Compute A+, B+
  2. Is B->C in the closure F+?
  3. Is CD->E in the closure F+?
  4. What are the candidate keys for R?

Normal forms

  • Normal forms based on FDs are 1NF, 2NF, 3NF, and BCNF
  • 1NF : no repeating groups
  • 2NF : in 1NF and no partial dependencies (where an attribute is dependent on only a part of a primary key -> composite key)
  • 3NF : in 2NF and no transitive dependencies ( where a non-key attribute is dependent on another non-key attribute)
  • BCNF – in 3NF and every determinant is a candidate key.

Process of normalization
a