Learn Database Online Tutorial

SCHEMA REFINEMENT AND NORMAL FORMS
Chapter outline

Normalization
Functional Dependency
Redundancy and update anomalies
Decomposition
Closure of a set of FDs
Closure of a set of Attributes
Normal forms
Process of normalization

Normalization

A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
2 main approaches for using normalization

As a bottom-up technique to create set of relations
As a validation technique to check structure of relations

Characteristics of a suitable set of relations :

Minimal number of attributes necessary to support the data requirements of the enterprise
Attributes with a close logical relationship (described as functional dependency) are found in the same relation
Minimal redundancy with each attribute represented only once with the important exception of attributes that form a foreign key

Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF)
A relation can be normalized to a specific form to prevent possible occurrence of update anomalies

Functional Dependency (FD)

Main concept associated with normalization.
Functional Dependency

Is a constraint on the set of legal relation
Describes relationship between attributes in a relation
If A and B are attributes of relation R, B is functionally dependent on A (denoted A -> B), if each value of A in R is associated with exactly one value of B in R (A and B may each consist of one or more attributes)

X -> Y holds over relation R is, for every allowable instance r of R : if t1.X = t2.X, then t1.Y = t2.Y (Given 2 tuples in r, if the X values agree, then the Y values must also agree)

AB -> C

Main characteristics of functional dependencies used in normalization:

have a 1:1 relationship between attribute(s) on left and right-hand side of a dependency;
hold for all time;
are nontrivial

Some functional dependencies are said to be trivial because they are satisfied by all relations
For example, A -> A is satisfied by all relations involving attribute A. Or AB -> A
In general, a functional dependency of the form a -> b is trivial if b ? a

Redundancy and update anomalies

Major aim of relational database design is to group attributes into relations to minimize data redundancy and reduce file storage space required by base relations
Relations that contain redundant information may potentially suffer from update anomalies
Types of update anomalies include:

Insertion
Deletion
Modification

Decomposition

Decomposition should be used with awareness :

- Is there reason to decompose a relation?
- What problems (if any) does the decomposition cause?

Answer to the first question : Check normal forms
Answer to the second question : Check 2 properties of decompositions : lossless-join and dependency-preservation
Must consider to performance as well. Some may stay with the problems of redundancy by adding some checks to application code (Some queries become more expensive if relation is decomposed)

Lossless-join and Dependency Preservation Properties

Two important properties of decomposition:

- Lossless-join property enables us to find any instance of original relation from corresponding instances in the smaller relations
(instance change?)
- Dependency preservation property enables us to enforce a constraint on original relation by enforcing some constraint on each of the smaller relations
(constraints change?)
Properties of decompositions

Dependency preservation

Consider CSJDPQV, JP -> C and SD -> P
- BCNF decomposition : CSJDQV and SDP
- Problem : Checking JP -> C requires a join!!
Closure of a set of FDs

Database designer identify useful FDs that are semantically obvious. However, there are usually numerous other FDs
Given a set F of FDs, we can prove that certain other FDs hold (FDs are logically implied by F)
However, Finding closure set of FDs is impractical

Example
Given a relation schema
R(A, B, C, G, H, I)
and the set of FDs
A -> B
A -> C
CG -> H
CG -> I
B -> H
The functional dependency A -> H is logically implied

A -> B : if t1.A = t2.A, then t1.B = t2.B
B -> H : if t1.B = t2.B, then t1.H = t2.H
Therefore, A -> H : if t1.A = t2.A, then t1.H = t2.H
Let F be a set of functional dependencies, the closure of F, denoted by F+, is the set of all functional dependencies logically implied by F

Closure of a set of FDs

Set of inference rules, called Armstrong’s axioms, specifies how new functional dependencies can be inferred from given ones
Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because for a given set F of functional dependencies, they allow us to generate all F+
Let A, B, and C be subsets of the attributes of relation R. Armstrong’s axioms are as follows:

1. Reflexivity
If B is a subset of A, then A -> B
2. Augmentation
If A -> B, then AC -> BC
3. Transitivity
If A -> B and B -> C, then A -> C

Reflexivity and Self-determination state that a set of attributes always determines any of its subsets or itself. These rules generate FDs that are always true, such FDs are trivial and are generally not interesting or useful

Review some vocabularies

Keys
Superkey - K is a superkey for relation schema R if and only if K ? R
Candidate Key - K is a candidate key for R if and only if

K ? R (i.e., K is a super key), and
for no a ? K, a ? R
(i.e., K is minimal with respect to the number of attributes of which it is composed)

Prime attributes - attributes that belong to any candidate key
Primary Key – Pick one from the candidate keys

Example
Consider the universal relation
R(A, B, C, D, E)
And the set of functional dependencies
F = {A->BC, CD->E, B->D, E->A}

Compute A+, B+
Is B->C in the closure F+?
Is CD->E in the closure F+?
What are the candidate keys for R?

Normal forms

Normal forms based on FDs are 1NF, 2NF, 3NF, and BCNF
1NF : no repeating groups
2NF : in 1NF and no partial dependencies (where an attribute is dependent on only a part of a primary key -> composite key)
3NF : in 2NF and no transitive dependencies ( where a non-key attribute is dependent on another non-key attribute)
BCNF – in 3NF and every determinant is a candidate key.

Process of normalization