
SCHEMA REFINEMENT AND NORMAL FORMS
Chapter outline
 Normalization
 Functional Dependency
 Redundancy and update anomalies
 Decomposition
 Closure of a set of FDs
 Closure of a set of Attributes
 Normal forms
 Process of normalization
Normalization
 A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.
 2 main approaches for using normalization
 As a bottomup technique to create set of relations
 As a validation technique to check structure of relations
 Characteristics of a suitable set of relations :
 Minimal number of attributes necessary to support the data requirements of the enterprise
 Attributes with a close logical relationship (described as functional dependency) are found in the same relation
 Minimal redundancy with each attribute represented only once with the important exception of attributes that form a foreign key
 Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF)
 A relation can be normalized to a specific form to prevent possible occurrence of update anomalies
Functional Dependency (FD)
 Main concept associated with normalization.
 Functional Dependency
 Is a constraint on the set of legal relation
 Describes relationship between attributes in a relation
 If A and B are attributes of relation R, B is functionally dependent on A (denoted A > B), if each value of A in R is associated with exactly one value of B in R (A and B may each consist of one or more attributes)
 X > Y holds over relation R is, for every allowable instance r of R : if t1.X = t2.X, then t1.Y = t2.Y (Given 2 tuples in r, if the X values agree, then the Y values must also agree)
AB > C
 Main characteristics of functional dependencies used in normalization:
 have a 1:1 relationship between attribute(s) on left and righthand side of a dependency;
 hold for all time;
 are nontrivial
 Some functional dependencies are said to be trivial because they are satisfied by all relations
 For example, A > A is satisfied by all relations involving attribute A. Or AB > A
 In general, a functional dependency of the form a > b is trivial if b ? a
Redundancy and update anomalies
 Major aim of relational database design is to group attributes into relations to minimize data redundancy and reduce file storage space required by base relations
 Relations that contain redundant information may potentially suffer from update anomalies
 Types of update anomalies include:
 Insertion
 Deletion
 Modification
Decomposition
 Decomposition should be used with awareness :
 Is there reason to decompose a relation?
 What problems (if any) does the decomposition cause?
 Answer to the first question : Check normal forms
 Answer to the second question : Check 2 properties of decompositions : losslessjoin and dependencypreservation
 Must consider to performance as well. Some may stay with the problems of redundancy by adding some checks to application code (Some queries become more expensive if relation is decomposed)
Losslessjoin and Dependency Preservation Properties
 Two important properties of decomposition:
 Losslessjoin property enables us to find any instance of original relation from corresponding instances in the smaller relations
(instance change?)
 Dependency preservation property enables us to enforce a constraint on original relation by enforcing some constraint on each of the smaller relations
(constraints change?)
Properties of decompositions
Consider CSJDPQV, JP > C and SD > P
 BCNF decomposition : CSJDQV and SDP
 Problem : Checking JP > C requires a join!!
Closure of a set of FDs
 Database designer identify useful FDs that are semantically obvious. However, there are usually numerous other FDs
 Given a set F of FDs, we can prove that certain other FDs hold (FDs are logically implied by F)
 However, Finding closure set of FDs is impractical
Example
Given a relation schema
R(A, B, C, G, H, I)
and the set of FDs
A > B
A > C
CG > H
CG > I
B > H
The functional dependency A > H is logically implied
 A > B : if t1.A = t2.A, then t1.B = t2.B
 B > H : if t1.B = t2.B, then t1.H = t2.H
 Therefore, A > H : if t1.A = t2.A, then t1.H = t2.H
 Let F be a set of functional dependencies, the closure of F, denoted by F+, is the set of all functional dependencies logically implied by F
Closure of a set of FDs
 Set of inference rules, called Armstrong’s axioms, specifies how new functional dependencies can be inferred from given ones
 Armstrong’s axioms are sound, because they do not generate any incorrect functional dependencies. They are complete, because for a given set F of functional dependencies, they allow us to generate all F+
 Let A, B, and C be subsets of the attributes of relation R. Armstrong’s axioms are as follows:
1. Reflexivity
If B is a subset of A, then A > B
2. Augmentation
If A > B, then AC > BC
3. Transitivity
If A > B and B > C, then A > C
 Reflexivity and Selfdetermination state that a set of attributes always determines any of its subsets or itself. These rules generate FDs that are always true, such FDs are trivial and are generally not interesting or useful
Review some vocabularies
 Keys
 Superkey  K is a superkey for relation schema R if and only if K ? R
 Candidate Key  K is a candidate key for R if and only if
 K ? R (i.e., K is a super key), and
 for no a ? K, a ? R
 (i.e., K is minimal with respect to the number of attributes of which it is composed)
 Prime attributes  attributes that belong to any candidate key
 Primary Key – Pick one from the candidate keys
Example
Consider the universal relation
R(A, B, C, D, E)
And the set of functional dependencies
F = {A>BC, CD>E, B>D, E>A}
 Compute A+, B+
 Is B>C in the closure F+?
 Is CD>E in the closure F+?
 What are the candidate keys for R?
Normal forms
 Normal forms based on FDs are 1NF, 2NF, 3NF, and BCNF
 1NF : no repeating groups
 2NF : in 1NF and no partial dependencies (where an attribute is dependent on only a part of a primary key > composite key)
 3NF : in 2NF and no transitive dependencies ( where a nonkey attribute is dependent on another nonkey attribute)
 BCNF – in 3NF and every determinant is a candidate key.
Process of normalization
