
RELATIONAL ALGEBRA
Chapter outline
 Preliminaries
 Relational Algebra
 Algebra operators
Preliminaries
 Query languages are specialized languages for asking questions, or queries, that involve the data in a database
 Relational queries
 inputs and outputs of a query are relations
 a query is evaluated using instances of each input relation and it produces an instance of the output relation
 refer to fields by using field name or position
 Both relational algebra and calculus are formal, non user friendly languages. (used as the basis for other highlevel query languages)
Relational Algebra
 Relational algebra is a procedural query language. It consists of a set of operations that take one or more relations as input and produce a new relation as their result
 Each relational query describes a stepbystep procedure for computing the desired answer, based on the order in which operators are applied in the query
 Basic operations:
 Selection (s) Selects a subset of rows from relation.
 Projection (P) Deletes unwanted columns from relation.
 Crossproduct (X) Allows us to combine two relations.
 Setdifference (–) Tuples in reln. 1, but not in reln. 2.
 Union (?) Tuples in reln. 1 and in reln. 2.
 Additional operations:
 Intersection, join, division, renaming: Not essential, but (very!) useful.
Instance of example relations
Instance S1 of Sailors
sid 
sname 
rating 
age 
22 
Dustin 
7 
45.0 
31 
Lubber 
8 
55.5 
58 
Rusty 
10 
35.0 
Instance S2 of Sailors
sid 
sname 
rating 
age 
28 
Yuppy 
9 
35.0 
31 
Lubber 
8 
55.5 
44 
Guppy 
5 
35.0 
58 
Rusty 
10 
35.0 
Instance R1 of Reserves
sid 
bid 
Day 
22 
101 
10/10/96 
58 
103 
11/12/96 
Algebra operators
1. Selection and projection (Unary operator)
 Operations that remove parts of a relation
1.1 Selection
spredicate (R)
 The selection operation works on a single relation R and defines a relation that contains only those tuples of R that satisfy the specified condition (predicate)
ExampleSelection (or Restriction)
 Schema of result identical to schema of (only) input relation.
 Result relation can be the input for another relational algebra operation!
1.2 Projection
Pcol1, . . . , coln(R)
 The projection operation works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.
2. Set operations
2.1 Union (R ? S)
 The union of two relations R and S with I and J tuples, respectively, is obtained by concatenating them into one relation with a maximum of (I+J) tuples, duplicate tuples being eliminated. R and S must be unioncompatible
Unioncompatible
 relations have the same number of the fields, and
 corresponding fields, taken in order from left to right, have the same domain
2.2 Intersection (R ? S)
 The intersection operation consists of the set of all tuples that are in both R and S. R and S must be unioncompatible. The schema of the result is defined to be identical to the schema of R
2.3 Set difference (R – S)
 The set difference operation defines a relation consisting of the tuples that are in relation R, but not in S. R and S must be unioncompatible
2.4 Crossproduct (Cartesian product)
R X S
 The crossproduct operation returns a relation instance whose schema contains all the fields of R (in the same order as they appear in R) followed by all the fields of S (in the same order as they appear in S)
The result relation has
 no. of fields = no. of fields in R + no. of fields in S
 no. of rows = no. of rows in R * no. of rows in S
3. Renaming
 The renaming operation does not affect the tuples of a relation, but changes the relation schema, ie., the name of the attributes and/or the name of relation itself
r(R(F), E)
4. Joins
 A join operation can be defined as a crossproduct followed by selections and projections
4.1 Condition joins (theta joins) R FS
 The condition join operation defines a relation that contains tuple satisfying the predicate c (condition) from the crossproduct of R and S
 The predicate F is of the form R.ai q S.bi where q may be one of the comparison operators (<, ?, >, ?, =, ?)
Theta join (qjoin)
 Can rewrite Theta join using basic Selection and Cartesian product operations.
R FS = sF(R C S)
 Degree of a Theta join is sum of degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.
 S1 S1.sid < R1.sid R1
4.2 Equijoin
 Equijoin operation is a common special case of the join operation that the join condition consists only of equalities (doing an additional projection)
 The schema of the result of an equijoin contains the fields of R (with the same names and domains as in R) followed by the fields of S that do not appear in the join conditions
Example – Equijoin
 S1 S1.sid = R1.sid R1 Or S1 sid R1
4.3 Natural join
 A further special case of the join operations R S is an equijoin in which equalities are specified on all fields having the same name in R and S. In this case, we can simply omit the join condition; the default is that the join condition is a collection of equalities on all common fields
4.4 Outer join
 In joining two relations, a tuple in one relation does not have a matching tuple in the other relation. We may want a row from one of the relations to appear in the result even when there is no matching value in the other relation. Missing values are set to null
(left outer join, right outer join, full outer join)
4.5 Semijoin
 R F S
 Defines a relation that contains the tuples of R that participate in the join of R with S
Can rewrite Semijoin using Projection and Join:
R F S = PA(R F S)
A is the set of all attributes in R
5. Division (R/S)
 The division operation consists of the set of tuple from R defined over the attributes C that match the combination of every tuple in S
 Expressed using basic operations:
T1 ? PC(R)
T2 ? PC((S X T1) – R)
T ? T1 – T2
Summary
 The relational model has rigorously defined query languages that are simple and powerful.
 Relational algebra is more operational; useful as internal representation for query evaluation plans.
 Several ways of expressing a given query; a query optimizer should choose the most efficient version.
