Learn Database Online Tutorial

RELATIONAL ALGEBRA
Chapter outline

  • Preliminaries
  • Relational Algebra
  • Algebra operators

Preliminaries

  • Query languages are specialized languages for asking questions, or queries, that involve the data in a database
  • Relational queries

                - inputs and outputs of a query are relations
- a query is evaluated using instances of each input relation and it produces an instance of the output relation
- refer to fields by using field name or position

  • Both relational algebra and calculus are formal, non user friendly languages. (used as the basis for other high-level query languages)

Relational Algebra

  • Relational algebra is a procedural query language. It consists of a set of operations that take one or more relations as input and produce a new relation as their result
  • Each relational query describes a step-by-step procedure for computing the desired answer, based on the order in which operators are applied in the query
  • Basic operations:
    • Selection  (s)    Selects a subset of rows from relation.
    • Projection  (P)   Deletes unwanted columns from relation.
    • Cross-product  (X)  Allows us to combine two relations.
    • Set-difference  (–)  Tuples in reln. 1, but not in reln. 2.
    • Union  (?)  Tuples in reln. 1 and in reln. 2.
  • Additional operations:
    • Intersection, join, division, renaming:  Not essential, but (very!) useful.

Instance of example relations
Instance S1 of Sailors


sid

sname

rating

age

22

Dustin

7

45.0

31

Lubber

8

55.5

58

Rusty

10

35.0

Instance S2 of Sailors


sid

sname

rating

age

28

Yuppy

9

35.0

31

Lubber

8

55.5

44

Guppy

5

35.0

58

Rusty

10

35.0

Instance R1 of Reserves


sid

bid

Day

22

101

10/10/96

58

103

11/12/96

Algebra operators
1. Selection and projection (Unary operator)
- Operations that remove parts of a relation
1.1 Selection
spredicate (R)
- The selection operation works on a single relation R and defines a relation that contains only those tuples of R that satisfy the specified condition (predicate)
databaseExample-Selection (or Restriction)

database 

  • Schema of result identical to schema of (only) input relation.
  • Result relation can be the input for another relational algebra operation!

1.2 Projection
Pcol1, . . . , coln(R)
- The projection operation works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.
2. Set operations
2.1 Union                             (R ? S)
- The union of two relations R and S with I and J tuples, respectively, is obtained by concatenating them into one relation with a maximum of (I+J) tuples, duplicate tuples being eliminated. R and S must be union-compatible
Union-compatible
- relations have the same number of the fields, and
- corresponding fields, taken in order from left to right, have the same domain
2.2 Intersection                                (R ? S)
- The intersection operation consists of the set of all tuples that are in both R and S. R and S must be union-compatible. The schema of the result is defined to be identical to the schema of R
2.3 Set difference            (R – S)
- The set difference operation defines a relation consisting of the tuples that are in relation R, but not in S. R and S must be union-compatible
2.4 Cross-product (Cartesian product)
R X S
- The cross-product operation returns a relation instance whose schema contains all the fields of R (in the same order as they appear in R) followed by all the fields of S (in the same order as they appear in S)
The result relation has
- no. of fields = no. of fields in R + no. of fields in S
- no. of rows = no. of rows in R * no. of rows in S
3. Renaming
- The renaming operation does not affect the tuples of a relation, but changes the relation schema, ie., the name of the attributes and/or the name of relation itself
r(R(F), E)
database 


4. Joins
- A join operation can be defined as a cross-product followed by selections and projections
4.1 Condition joins (theta joins) R     FS
- The condition join operation defines a relation that contains tuple satisfying the predicate c (condition) from the cross-product of R and S
- The predicate F is of the form R.ai q S.bi where q may be one of the comparison operators (<, ?, >, ?, =, ?)
Theta join (q-join)

  • Can rewrite Theta join using basic Selection and Cartesian product operations.

R      FS = sF(R C S)

  • Degree of a Theta join is sum of degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.
  • databaseS1     S1.sid < R1.sid R1

 

4.2 Equijoin
- Equijoin operation is a common special case of the join operation that the join condition consists only of equalities (doing an additional projection)
- The schema of the result of an equijoin contains the fields of R (with the same names and domains as in R) followed by the fields of S that do not appear in the join conditions
Example – Equijoin        

  • S1     S1.sid = R1.sid R1 Or S1     sid R1

database 

4.3  Natural join
- A further special case of the join operations    R    S is an equijoin in which equalities are specified on all fields having the same name in R and S. In this case, we can simply omit the join condition; the default is that the join condition is a collection of equalities on all common fields
4.4  Outer join
- In joining two relations, a tuple in one relation does not have a matching tuple in the other relation. We may want a row from one of the relations to appear in the result even when there is no matching value in the other relation. Missing values are set to null
(left outer join, right outer join, full outer join)
4.5  Semijoin

  • R     F S   
    • Defines a relation that contains the tuples of R that participate in the join of R with S

Can rewrite Semijoin using Projection and Join:
R    F S = PA(R      F S)
A is the set of all attributes in R
5. Division                            (R/S)
- The division operation consists of the set of tuple from R defined over the attributes C that match the combination of every tuple in S

  • Expressed using basic operations:

                T1 ? PC(R)
T2 ? PC((S X T1) – R)
T ? T1 – T2


Summary

  • The relational model has rigorously defined query languages that are simple and powerful.
  • Relational algebra is more operational; useful as internal representation for query evaluation plans.
  • Several ways of expressing a given query; a query optimizer should choose the most efficient version.