"

10. NORMALIZATION

10.3: Summary

We have discussed functional dependencies, candidate keys, 1NF and BCNF. BCNF is the usual objective of the database designer.

When a relation is not BCNF then one or more of the following will be the source of redundancy in a relation:

      • partial dependencies
      • transitive dependencies
      • functional dependencies amongst key attributes.

2NF involves the concepts of candidate key and non-key attributes. A relation is considered to be in 2NF if it is in 1NF, and every non-key attribute is fully dependent on each candidate key. In Example 2 we mentioned that stuNum –> birthdate was considered a partial functional dependency as stuNum is a subset of a candidate key. A 2NF relation does not contain partial dependencies.

3NF involves the concepts of candidate key and non-key attributes. We say a relation is in 3NF if the relation is in 1NF and all determinants of non-key attributes are candidate keys. In Example 3 we mentioned that courseId –> lastName was considered a transitive dependency; lastName is dependent on teacherId which is not a candidate key. A 3NF relation does not have partial dependencies nor transitive dependencies.

The definition of BCNF concerns FDs and CKs – there is no mention of non-key attributes. Hence, BCNF is a stronger form than 2NF or 3NF (a BCNF relation will be in 2NF and 3NF).

A database designer may decide to not normalize completely to BCNF. This is sometimes done to ensure that certain data can be retrieved without having to join relations in a query – when a join is avoided the data is typically retrieved more quickly from the database. This is often done in a data warehouse environment (outside the scope of these notes).

Exercises

In each of these exercises, consider the relation, CKs, and FDs. Determine if the relation is in BCNF, and if not in BCNF give a non-loss decomposition into BCNF relations. The last 5 questions are abstract and give no context for the relation nor attributes.

1) Consider a relation Player which has information about players for some sports league. Player has attributes id, first, last, gender – id is the only CK and the FDs are:

  • id –> first
  • id –> last
  • id –> gender

Player sample data:

id

first

last

gender

1

Jim

Jones

Male

2

Betty

Smith

Female

3

Jim

Smith

Male

4

Lee

Mann

Male

5

Sarah

McDonald

Female

 

2) Consider a relation Employee which has information about employees in some company. Employee has attributes id, first, last, sin (social insurance number) where id and sin are the only CKs, and the FDs are:

  • id –> first
  • id –> last
  • sin –> first
  • sin –> last
  • id –> sin
  • sin –> id

 

Employee sample data:

id

first

last

sin

1

Jim

Jones

111222333

2

Betty

Smith

333333333

3

Jim

Smith

456789012

4

Lee

Mann

123456789

5

Samantha

McDonald

987654321

 

3) Consider a relation Player which contains information about players and their teams. Player has attributes playerId, first, last, gender, teamId, teamName, teamCity where playerId is the only CK and the FDs are:

  • playerId –>first
  • playerId –> last
  • playerId –> gender
  • playerId –> teamId
  • playerId –> teamName
  • playerId –> teamCity
  • teamId –> teamName
  • teamId –> teamCity

 

Player sample data:

playerId

first

last

gender

teamId

teamName

team   City

1

Jim

Jones

M

1

Flyers

Wpg

2

Betty

Smith

F

5

OilKings

Cgy

3

Jim

Smith

M

10

Oilers

Edm

4

Lee

Mann

M

1

Flyers

Wpg

5

Samantha

McDonald

F

5

OilKings

Cgy

6

Jimmy

Jasper

M

99

OilKings

Wpg

 

4) Consider a relation Building which has information about buildings and floors. Building has attributes buildingCode, floor, numRooms, campus where {buildingCode,floor} is the only CK and the FDs are:

  • {buildingCode,floor} –> numRooms
  • buildingCode –> campus

 

Building – sample data:

buildingCode

floor

numRooms

campus

D

3

15

Downtown

C

2

5

Downtown

RP

1

20

Selkirk

D

2

5

Downtown

D

1

20

Downtown

 

5) Consider a relation Course which contains information about courses. Course has attributes deptCode, deptName, courseNum, creditHours where {deptCode,courseNum} and {deptName,courseNum} are the only CKs, and the FDs are:

  • {deptCode,courseNum} –> creditHours
  • {deptName,courseNum} –> creditHours deptCode  deptName
  • deptName –> deptCode

 

Course sample data:

deptCode

deptName

courseNum

creditHours

Math

Mathematics

2101

3

Stat

Statistics

2102

3

Math

Mathematics

2102

1

Stat

Statistics

4001

6

Math

Mathematics

4001

6

 

6) Consider the relation Student Performance below which describes student performance in courses. The value stored in the gradePoint column is the grade point that corresponds to the grade received in a course. Assume that students are identified by their student number, and that courses are identified by their course id. Assume each student can take a course only once and so each row is uniquely identified by {stuNum, courseId}. Each student’s overall gpa is stored – gpa is the average of gradePoint for all courses taken by a student.

 

Student Performance sample data:

stuNum

courseId

grade

gradePoint

gpa

111

3030

C

2.0

2.0

113

3030

C

2.0

2.5

113

4040

B

3.0

2.5

118

2222

C

2.0

2.25

118

4040

C+

2.5

2.25

202

1188

B

3.0

3.0

 

7) Consider Example 4. Is there another decomposition of ProvinceLanguageStatistics that leads to BCNF relations?

 

8) Consider a relation R with attributes X, Y, W, Z where X is the only CK, and where there are FDs:

  • X –> Y X –> W X –> Z

 

9) Consider a relation R with attributes X, Y, W, V where X and V are the only CKs, and where there are FDs:

  • X –> Y X –>W V –>Y V –> W X –> V V –> X

 

10) Consider a relation R with attributes X, Y, W, V, Z where X is the only CK, and where there are FDs:

  • X –> Y X –> W W –> Z W –> V

 

11) Consider a relation R with attributes A, B, C, D, E, F where {A,B} is the only CK, and where there are FDs:

  • {A,B} –> C
  • {A,B} –> D
  • A –> E A –> F

 

12) Consider a relation R with attributes A, B, C, D, E where {A,C} and {B,C} are the only CKs, and where there are FDs:

  • {A,C} –> D
  • {B,C} –> D
  • {A,C} –> E
  • {B,C} –> E
  • A –> B B –> A