10. NORMALIZATION
10.3: Summary
We have discussed functional dependencies, candidate keys, 1NF and BCNF. BCNF is the usual objective of the database designer.
When a relation is not BCNF then one or more of the following will be the source of redundancy in a relation:


 partial dependencies
 transitive dependencies
 functional dependencies amongst key attributes.

2NF involves the concepts of candidate key and nonkey attributes. A relation is considered to be in 2NF if it is in 1NF, and every nonkey attribute is fully dependent on each candidate key. In Example 2 we mentioned that stuNum –> birthdate was considered a partial functional dependency as stuNum is a subset of a candidate key. A 2NF relation does not contain partial dependencies.
3NF involves the concepts of candidate key and nonkey attributes. We say a relation is in 3NF if the relation is in 1NF and all determinants of nonkey attributes are candidate keys. In Example 3 we mentioned that courseId –> lastName was considered a transitive dependency; lastName is dependent on teacherId which is not a candidate key. A 3NF relation does not have partial dependencies nor transitive dependencies.
The definition of BCNF concerns FDs and CKs – there is no mention of nonkey attributes. Hence, BCNF is a stronger form than 2NF or 3NF (a BCNF relation will be in 2NF and 3NF).
A database designer may decide to not normalize completely to BCNF. This is sometimes done to ensure that certain data can be retrieved without having to join relations in a query – when a join is avoided the data is typically retrieved more quickly from the database. This is often done in a data warehouse environment (outside the scope of these notes).
Exercises
In each of these exercises, consider the relation, CKs, and FDs. Determine if the relation is in BCNF, and if not in BCNF give a nonloss decomposition into BCNF relations. The last 5 questions are abstract and give no context for the relation nor attributes.
1) Consider a relation Player which has information about players for some sports league. Player has attributes id, first, last, gender – id is the only CK and the FDs are:
 id –> first
 id –> last
 id –> gender
Player – sample data:
id 
first 
last 
gender 
1 
Jim 
Jones 
Male 
2 
Betty 
Smith 
Female 
3 
Jim 
Smith 
Male 
4 
Lee 
Mann 
Male 
5 
Sarah 
McDonald 
Female 
2) Consider a relation Employee which has information about employees in some company. Employee has attributes id, first, last, sin (social insurance number) where id and sin are the only CKs, and the FDs are:
 id –> first
 id –> last
 sin –> first
 sin –> last
 id –> sin
 sin –> id
Employee – sample data:
id 
first 
last 
sin 
1 
Jim 
Jones 
111222333 
2 
Betty 
Smith 
333333333 
3 
Jim 
Smith 
456789012 
4 
Lee 
Mann 
123456789 
5 
Samantha 
McDonald 
987654321 
3) Consider a relation Player which contains information about players and their teams. Player has attributes playerId, first, last, gender, teamId, teamName, teamCity where playerId is the only CK and the FDs are:
 playerId –>first
 playerId –> last
 playerId –> gender
 playerId –> teamId
 playerId –> teamName
 playerId –> teamCity
 teamId –> teamName
 teamId –> teamCity
Player – sample data:
playerId 
first 
last 
gender 
teamId 
teamName 
team City 
1 
Jim 
Jones 
M 
1 
Flyers 
Wpg 
2 
Betty 
Smith 
F 
5 
OilKings 
Cgy 
3 
Jim 
Smith 
M 
10 
Oilers 
Edm 
4 
Lee 
Mann 
M 
1 
Flyers 
Wpg 
5 
Samantha 
McDonald 
F 
5 
OilKings 
Cgy 
6 
Jimmy 
Jasper 
M 
99 
OilKings 
Wpg 
4) Consider a relation Building which has information about buildings and floors. Building has attributes buildingCode, floor, numRooms, campus where {buildingCode,floor} is the only CK and the FDs are:
 {buildingCode,floor} –> numRooms
 buildingCode –> campus
Building – sample data:
buildingCode 
floor 
numRooms 
campus 
D 
3 
15 
Downtown 
C 
2 
5 
Downtown 
RP 
1 
20 
Selkirk 
D 
2 
5 
Downtown 
D 
1 
20 
Downtown 
5) Consider a relation Course which contains information about courses. Course has attributes deptCode, deptName, courseNum, creditHours where {deptCode,courseNum} and {deptName,courseNum} are the only CKs, and the FDs are:
 {deptCode,courseNum} –> creditHours
 {deptName,courseNum} –> creditHours deptCode deptName
 deptName –> deptCode
Course – sample data:
deptCode 
deptName 
courseNum 
creditHours 
Math 
Mathematics 
2101 
3 
Stat 
Statistics 
2102 
3 
Math 
Mathematics 
2102 
1 
Stat 
Statistics 
4001 
6 
Math 
Mathematics 
4001 
6 
6) Consider the relation Student Performance below which describes student performance in courses. The value stored in the gradePoint column is the grade point that corresponds to the grade received in a course. Assume that students are identified by their student number, and that courses are identified by their course id. Assume each student can take a course only once and so each row is uniquely identified by {stuNum, courseId}. Each student’s overall gpa is stored – gpa is the average of gradePoint for all courses taken by a student.
Student Performance – sample data:
stuNum 
courseId 
grade 
gradePoint 
gpa 
111 
3030 
C 
2.0 
2.0 
113 
3030 
C 
2.0 
2.5 
113 
4040 
B 
3.0 
2.5 
118 
2222 
C 
2.0 
2.25 
118 
4040 
C+ 
2.5 
2.25 
202 
1188 
B 
3.0 
3.0 
7) Consider Example 4. Is there another decomposition of ProvinceLanguageStatistics that leads to BCNF relations?
8) Consider a relation R with attributes X, Y, W, Z where X is the only CK, and where there are FDs:
 X –> Y X –> W X –> Z
9) Consider a relation R with attributes X, Y, W, V where X and V are the only CKs, and where there are FDs:
 X –> Y X –>W V –>Y V –> W X –> V V –> X
10) Consider a relation R with attributes X, Y, W, V, Z where X is the only CK, and where there are FDs:
 X –> Y X –> W W –> Z W –> V
11) Consider a relation R with attributes A, B, C, D, E, F where {A,B} is the only CK, and where there are FDs:
 {A,B} –> C
 {A,B} –> D
 A –> E A –> F
12) Consider a relation R with attributes A, B, C, D, E where {A,C} and {B,C} are the only CKs, and where there are FDs:
 {A,C} –> D
 {B,C} –> D
 {A,C} –> E
 {B,C} –> E
 A –> B B –> A