10. NORMALIZATION
10.3: Summary
We have discussed functional dependencies, candidate keys, 1NF and BCNF. BCNF is the usual objective of the database designer.
When a relation is not BCNF then one or more of the following will be the source of redundancy in a relation:
-
-
- partial dependencies
- transitive dependencies
- functional dependencies amongst key attributes.
-
2NF involves the concepts of candidate key and non-key attributes. A relation is considered to be in 2NF if it is in 1NF, and every non-key attribute is fully dependent on each candidate key. In Example 2 we mentioned that stuNum –> birthdate was considered a partial functional dependency as stuNum is a subset of a candidate key. A 2NF relation does not contain partial dependencies.
3NF involves the concepts of candidate key and non-key attributes. We say a relation is in 3NF if the relation is in 1NF and all determinants of non-key attributes are candidate keys. In Example 3 we mentioned that courseId –> lastName was considered a transitive dependency; lastName is dependent on teacherId which is not a candidate key. A 3NF relation does not have partial dependencies nor transitive dependencies.
The definition of BCNF concerns FDs and CKs – there is no mention of non-key attributes. Hence, BCNF is a stronger form than 2NF or 3NF (a BCNF relation will be in 2NF and 3NF).
A database designer may decide to not normalize completely to BCNF. This is sometimes done to ensure that certain data can be retrieved without having to join relations in a query – when a join is avoided the data is typically retrieved more quickly from the database. This is often done in a data warehouse environment (outside the scope of these notes).
Exercises
In each of these exercises, consider the relation, CKs, and FDs. Determine if the relation is in BCNF, and if not in BCNF give a non-loss decomposition into BCNF relations. The last 5 questions are abstract and give no context for the relation nor attributes.
1) Consider a relation Player which has information about players for some sports league. Player has attributes id, first, last, gender – id is the only CK and the FDs are:
- id –> first
- id –> last
- id –> gender
Player – sample data:
id |
first |
last |
gender |
1 |
Jim |
Jones |
Male |
2 |
Betty |
Smith |
Female |
3 |
Jim |
Smith |
Male |
4 |
Lee |
Mann |
Male |
5 |
Sarah |
McDonald |
Female |
2) Consider a relation Employee which has information about employees in some company. Employee has attributes id, first, last, sin (social insurance number) where id and sin are the only CKs, and the FDs are:
- id –> first
- id –> last
- sin –> first
- sin –> last
- id –> sin
- sin –> id
Employee – sample data:
id |
first |
last |
sin |
1 |
Jim |
Jones |
111222333 |
2 |
Betty |
Smith |
333333333 |
3 |
Jim |
Smith |
456789012 |
4 |
Lee |
Mann |
123456789 |
5 |
Samantha |
McDonald |
987654321 |
3) Consider a relation Player which contains information about players and their teams. Player has attributes playerId, first, last, gender, teamId, teamName, teamCity where playerId is the only CK and the FDs are:
- playerId –>first
- playerId –> last
- playerId –> gender
- playerId –> teamId
- playerId –> teamName
- playerId –> teamCity
- teamId –> teamName
- teamId –> teamCity
Player – sample data:
playerId |
first |
last |
gender |
teamId |
teamName |
team City |
1 |
Jim |
Jones |
M |
1 |
Flyers |
Wpg |
2 |
Betty |
Smith |
F |
5 |
OilKings |
Cgy |
3 |
Jim |
Smith |
M |
10 |
Oilers |
Edm |
4 |
Lee |
Mann |
M |
1 |
Flyers |
Wpg |
5 |
Samantha |
McDonald |
F |
5 |
OilKings |
Cgy |
6 |
Jimmy |
Jasper |
M |
99 |
OilKings |
Wpg |
4) Consider a relation Building which has information about buildings and floors. Building has attributes buildingCode, floor, numRooms, campus where {buildingCode,floor} is the only CK and the FDs are:
- {buildingCode,floor} –> numRooms
- buildingCode –> campus
Building – sample data:
buildingCode |
floor |
numRooms |
campus |
D |
3 |
15 |
Downtown |
C |
2 |
5 |
Downtown |
RP |
1 |
20 |
Selkirk |
D |
2 |
5 |
Downtown |
D |
1 |
20 |
Downtown |
5) Consider a relation Course which contains information about courses. Course has attributes deptCode, deptName, courseNum, creditHours where {deptCode,courseNum} and {deptName,courseNum} are the only CKs, and the FDs are:
- {deptCode,courseNum} –> creditHours
- {deptName,courseNum} –> creditHours deptCode deptName
- deptName –> deptCode
Course – sample data:
deptCode |
deptName |
courseNum |
creditHours |
Math |
Mathematics |
2101 |
3 |
Stat |
Statistics |
2102 |
3 |
Math |
Mathematics |
2102 |
1 |
Stat |
Statistics |
4001 |
6 |
Math |
Mathematics |
4001 |
6 |
6) Consider the relation Student Performance below which describes student performance in courses. The value stored in the gradePoint column is the grade point that corresponds to the grade received in a course. Assume that students are identified by their student number, and that courses are identified by their course id. Assume each student can take a course only once and so each row is uniquely identified by {stuNum, courseId}. Each student’s overall gpa is stored – gpa is the average of gradePoint for all courses taken by a student.
Student Performance – sample data:
stuNum |
courseId |
grade |
gradePoint |
gpa |
111 |
3030 |
C |
2.0 |
2.0 |
113 |
3030 |
C |
2.0 |
2.5 |
113 |
4040 |
B |
3.0 |
2.5 |
118 |
2222 |
C |
2.0 |
2.25 |
118 |
4040 |
C+ |
2.5 |
2.25 |
202 |
1188 |
B |
3.0 |
3.0 |
7) Consider Example 4. Is there another decomposition of ProvinceLanguageStatistics that leads to BCNF relations?
8) Consider a relation R with attributes X, Y, W, Z where X is the only CK, and where there are FDs:
- X –> Y X –> W X –> Z
9) Consider a relation R with attributes X, Y, W, V where X and V are the only CKs, and where there are FDs:
- X –> Y X –>W V –>Y V –> W X –> V V –> X
10) Consider a relation R with attributes X, Y, W, V, Z where X is the only CK, and where there are FDs:
- X –> Y X –> W W –> Z W –> V
11) Consider a relation R with attributes A, B, C, D, E, F where {A,B} is the only CK, and where there are FDs:
- {A,B} –> C
- {A,B} –> D
- A –> E A –> F
12) Consider a relation R with attributes A, B, C, D, E where {A,C} and {B,C} are the only CKs, and where there are FDs:
- {A,C} –> D
- {B,C} –> D
- {A,C} –> E
- {B,C} –> E
- A –> B B –> A