"

10. NORMALIZATION

10.1.2: Anomalies

An anomaly is a variation that differs in some way from what is considered normal. With

regards to maintaining a database, we consider the actions that must occur when data is updated, inserted, or deleted. In database applications where these update, insert, and/or delete operations are common (e.g. OLTP databases), it is desirable for these operations to be as simple and efficient as possible.

When relations are not fully normalized, they exhibit update anomalies because basic operations are not as simple as possible. When relations are not fully normalized, some aspect of the relation will be awkward to maintain.

Consider the relation structure and sample data:

deptNum

courseNum

studNum

grade

studName

92

101

3344

A

Joe

92

115

7654

A

Brenda

81

101

7654

C

Brenda

92

226

3344

B

Joe

This relation is used for keeping track of student enrollments, the grade assigned, and (oddly) the student’s name.

What must happen if a student’s name were to change? We should want our databases to have correct information, and so the name may need to be changed in several records, not just one. This is an example of an update anomaly – the simple change of a student’s name affects, not just one record, but potentially several in the database. The update operation is more complex than necessary, and this means it is more expensive to do, resulting in slower performance. When operations become more complex than necessary, there is also a chance the operation is programmed incorrectly resulting in corrupted data — another unfortunate consequence.

Consider the Course and Department tables again, but now consider that they are combined into a single table. Obviously, this is a table with a considerable redundancy – for each course in the same department, the department location, phone, and chair must be repeated.

image

 

The primary key of such a table must be {deptCode, courseNo}. Consider for the following, however unlikely the situation seems, that the Deparment_Course table is the only table where department information is kept. Note that our point here is only to show, for a simple example, how redundancy leads to difficult semantics for database operations.

Insert anomaly

Suppose the university added a new department but there are no courses for that department yet. How can a row be added to the above table? Since no part of a primary key can be null, we cannot insert a row for a new department because we do not have a value for courseNo. This is an example of an insertion anomaly.

Delete anomaly

Suppose some department is undergoing a major reorganization. All courses are to be removed and later some new courses will be added. If we delete all courses, then we lose all the information in the database for that department.

The previous discussion concerning anomalies highlights some of the data management issues that arise when a relation is not fully normalized. Another way of describing the general problem here, as far as updating a database is concerned, is that redundant data makes it more complicated for us to keep the data consistent.