Re: Newbie question about db normalization theory: redundant keys OK?

From: -CELKO- <jcelko212_at_earthlink.net>
Date: Thu, 13 Dec 2007 06:12:11 -0800 (PST)
Message-ID: <e1bf13f3-76a4-4596-91a7-7ef5614a6ab3_at_i12g2000prf.googlegroups.com>


CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,
 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,
 PRIMARY KEY (teacher, class, room, period));

That choice of a primary key is the most obvious one -- use all the columns. Typical rows would look like this:

('Mr. Celko', 'Database 101', 222, 6)

The rules we want to enforce are:

  1. A teacher is in only one room each period.
  2. A teacher teaches only one class each period.
  3. A room has only one class each period.
  4. A room has only one teacher in it each period.

Stop reading and see what you come up with for an answer. Okay, now consider using one constraint for each rule in the list, thus.

CREATE TABLE Schedule_1 -- version one, wrong! (teacher VARCHAR(15) NOT NULL,
 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,

 UNIQUE (teacher, room, period), -- rule #1
 UNIQUE (teacher, class, period), -- rule #2
 UNIQUE (class, room, period),  -- rule #3
 UNIQUE (teacher, room, period), -- rule #4
 PRIMARY KEY (teacher, class, room, period));

We know that there are 4 ways to pick three things from a set of four things permutation. While column order is important in creating an index, we can ignore it for now and then worry about index tuning later in the book.

I could drop the PRIMARY KEY as redundant if I have all four of these constraints in place. But what happens if I drop the PRIMARY KEY and then one of the constraints?

CREATE TABLE Schedule_2 -- still wrong
(teacher VARCHAR(15) NOT NULL,
 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,

 UNIQUE (teacher, room, period), -- rule #1
 UNIQUE (teacher, class, period), -- rule #2
 UNIQUE (class, room, period));    -- rule #3

I can now insert these rows in the second version of the table:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)

This gives me a very tough sixth period class load since I have to be in two different rooms at the same time. Things can get even worse when another teacher is added to the schedule:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)
('Ms. Shields', 'Database 101', 223, 6)

Ms. Shields and I are both in room 223, trying to teach different classes at the same time. Matthew Burr looked at the constraints and the rules, came up with this analysis.

CREATE TABLE Schedule_3 -- corrected version (teacher VARCHAR(15) NOT NULL,
 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,
 UNIQUE (teacher, period), -- rules #1 and #2  UNIQUE (room, period)); -- rules #3 and #4

If a teacher is in only one room each period, then given a period and a teacher I should be able to determine only one room, i.e. room is functionally dependent upon the combination of teacher and period. Likewise, if a teacher teaches only one class each period, then class is functionally dependent upon the combination of teacher and period. The same thinking holds for the last two rules: class is functionally dependent upon the combination of room and period, and teacher is functionally dependent upon the combination of room and period.

With the constraints that were provided in the first version, you will find that the rules are not enforced. For example, I could enter the following rows:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)

These rows violate two of your rules, rule #1 and rule #2. However, the unique constraints first provided in Schedule_2 do not capture this violation and will allow the rows to be entered.

The constraint

  UNIQUE (teacher, room, period)

is checking the complete combination of teacher, room, and period, and since ('Mr. Celko', 222, 6) is different from ('Mr. Celko', 223, 6), the DDL does not find any problem with both rows being entered, even though that means that Mr. Celko is in more than one room during the same period.

  UNIQUE (teacher, class, period)

doesn't catch its associated rule either since ('Mr. Celko', 'Database 101', 6) is different from ('Mr. Celko', 'Database 102', 6), and so, Mr. Celko is able to teach more than one class during the same period, thus violating rule two. It seems that we'd also be able to add the following row:

('Ms. Shields', 'Database 103', 222, 6)

which violates rules #3 and #4. Received on Thu Dec 13 2007 - 15:12:11 CET

Original text of this message