Re: expressing one-or-more constraint structurally

From: --CELKO-- <jcelko212_at_earthlink.net>
Date: 26 Jun 2004 21:14:31 -0700
Message-ID: <18c7b3c2.0406262014.6895664d_at_posting.google.com>


>> .. a cardinality constraint on one item vs. another, being able to do it structurally is the best approach. <<

Here is an old "cut & paste" of mine that might help:


One of the basic tricks in SQL is representing a one-to-one or many-to-many relationship with a table which references the two (or more) entity tables involved by their primary keys. This third table has several popular names such as "junction table" or "join table", but we know that it is a relationship. This type of table needs to have constraints on to assure that the relationships work properly.

For example given two tables,

CREATE TABLE Boys
(boy_name VARCHAR(30) NOT NULL PRIMARY KEY
 ...);

CREATE TABLE Girls
(girl_name VARCHAR(30) NOT NULL PRIMARY KEY,
 ... );

Yes, I know using names for a key is a bad practice, but it will make my examples easier to read. There are a lot of different relationships that we can make between these two tables. If you don't believe me, just watch the Jerry Springer show sometime. The simplest relationship table looks like this:

CREATE TABLE Pairs
(boy_name INTEGER NOT NULL

        REFERENCES Boys (boy_name)
        ON UPDATE CASCADE
        ON DELETE CASCADE, 
 girl_name INTEGER NOT NULL, 
         REFERENCES Girls(girl_name)
         ON UPDATE CASCADE
         ON DELETE CASCADE);

The Pairs table allows us to insert rows like this:

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')
('Joe Celko', 'Brooke Shields')

Opps! I am shown twice with 'Brooke Shields' because the Pairs table does not have its own key. This is an easy mistake to make, but fixing it is not an obvious thing.

CREATE TABLE Orgy
(boy_name INTEGER NOT NULL

        REFERENCES Boys (boy_name)
        ON DELETE CASCADE 
        ON UPDATE CASCADE, 
 girl_name INTEGER NOT NULL, 
         REFERENCES Girls(girl_name)
         ON UPDATE CASCADE
         ON DELETE CASCADE, 

 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Orgy table gets rid of the duplicated rows and makes this a proper table. The primary key for the table is made up of two or more columns and is called a compound key because of that fact.

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')

But the only restriction on the pairs is that they appear only once. Every boy can be paired with every girl, much to the dismay of the Moral Majority. I think I want to make a rule that guys can have as many gals as they want, but the gals have to stick to one guy.

The way I do this is to use a NOT NULL UNIQUE constraint on the girl_name column, which makes it a key. It is a simple key since it is only one column, but it is also a nested key because it appears as a subset of the compound PRIMARY KEY.

CREATE TABLE Playboys
(boy_name INTEGER NOT NULL

        REFERENCES Boys (boy_name)
        ON UPDATE CASCADE
        ON DELETE CASCADE, 
 girl_name INTEGER NOT NULL UNIQUE, -- nested key 
         REFERENCES Girls(girl_name)
         ON UPDATE CASCADE
         ON DELETE CASCADE, 

 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playboys is a proper table, without duplicated, but it also enforces the condition that I get to play around with one or more ladies, thus.

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')

The ladies might want to go the other way and keep company with a series of men.

CREATE TABLE Playgirls
(boy_name INTEGER NOT NULL UNIQUE -- nested key

        REFERENCES Boys (boy_name)
        ON UPDATE CASCADE
        ON DELETE CASCADE, 
 girl_name INTEGER NOT NULL, 
         REFERENCES Girls(girl_name)
         ON UPDATE CASCADE
         ON DELETE CASCADE, 

 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playgirls table would permit these rows from our original set.

('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')

The Moral majority is pretty upset about this Hollywood scandal and would love for us to stop running around and settle down in nice stable couples.

CREATE TABLE Couples
(boy_name INTEGER NOT NULL UNIQUE -- nested key

        REFERENCES Boys (boy_name)
        ON UPDATE CASCADE
        ON DELETE CASCADE, 
 girl_name INTEGER NOT NULL UNIQUE   -- nested key, 
         REFERENCES Girls(girl_name)
         ON UPDATE CASCADE
         ON DELETE CASCADE, 

 PRIMARY KEY(boy_name, girl_name)); -- compound key

The Couples table allows us to insert these rows from the original set.

('Joe Celko', 'Brooke Shields')
('Alec Baldwin', 'Kim Bassinger')

Think about this table for a minute. The PRIMARY KEY is now redundant. If each boy appears only once in the table and each girl appears only once in the table, then each (boy_name, girl_name) pair can appear only once.

From a theoretical viewpoint, I could drop the compound key and make either boy_name or girl_name the new primary key, or I could just leave them as candidate keys. However, SQL products and theory do not always match. Many products make the assumption that the PRIMARY KEY is in some way special in the data model and will be the way that they should access the table most of the time.

In fairness, making special provision for the primary key is not a bad assumption because the REFERENCES clause uses the PRIMARY KEY of the referenced table as a default. Many programmers are not aware that a FOREIGN KEY constraint can also reference any UNIQUE constraint in the same table or in another table. The following nightmare will give you an idea of the possibilities. The multiple column versions follow the same syntax.

CREATE TABLE Foo
(foo_key INTEGER NOT NULL PRIMARY KEY,

  ...
 self_ref INTEGER NOT NULL

          REFERENCES Foo(fookey),
 outside_ref_1 INTEGER NOT NULL

             REFERENCES Bar(bar_key),
 outside_ref_2 INTEGER NOT NULL

             REFERENCES Bar(other_key),
 ...);

CREATE TABLE Bar
(bar_key INTEGER NOT NULL PRIMARY KEY,

 other_key INTEGER NOT NULL UNIQUE,
 ...);

But getting back to the nested keys, just how far can we go with them?  My favorite example is a teacher's schedule kept in a table like this
(I am leaving off reference clauses and CHECK() constraints):

CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,

 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,
 PRIMARY KEY (teacher, class, room, period));

That choice of a primary key is the most obvious one -- use all the columns. Typical rows would look like this:

 ('Mr. Celko', 'Database 101', 222, 6)

The rules we want to enforce are:

  1. A teacher is in only one room each period.
  2. A teacher teaches only one class each period.
  3. A room has only one class each period.
  4. A room has only one teacher in it each period.

Stop reading and see what you come up with for an answer. Okay, now consider using one constraint for each rule in the list, thus.

CREATE TABLE Schedule_1 -- version one, wrong!
(teacher VARCHAR(15) NOT NULL,

 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,

 UNIQUE (teacher, room, period), -- rule #1 
 UNIQUE (teacher, class, period), -- rule #2 
 UNIQUE (class, room, period),  -- rule #3 
 UNIQUE (teacher, room, period), -- rule #4 
 PRIMARY KEY (teacher, class, room, period));

We know that there are 4 ways to pick three things from a set of four things permutation. While column order is important in creating an index, we can ignore it for now and then worry about index tuning later in the book.

I could drop the PRIMARY KEY as redundant if I have all four of these constraints in place. But what happens if I drop the PRIMARY KEY and then one of the constraints?

CREATE TABLE Schedule_2 -- still wrong
(teacher VARCHAR(15) NOT NULL,

 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,

 UNIQUE (teacher, room, period), -- rule #1 
 UNIQUE (teacher, class, period), -- rule #2 
 UNIQUE (class, room, period));    -- rule #3 

I can now insert these rows in the second version of the table:

 ('Mr. Celko', 'Database 101', 222, 6)
 ('Mr. Celko', 'Database 102', 223, 6)

This gives me a very tough sixth period class load since I have to be in two different rooms at the same time. Things can get even worse when another teacher is added to the schedule:

 ('Mr. Celko', 'Database 101', 222, 6)
 ('Mr. Celko', 'Database 102', 223, 6)
 ('Ms. Shields', 'Database 101', 223, 6)

Ms. Shields and I are both in room 223, trying to teach different classes at the same time. Matthew Burr looked at the constraints and the rules, came up with this analysis.

CREATE TABLE Schedule_3 -- corrected version
(teacher VARCHAR(15) NOT NULL,

 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,
 UNIQUE (teacher, period), -- rules #1 and #2  UNIQUE (room, period)); -- rules #3 and #4

If a teacher is in only one room each period, then given a period and a teacher I should be able to determine only one room, i.e. room is functionally dependent upon the combination of teacher and period. Likewise, if a teacher teaches only one class each period, then class is functionally dependent upon the combination of teacher and period. The same thinking holds for the last two rules: class is functionally dependent upon the combination of room and period, and teacher is functionally dependent upon the combination of room and period.

With the constraints that were provided in the first version, you will find that the rules are not enforced. For example, I could enter the following rows:

 ('Mr. Celko', 'Database 101', 222, 6)
 ('Mr. Celko', 'Database 102', 223, 6)

These rows violate two of your rules, rule #1 and rule #2. However, the unique constraints first provided in Schedule_2 do not capture this violation and will allow the rows to be entered.

The constraint

  UNIQUE (teacher, room, period)  

is checking the complete combination of teacher, room, and period, and since ('Mr. Celko', 222, 6) is different from ('Mr. Celko', 223, 6), the DDL does not find any problem with both rows being entered, even though that means that Mr. Celko is in more than one room during the same period.

  UNIQUE (teacher, class, period)

doesn't catch its associated rule either since ('Mr. Celko', 'Database 101', 6) is different from ('Mr. Celko', 'Database 102', 6), and so, Mr. Celko is able to teach more than one class during the same period, thus violating rule two. It seems that we'd also be able to add the following row:

('Ms. Shields', 'Database 103', 222, 6)

which violates rules #3 and #4. Received on Sun Jun 27 2004 - 06:14:31 CEST

Original text of this message