Re: Primary Keys and Foreign Keys

From: Joe Celko <71062.1056_at_compuserve.com>
Date: Fri, 15 Dec 2000 19:10:18 GMT
Message-ID: <91dqaj$cn8$1_at_nnrp1.deja.com>


>> Followup to the followup: I've looked pretty hard but have been unable to find clear guidance for indexing on many-many tables. <<

I am not sure if this will help, but here is the text of a column of mine that will appear in the4 INTELLIGENT ENTERPRISE website soon:

One of the basic tricks in SQL is how you represent a many-to-many relationship. You create a third table which references the two (or more) tables involved by their primary keys. This third table seems to have a bunch of popular names such as "junction table" or "join table", but I know that it is a relationship. People tell you this, then leave you on your own to figure the rest of it.

For example given two tables,

CREATE TABLE Boys
(boy_name VARCHAR(30) NOT NULL PRIMARY KEY
 ...);

CREATE TABLE Girls
(girl_name VARCHAR(30) NOT NULL PRIMARY KEY,
 ... );

Yes, I know using names for a key is a bad practice, but it will make my examples easier to read. There are a lot of different many-to-many relationships that we can make between these two tables. If you don't believe me, just watch the Jerry Springer show sometime. The simplest relationship table looks like this:

CREATE TABLE Pairs
(boy_name INTEGER NOT NULL

        REFERENCES Boys (boy_name),
 girl_name INTEGER NOT NULL,

         REFERENCES Girls(girl_name));

The Pairs table allows us to insert rows like this:

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')
('Joe Celko', 'Brooke Shields')

Opps! I am shown twice with 'Brooke Shields' because the Pairs table does not have its own key. This is an easy mistake to make, but fixing it is not an obvious thing.

CREATE TABLE Orgy
(boy_name INTEGER NOT NULL

        REFERENCES Boys (boy_name),
 girl_name INTEGER NOT NULL,

         REFERENCES Girls(girl_name),
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Orgy table gets rid of the duplicated rows and makes this a proper table. The primary key for the table is made up of two or more columns and is called a compound key because of that fact.

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')

But the only restriction on the pairs is that they appear only once. Every boy can be paired with every girl, much to the dismay of the Moral Majority. I think I want to make a rule that guys can have as many gals as they want, but the gals have to stick to one guy.

The way I do this is to use a NOT NULL UNIQUE constraint on the girl_name column, which makes it a key. It is a simple key since it is only one column, but it is also a nested key because it appears as a subset of the compound PRIMARY KEY.

CREATE TABLE Playboys
(boy_name INTEGER NOT NULL

        REFERENCES Boys (boy_name),
 girl_name INTEGER NOT NULL UNIQUE, -- nested key

         REFERENCES Girls(girl_name),
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playboys is a proper table, without duplicated, but it also enforces the condition that I get to play around with one or ladies

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')

The ladies might want to go the other way and keep company with a series of men.

CREATE TABLE Playgirls
(boy_name INTEGER NOT NULL UNIQUE -- nested key

        REFERENCES Boys (boy_name),
 girl_name INTEGER NOT NULL,

         REFERENCES Girls(girl_name),
 PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playgirls table would permit these rows from our original set. .

('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')

The Moral majority is pretty upset about this Hollywood scandal and would love for us to stop running around and settle down in nice stable couples.

CREATE TABLE Couples
(boy_name INTEGER NOT NULL UNIQUE -- nested key

        REFERENCES Boys (boy_name),
 girl_name INTEGER NOT NULL UNIQUE -- nested key,

         REFERENCES Girls(girl_name),
 PRIMARY KEY(boy_name, girl_name)); -- compound key

The Couples table allows us to insert these rows from the original set.

('Joe Celko', 'Brooke Shields')
('Alec Baldwin', 'Kim Bassinger')

Think about this table for a minute. The PRIMARY KEY is now redundant. If each boy appears only once in the table and each girl appears only once in the table, then each (boy_name, girl_name) pair can appear only once.

From a theoretical viewpoint, I could drop the compound key and make either boy_name or girl_name the new primary key, or I could just leave them as candidate keys.

SQL products and theory do not always match. Many products make the assumption that the PRIMARY KEY is in some way special in the data model and will be the way that they should access the table most of the time.

In fairness, making special provision for the primary key is not a bad assumption because the REFERENCES clause uses the PRIMARY KEY of the referenced table as a default. Many programmers are not aware that a FOREIGN KEY constraint can also reference any UNIQUE constraint in the same table or in another table. The following nightmare will give you an idea of the possibilities. The multiple column versions follow the same syntax.

CREATE TABLE Foo
(foo_key INTEGER NOT NULL PRIMARY KEY,

  ...
 self_ref INTEGER NOT NULL

          REFERENCES Foo(fookey),
 outside_ref_1 INTEGER NOT NULL

             REFERENCES Bar(bar_key),
 outside_ref_2 INTEGER NOT NULL

             REFERENCES Bar(other_key),
 ...);

CREATE TABLE Bar
(bar_key INTEGER NOT NULL PRIMARY KEY,

 other_key INTEGER NOT NULL UNIQUE,
 ...);

But getting back to the nested keys, just how far can we go with them? My favorite example is a teacher's schedule kept in a table like this
(I am leaving off reference clauses and CHECK() constraints):

CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,

 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,
 PRIMARY KEY (teacher, class, room, period));

That choice of a primary key is the most obvious one -- use all the columns. Typical rows would look like this:  ('Mr. Celko', 'Database 101', 222, 6)

The rules we want to enforce are:

  1. A teacher is in only one room each period.
  2. A teacher teaches only one class each period.
  3. A room has only one class each period.
  4. A room has only one teacher in it each period.

Stop reading and see what you come up with for an answer.

CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,

 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,

 UNIQUE (teacher, room, period), -- rule #1
 UNIQUE (teacher, class, period), -- rule #2
 UNIQUE (class, room, period),  -- rule #3
 UNIQUE (teacher, room, period), -- rule #4
 PRIMARY KEY (teacher, class, room, period));

We know that there are 24 ways to pick three things from a set of four things in an ordered sequence; this is called a permutation. If order does not matter, then you have a combination and there are only four subsets, all of which we have used in the UNIQUE constraints. While column order is important in creating an index, we can ignore it for now and then worry about index tuning later.

I probably want to drop the PRIMARY KEY as redundant if I have all four of these constraints in place. But what happens if I drop the PRIMARY KEY and then one of the constraints?

CREATE TABLE Schedule-2
(teacher VARCHAR(15) NOT NULL,

 class CHAR(15) NOT NULL,
 room INTEGER NOT NULL,
 period INTEGER NOT NULL,

 UNIQUE (teacher, room, period), -- rule #1
 UNIQUE (teacher, class, period), -- rule #2
 UNIQUE (class, room, period));    -- rule #3

I can now insert these rows in the second version of the table:

 ('Mr. Celko', 'Database 101', 222, 6)
 ('Mr. Celko', 'Database 102', 223, 6)

This gives me a very tough sixth period class load since I have to be in two different rooms at the same time. Things can get even worse when another teacher is added to the schedule:

 ('Mr. Celko', 'Database 101', 222, 6)
 ('Mr. Celko', 'Database 102', 223, 6)
 ('Ms. Shields', 'Database 101', 223, 6)

Ms. Shields and I are both in room 223, trying to teach different classes at the same time.

--CELKO--
Joe Celko, SQL Guru & DBA at Trilogy
When posting, inclusion of SQL (CREATE TABLE ..., INSERT ..., etc) which can be cut and pasted into Query Analyzer is appreciated.

Sent via Deja.com
http://www.deja.com/ Received on Fri Dec 15 2000 - 20:10:18 CET

Original text of this message