Re: Primary-key ... alternate key ... (newbie question)

From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 4 Oct 2002 12:01:30 -0700
Message-ID: <c0d87ec0.0210041101.34834160_at_posting.google.com>


>> In many normalisation articles, they sometimes mention to mark the alternate
key's. Why is it important to 'mark' these? Where are they used during the normalisation-steps? <<

If a uniqueness relationship exists in your data model, then you must show it because it is the truth! That is very simple.

There is a story about the whole PRIMARY KEY thing that we don't like to talk about. In his early papers, Dr. Codd created the idea of a PRIMARY KEY, which was one of several candidate keys that was picked for this honor. Later, he realized that he was still being influenced by sequential file systems (think of a magnetic tape), which must have a single key that is used to sort the file. All operations in a sequential file system depend on (1) sorted order and (2) record position within the file. The relational model does not have these limits.

The current RM says a key is a key and nobody is special.

However, this insight was too late for SQL-86. We inherited the PRIMARY KEY concept, and several SQL products made special PHYSICAL allowances for it in their access methods and storage. It is the default for FOREIGN KEY references in SQL-92 and cannot be NULL.

Here is an old article I did for INTELLIGENT ENTERPRISE magazine

http://www.intelligententerprise.com/010101/celko.shtml


One of the basic tricks in SQL is representing a many-to-many relationship. You create a third table that references the two (or more) tables involved by their primary keys. This third table has quite a few popular names, such as "junction table" or "join table," but I know that it is a relationship. People tell you this, then leave you on your own to figure out the rest.

For example, here are two tables:

CREATE TABLE Boys
(boy_name VARCHAR(30) NOT NULL PRIMARY KEY
...);
CREATE TABLE Girls
(girl_name VARCHAR(30) NOT NULL PRIMARY KEY,
... );

Yes, I know using names for a key is a bad practice, but doing so will make my examples easier to read. You can make a lot of different many-to-many relationships between these two tables. (If you don't believe me, just watch the Jerry Springer show some time.) The simplest relationship table looks like this:

CREATE TABLE Pairs
(boy_name INTEGER NOT NULL

REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name));
The "Pairs" table let&#8217;s you insert rows like this:
('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')
('Joe Celko', 'Brooke Shields')

Oops! I am shown twice with 'Brooke Shields' because the "Pairs" table does not have its own key. This mistake is easy to make, but how to fix it is not always obvious.

The "Orgy" table gets rid of the duplicated rows and makes this a proper table:

CREATE TABLE Orgy
(boy_name INTEGER NOT NULL

REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key

The primary key for the table comprises two or more columns and is called a compound key because of that fact.

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')

But the only restriction on the pairs is that they appear only once. Every boy can be paired with every girl, much to the dismay of the moral majority. Now, I want to make a rule that guys can have as many gals as they want, but the gals have to stick to one guy.

The way I do this is to use a NOT NULL UNIQUE constraint on the girl_name column, which makes it a key. It&#8217;s a simple key because it is only one column, but it is also a nested key because it appears as a subset of the compound PRIMARY KEY.

"Playboys" is a proper table, without duplicated pairs, but it also enforces the condition that I get to play around with one or more ladies:

CREATE TABLE Playboys
(boy_name INTEGER NOT NULL

REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL UNIQUE, -- nested key REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key

('Joe Celko', 'Brooke Shields')
('Joe Celko', 'Kim Bassinger')

CREATE TABLE Playgirls
(boy_name INTEGER NOT NULL UNIQUE -- nested key
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL,
REFERENCES Girls(girl_name),
PRIMARY KEY (boy_name, girl_name)); -- compound key

The Playgirls table would permit these rows from our original set:

('Joe Celko', 'Kim Bassinger')
('Alec Baldwin', 'Kim Bassinger')

The moral majority is pretty upset about this Hollywood scandal and would love for us to stop running around and settle down in nice, stable couples:

CREATE TABLE Couples
(boy_name INTEGER NOT NULL UNIQUE -- nested key
REFERENCES Boys (boy_name),
girl_name INTEGER NOT NULL UNIQUE -- nested key, REFERENCES Girls(girl_name),
PRIMARY KEY(boy_name, girl_name)); -- compound key The "Couples" table lets you insert these rows from the original set:

('Joe Celko', 'Brooke Shields')
('Alec Baldwin', 'Kim Bassinger')

Think about this table for a minute. The PRIMARY KEY is now redundant. If each boy appears only once in the table and each girl appears only once in the table, then each (boy_name, girl_name) pair can appear only once.

From a theoretical viewpoint, I could drop the compound key and make either boy_name or girl_name the new primary key, or I could just leave them as candidate keys.

SQL products and theory do not always match. Many products make the assumption that the PRIMARY KEY is in some way special in the data model and will be the way to access the table most of the time.

In fairness, making a special provision for the primary key is not a bad assumption because the REFERENCES clause uses the PRIMARY KEY of the referenced table as a default. Many programmers are not aware that a FOREIGN KEY constraint can also reference any UNIQUE constraint in the same table or in another table. The following nightmare will give you an idea of the possibilities. The multiple column versions follow the same syntax.

CREATE TABLE Foo
(foo_key INTEGER NOT NULL PRIMARY KEY,
...
self_ref INTEGER NOT NULL
REFERENCES Foo(fookey),
outside_ref_1 INTEGER NOT NULL
REFERENCES Bar(bar_key),
outside_ref_2 INTEGER NOT NULL
REFERENCES Bar(other_key),
...);
CREATE TABLE Bar
(bar_key INTEGER NOT NULL PRIMARY KEY,
other_key INTEGER NOT NULL UNIQUE,
...);

But getting back to the nested keys, just how far can you go with them? My favorite example is a teacher's schedule kept in a table like this (I am leaving off reference clauses and CHECK() constraints):

CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,

class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,
PRIMARY KEY (teacher, class, room, period));

That choice of a primary key is the most obvious one -- use all the columns. Typical rows would look like this: ('Mr. Celko', 'Database 101', 222, 6) The rules you want to enforce are:

A teacher is in only one room each period A teacher teaches only one class each period A room has only one class each period
A room has only one teacher in it each period.

Stop reading and see what you come up with for an answer.

CREATE TABLE Schedule
(teacher VARCHAR(15) NOT NULL,

class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,

UNIQUE (teacher, room, period), -- rule #1
UNIQUE (teacher, class, period), -- rule #2
UNIQUE (class, room, period), -- rule #3
UNIQUE (teacher, room, period), -- rule #4
PRIMARY KEY (teacher, class, room, period));

You know that you have 24 ways to pick three objects from a set of four in an ordered sequence (permutation). If order does not matter, then you have a combination and only four subsets, all of which I have used in the UNIQUE constraints. Although column order is important in creating an index, you can ignore it for now and then worry about index tuning later.

I probably want to drop the PRIMARY KEY as redundant if I have all four of these constraints in place. But what happens if I drop the PRIMARY KEY and then one of the constraints?

CREATE TABLE Schedule-2
(teacher VARCHAR(15) NOT NULL,

class CHAR(15) NOT NULL,
room INTEGER NOT NULL,
period INTEGER NOT NULL,

UNIQUE (teacher, room, period), -- rule #1
UNIQUE (teacher, class, period), -- rule #2
UNIQUE (class, room, period)); -- rule #3

I can now insert these rows in the second version of the table:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)

This gives me a very tough sixth period class load because I have to be in two different rooms at the same time. Things can get even worse when another teacher is added to the schedule:

('Mr. Celko', 'Database 101', 222, 6)
('Mr. Celko', 'Database 102', 223, 6)
('Ms. Shields', 'Database 101', 223, 6)

Ms. Shields and I are both in room 223, trying to teach different classes at the same time. I think you get the idea that a relationship table is not simple. And I did not even get into referential actions and the fact that a relationship can have its own attributes apart from those of its participants. Received on Fri Oct 04 2002 - 21:01:30 CEST

Original text of this message