Re: Auto increment
Date: 10 May 2003 20:30:20 -0700
Message-ID: <c0d87ec0.0305101930.65eb8cca_at_posting.google.com>
Very bad. The IDENTITY column is a hold over from the early
programming language which were <i>very<i> close to the hardware. The
fields in a COBOL or FORTRAN program were assumed to be physically
located in main storage in the order and format in which they were
declared in the program. The records were accessed in contigous
physical secondary storage. The records had a physical position that
was used to locate them; there was a first, second,.. last record in a
file.
The early SQLs were based on existing file systems. The data was kept
in physically contiguous disk pages, in physically contiguous rows,
made up of physically contiguous columns. In short, just like a deck
of punch cards or a magnetic tape.
But physically contiguous storage is only one way of building a
relational database and it is not always the best one. But aside from
that, the whole idea of a relational database is that user is not
supposed to know how things are stored at all, much less write code
that depends on the particular physical representation in a particular
release of a particular product.
The practical considerations are that IDENTITY is proprietary and
non-portable, so you know that you will have maintenance problems when
you change releases or port your system to other products.
But let's look at the logical problems. First try to create a table
with two columns and try to make them both IDENTITY columns. If you
cannot declare more than one column to be of a certain datatype, then
that thing is not a datatype at all, by definition.
Next, create a table with one column and make it an IDENTITY column.
Now try to insert, update and delete different numbers from it. If
you cannot insert, update and delete rows from a table, then it is not
a table by definition.
Finally create a simple table with one IDENTITY column and a few other
columns. Use a few statements like
to put a few rows into the table and notice that the IDENTITY column
sequentially numbered them in the order they were presented. If you
delete a row, the gap in the sequence is not filled in and the
sequence continues from the highest number that has ever been used in
that column in that particular table.
But now use a statement with a query expression in it, like this:
INSERT INTO Foobar (a, b, c)
Since a query result is a table, and a table is a set which has no
ordering, what should the IDENTITY numbers be? The entire, whole,
completed set is presented to Foobar all at once, not a row at a time.
There are (n!) ways to number (n) rows, so which one do you pick?
The answer has been to use whatever the physical order of the result
set happened to be. That non-relational phrase "physical order"
again.
But it is actually worse than that. If the same query is executed
again, but with new statistics or after an index has been dropped or
added, the new execution plan could bring the result set back in a
different physical order. Can you explain from a logical model why
the same rows in the second query get different IDENTITY numbers? In
the relational model, they should be treated the same if all the
values of all the attributes are identical.
Think about trying to do replication on two databases that differ only
by an index, or by cache size or something that occasionally gives
them different execution plans for the same statements. Want to try
to maintain such a system?
The IDENTITY is used to mimic a magnetic tape position number.
There are better ways of creating identifiers.
The definition of a key is that it is a subset of attributes which
But again, how do you know which IDENTITY value matchs a subset of
columns? You can get a user to understand a natural, multi-column key
INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');
SELECT x, y, z
FROM Floob;
(say, (x,y) cooridnates, make model and year for a car, etc.); now
explain to him why 42 is the number of a ('Ford', 'Escort', 1997).
After making IDENTITY your key, you still have to add an extra UNIQUE() constraint for the mutliple column key. More storage.
What makes this argument even more foolish is that many OO programmer use a GUID which is MUCH longer than the multi-column key they seek to replace! r Received on Sun May 11 2003 - 05:30:20 CEST