Re: Auto increment

From: Enu <>
Date: Mon, 12 May 2003 20:46:17 GMT
Message-ID: <>

I use auto-increment for keeping track of the order in which data was entered. Also, if you're creating an application from your database, it may be useful to have a unique identifying number for each record, so you don't need to send your whole multi-column key between different parts of the application.

Idealits, please refute.

On Sun, 11 May 2003 14:05:29 +0100, "shaun" <> wrote:

>are there any situations where auton_increment is suitable?
>"--CELKO--" <> wrote in message
>> Very bad. The IDENTITY column is a hold over from the early
>> programming language which were <i>very<i> close to the hardware. The
>> fields in a COBOL or FORTRAN program were assumed to be physically
>> located in main storage in the order and format in which they were
>> declared in the program. The records were accessed in contigous
>> physical secondary storage. The records had a physical position that
>> was used to locate them; there was a first, second,.. last record in a
>> file.
>> The early SQLs were based on existing file systems. The data was kept
>> in physically contiguous disk pages, in physically contiguous rows,
>> made up of physically contiguous columns. In short, just like a deck
>> of punch cards or a magnetic tape.
>> But physically contiguous storage is only one way of building a
>> relational database and it is not always the best one. But aside from
>> that, the whole idea of a relational database is that user is not
>> supposed to know how things are stored at all, much less write code
>> that depends on the particular physical representation in a particular
>> release of a particular product.
>> One of the biggest errors is the IDENTITY column in the Sybase family
>> (SQL Server and Sybase). People actually program with this "feature"
>> and even use it as the primary key for the table! Now, let's go into
>> painful details as to why this thing is bad.
>> The practical considerations are that IDENTITY is proprietary and
>> non-portable, so you know that you will have maintenance problems when
>> you change releases or port your system to other products.
>> But let's look at the logical problems. First try to create a table
>> with two columns and try to make them both IDENTITY columns. If you
>> cannot declare more than one column to be of a certain datatype, then
>> that thing is not a datatype at all, by definition.
>> Next, create a table with one column and make it an IDENTITY column.
>> Now try to insert, update and delete different numbers from it. If
>> you cannot insert, update and delete rows from a table, then it is not
>> a table by definition.
>> Finally create a simple table with one IDENTITY column and a few other
>> columns. Use a few statements like
>> INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
>> INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
>> INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');
>> to put a few rows into the table and notice that the IDENTITY column
>> sequentially numbered them in the order they were presented. If you
>> delete a row, the gap in the sequence is not filled in and the
>> sequence continues from the highest number that has ever been used in
>> that column in that particular table.
>> But now use a statement with a query expression in it, like this:
>> INSERT INTO Foobar (a, b, c)
>> SELECT x, y, z
>> FROM Floob;
>> Since a query result is a table, and a table is a set which has no
>> ordering, what should the IDENTITY numbers be? The entire, whole,
>> completed set is presented to Foobar all at once, not a row at a time.
>> There are (n!) ways to number (n) rows, so which one do you pick?
>> The answer has been to use whatever the physical order of the result
>> set happened to be. That non-relational phrase "physical order"
>> again.
>> But it is actually worse than that. If the same query is executed
>> again, but with new statistics or after an index has been dropped or
>> added, the new execution plan could bring the result set back in a
>> different physical order. Can you explain from a logical model why
>> the same rows in the second query get different IDENTITY numbers? In
>> the relational model, they should be treated the same if all the
>> values of all the attributes are identical.
>> Think about trying to do replication on two databases that differ only
>> by an index, or by cache size or something that occasionally gives
>> them different execution plans for the same statements. Want to try
>> to maintain such a system?
>> The IDENTITY is used to mimic a magnetic tape position number.
>> There are better ways of creating identifiers.
>> The definition of a key is that it is a subset of attributes which
>> uniquely identify that row. An IDENTITY is not an attribute, since it
>> is part of the PHYSICAL implementation of the representation of the
>> storage of the row. It does not exist in the real world of the model,
>> so you can NEVER verify it!
>> People who have never measured the actual performance, claim that
>> using it as a key to replace a multi-column key. Adding extra data
>> that has to be read from a disk is far more expensive than comparing a
>> long byte string in main storage. Do the math: your CPU runs in
>> nanoseconds; your disk drive runs in milliseconds.
>> But again, how do you know which IDENTITY value matchs a subset of
>> columns? You can get a user to understand a natural, multi-column key
>> (say, (x,y) cooridnates, make model and year for a car, etc.); now
>> explain to him why 42 is the number of a ('Ford', 'Escort', 1997).
>> After making IDENTITY your key, you still have to add an extra
>> UNIQUE() constraint for the mutliple column key. More storage.
>> What makes this argument even more foolish is that many OO programmer
>> use a GUID which is MUCH longer than the multi-column key they seek to
>> replace! r
Received on Mon May 12 2003 - 22:46:17 CEST

Original text of this message