Re: Auto increment

From: shaun <nospam_at_nospam.com>
Date: Sun, 11 May 2003 14:05:29 +0100
Message-ID: <RSrva.19830$xd5.943672_at_stones.force9.net>


are there any situations where auton_increment is suitable?

"--CELKO--" <71062.1056_at_compuserve.com> wrote in message news:c0d87ec0.0305101930.65eb8cca_at_posting.google.com...
> Very bad. The IDENTITY column is a hold over from the early
> programming language which were <i>very<i> close to the hardware. The
> fields in a COBOL or FORTRAN program were assumed to be physically
> located in main storage in the order and format in which they were
> declared in the program. The records were accessed in contigous
> physical secondary storage. The records had a physical position that
> was used to locate them; there was a first, second,.. last record in a
> file.
>
> The early SQLs were based on existing file systems. The data was kept
> in physically contiguous disk pages, in physically contiguous rows,
> made up of physically contiguous columns. In short, just like a deck
> of punch cards or a magnetic tape.
>
> But physically contiguous storage is only one way of building a
> relational database and it is not always the best one. But aside from
> that, the whole idea of a relational database is that user is not
> supposed to know how things are stored at all, much less write code
> that depends on the particular physical representation in a particular
> release of a particular product.
>
> One of the biggest errors is the IDENTITY column in the Sybase family
> (SQL Server and Sybase). People actually program with this "feature"
> and even use it as the primary key for the table! Now, let's go into
> painful details as to why this thing is bad.
>
> The practical considerations are that IDENTITY is proprietary and
> non-portable, so you know that you will have maintenance problems when
> you change releases or port your system to other products.
>
> But let's look at the logical problems. First try to create a table
> with two columns and try to make them both IDENTITY columns. If you
> cannot declare more than one column to be of a certain datatype, then
> that thing is not a datatype at all, by definition.
>
> Next, create a table with one column and make it an IDENTITY column.
> Now try to insert, update and delete different numbers from it. If
> you cannot insert, update and delete rows from a table, then it is not
> a table by definition.
>
> Finally create a simple table with one IDENTITY column and a few other
> columns. Use a few statements like
>
> INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
> INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
> INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');
>
> to put a few rows into the table and notice that the IDENTITY column
> sequentially numbered them in the order they were presented. If you
> delete a row, the gap in the sequence is not filled in and the
> sequence continues from the highest number that has ever been used in
> that column in that particular table.
>
> But now use a statement with a query expression in it, like this:
>
> INSERT INTO Foobar (a, b, c)
> SELECT x, y, z
> FROM Floob;
>
> Since a query result is a table, and a table is a set which has no
> ordering, what should the IDENTITY numbers be? The entire, whole,
> completed set is presented to Foobar all at once, not a row at a time.
> There are (n!) ways to number (n) rows, so which one do you pick?
> The answer has been to use whatever the physical order of the result
> set happened to be. That non-relational phrase "physical order"
> again.
>
> But it is actually worse than that. If the same query is executed
> again, but with new statistics or after an index has been dropped or
> added, the new execution plan could bring the result set back in a
> different physical order. Can you explain from a logical model why
> the same rows in the second query get different IDENTITY numbers? In
> the relational model, they should be treated the same if all the
> values of all the attributes are identical.
>
> Think about trying to do replication on two databases that differ only
> by an index, or by cache size or something that occasionally gives
> them different execution plans for the same statements. Want to try
> to maintain such a system?
>
> The IDENTITY is used to mimic a magnetic tape position number.
>
> There are better ways of creating identifiers.
>
> The definition of a key is that it is a subset of attributes which
> uniquely identify that row. An IDENTITY is not an attribute, since it
> is part of the PHYSICAL implementation of the representation of the
> storage of the row. It does not exist in the real world of the model,
> so you can NEVER verify it!
>
> People who have never measured the actual performance, claim that
> using it as a key to replace a multi-column key. Adding extra data
> that has to be read from a disk is far more expensive than comparing a
> long byte string in main storage. Do the math: your CPU runs in
> nanoseconds; your disk drive runs in milliseconds.
>
> But again, how do you know which IDENTITY value matchs a subset of
> columns? You can get a user to understand a natural, multi-column key
> (say, (x,y) cooridnates, make model and year for a car, etc.); now
> explain to him why 42 is the number of a ('Ford', 'Escort', 1997).
>
> After making IDENTITY your key, you still have to add an extra
> UNIQUE() constraint for the mutliple column key. More storage.
>
> What makes this argument even more foolish is that many OO programmer
> use a GUID which is MUCH longer than the multi-column key they seek to
> replace! r
Received on Sun May 11 2003 - 15:05:29 CEST

Original text of this message