Re: dbdebunk 'Quote of Week' comment
Date: Tue, 23 Aug 2005 00:20:09 +0200
Message-ID: <bpgkg11qccbksdrggs7jeas55vg5noi2iq_at_4ax.com>
If you use the undocumented DBCC PAGE command to inspect how the data is laid out on the disk, you'll find that the row with ID = 2 comes first, and the row with ID = 1 comes second.
>More and more programmers who have absolutely no database training are
>being told to design a database.
(snip)
>Most of them simply start programming SQL as if it were their native
>programming language. They grab at GUIDs, IDENTITY, ROWID and other
>proprietary auto-numbering "features" in SQL products to imitate either
>a record number (sequential file system mindset) or OID (OO mindset)
>since they don't know anything else.
Yes, that's what newbies often do. They fail to see when to use and especially when not to use an IDENTITY. They think that IDENTITY can be used as the only key in a table. Of course, experienced database designers like you know better - you know that the choice to use a surrogate key requires you to include the real key as well, and to include a constraint that will ensure uniqueness of the real key.
BTW, newbies also abuse CHARACTER columns to store formatted dates. Does
this mean that the CHARACTER datatype is inherently flawed? Or does this
only mean that almost any feature of SQL can be abused by stupid people?
>Experienced database designers look for industry standard codes for
Indeed, they do. And if performance is important, or if there is a fair
chance that the industry standard code might change one day (longer UPC
code, ZIP+, legal issues requiring you to stop using SSN, etc) and they
don't want to accept the downtime to propagate such a time to all their
billion-row tables, they will add a suitable surrogate key.
>Try to tell the IRS you don't have anybody's SSN at
The essence of a surrogate key is that acts as a surrogate for the real
key. The real key is still in the entities' own table, but the surrogate
is used to refer to the row in foreign key relationships.
Cindy Lou Who's SSN will be in the Personnel tabel. But in the
Gratifications table, you'll only find the number 42.
>But let's look at the logical problems. First try to create a table
Correct. IDENTITY is not a data type. It's a property. Look at the table
definition above - the data type is INTEGER.
The identity property is a generator for surrogate keys - no point in
having two surrogate keys in the same table.
>Next, create a table with one column and make it an IDENTITY. Now try
If you have only a surrogate key column but no other column, then what
is the surrogate key a surrogate for? Indeed - that is not a table.
>Finally create a simple table with one IDENTITY and a few other
Incidentally, yes. But note that it also behaves exactly the way Dr.Codd
said surrogate keys should behave: generated by the system, and users
have no control over it's value.
> A utility
You can't, since SQL Server doesn't allow you to change the values in an
IDENTITY column. Remember: "Database users may cause the system to
generate or delete a surrogate, but they have no control over its value"
>But now use a statement with a query expression in it, like this:
Who cares? As long as they're unique, they serve their purpose.
>The answer has been to use whatever the *PHYSICAL* order of the result
So if MS decides for it's next release to use a randomizer to reshuffle
the identity values assigned before storing them, you wouldn't object?
>But it is actually worse than that. If the same query is executed
So? Assuming you did not forget to declare the real key as either
PRIMARY KEY or UNIQUE, executing the same query again would simply
result in a constraint violation, and no rows inserted into the table.
>Can you explain from a logical model why the same rows in the second
Yes, as a matter of fact I can. To quote Dr.Codd: "Database users (...)
have no control over its value".
>How do you verify that an entity has the right key when you use an
>If I lean on a mouse button, I can insert the same data with a new
So you DID forget to include the real key in the table? Or to include
the appropriate constraint?
Oh dear.
> Cindy Lou Who is now employed two times and
Eh? Since when is a UNIQUE constraint procedural?
And the IDENTITY is not made redundant if you declare the SSN to be
UNIQUE (like you should have done in the first palce) - it's a surrogate
key; it serves a different purpose. The SSN is there to relate the row
in the Personnel table to the employee in front of your desk. The
IDENTITY value is there to relate rows in other tables to the row in the
Personnel table.
> But we were assuming that we use only
Not "we" were - you were assuming that. You are collecting examples of
bad code involving IDENTITY and using that to say that IDENTITY is bad.
But you ignore examples of code where the IDENTITY property is used
without any problems.
It's like saying that cars should be abolished because stupid people
drive drunk and cause accidents.
>Another cute way to destroy data integrity:
Eh? I see two possibilities:
In short - the effect is no different from executing the same commands
against a table without surrogate key.
>Finally, an appeal to authority, with a quote from Dr. Codd:
Indeed.
> That means never
But here, you miss the mark. If I write a query to join two tables on a
surrogate key column, do I see any values? Heck, no!
If I write a FOREIGN KEY constraint, do I see any value? Heck, no!
>See this article for problems in replication with IDENTITY columns.
Did you actually read the article?
The article mentions three problems.
"The third problem is typically caused when all of the columns
in a row are updated by the application, including the identity
column."
UPDATE Foobar
>their keys first.
>tax time but Cindy Lou Who was employee 42 when you put her into the
>Personnel table and you have the IDENTITY value to prove it.
(more of these examples snipped)
>with two columns and try to make them both IDENTITY. If you cannot
>declare more than one column to be of a certain data type, then that
>thing is not a data type at all, by definition. It is a property which
>belongs to the PHYSICAL table, not the LOGICAL data in the table.
>to insert, update and delete different numbers from it. If you cannot
>insert, update and delete rows from a table, then it is not a table by
>definition.
>columns. Use a few statements like
>
>INSERT INTO Foobar (a, b, c) VALUES ('a1', 'b1', 'c1');
>INSERT INTO Foobar (a, b, c) VALUES ('a2', 'b2', 'c2');
>INSERT INTO Foobar (a, b, c) VALUES ('a3', 'b3', 'c3');
>
>To put a few rows into the table and notice that the IDENTITY
>sequentially numbered them in the PHYSICAL order they were presented.
>If you delete a row, the gap in the sequence is not filled in and the
>sequence continues from the highest number that has ever been used in
>that column in that particular table. This is how we did record numbers
>in pre-allocated sequential files in the 1950's, by the way.
>program would then "pack" or "compress" the records that were flagged
>as deleted or unused to move the empty space to the PHYSICAL end of the
>PHYSICAL file. IDENTITY leaves the gaps unless you write your own
>routine to do the compression.
>
>INSERT INTO Foobar (a, b, c)
>SELECT x, y, z
>FROM Floob;
>
>Since a query result is a table, and a table is a set which has no
>ordering, what should the IDENTITY numbers be? The entire, whole,
>completed set is presented to Foobar all at once, not a row at a time.
>There are (n!) ways to number (n) rows, so which one did you pick? Why?
>set happened to be. That non-relational phrase "PHYSICAL order" again!
>again, but with new statistics or after an index has been dropped or
>added, the new execution plan could bring the result set back in a
>different PHYSICAL order. Indexes and statistics are not part of the
>logical model.
>query get different IDENTITY numbers? In the relational model, they
>should be treated the same if all the values of all the attributes are
>identical.
>IDENTITY?
By using the real key. As I said before: experienced database designers
will never forget to include the real key along with the surrogate key;
that's a mistake only newbies make.
>IDENTITY over and over.
>none of my reports are right! Now I have to write some procedural code
>like a trigger or a UNIQUE constraint on her SSN to prevent this, thus
>making the IDENTITY redundant.
>IDENTITY as a key, so we are screwed.
>
>BEGIN ATOMIC
>DELETE FROM Foobar
>WHERE id = <<some identity value>>;
>INSERT INTO Foobar
>VALUES (<<recreate deleted row>>)
>END;
>
>Logically this should do nothing, but since IDENTITY has gaps, it
>trashes the data.
>"..Database users may cause the system to generate or delete a
>surrogate, but they have no control over its value, nor is its value
>ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd,
>E. (1979), Extending the database relational model to capture more
>meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434.
>
>This means that a surrogate ought to act like an index; created by the
>user, managed by the system and NEVER seen by a user.
>used in queries, DRI or anything else that a user does.
>http://www.simple-talk.com/2005/07/05/replication/
For the first two, a repro script is given - but if you scroll down,
you'll see that there are also easy solutions supplied for these two
problems.
For the third problem, no solution is given. But there's allso no repro
script. Which is not surprising, when you read:
Have you ever tried to update an identity column in SQL Server? Run the
CREATE TABLE and INSERT statements above, then this:
SET ID = 8,
Name = 'John'
AND Name = 'Joe'
The result is
Server: Msg 8102, Level 16, State 1, Line 1 Cannot update identity column 'ID'.
Best, Hugo
-- (Remove _NO_ and _SPAM_ to get my e-mail address)Received on Tue Aug 23 2005 - 00:20:09 CEST