ID field as logical address

From: Walter Mitty <wamitty_at_verizon.net>
Date: Fri, 29 May 2009 12:54:53 GMT
Message-ID: <xWQTl.1481$9L2.793_at_nwrddc02.gnilink.net>



In another thread, I suggested that prefixing every tuple with an auto-generated field (column) called ID amounted to reference by location rather than reference by content. Some people correctly point out to me that there is a level of indirection gained by using ID fields instead of pointers. That means the records (rows) "pointed to" by references to the ID are not pinned, and can still be shuffled. That's good, but it's trivial, IMO, when compared to the advantages claimed by Codd and refined by Date for the Relational model over alternative models. As far as I can see, a database where every foreign key refers to an auto-generated ID column is just a graph database disguised to look like a relational database.

In this thread, I want to further explore the use of ID fields, and approach the question of whether using ID fields is relevant to the discussion of whether an UPDATE is somehow different from a DELETE followed by an INSERT. I declined to engage Brian in this question when he posted a response in the earlier thread. I'm willing to dialogue about that here.

Let's say that the design has followed the practice of including an auto-generated key called ID in every table, and declaring the ID as the PRIMARY KEY. Let's further assume a practice that says that you never update the ID field of any tuple, once it's been assigned. Now let's say you have a table that records values that look like (ID, value). In an actual case, value would probably ityself be a tuple, so our tagged tuples might look like, for example, (ID, LastName, FirstName, PhoneNumber).

Now, if we update the LastName, the FirstName, and the Phone number of an entry, it's still the same entry, because the ID field retains the same value.

Now what happens if instead of updating the three dependent fields, we delete the row, and insert a new one. Here, I'm making an assumption about how auto-generate works. I'm assuming that, once used, and ID will never be auto-generated again, even if it's no longer in use. So, when we delete a row and insert a new row, we get a brand new ID. All of the references to the deleted row remain as orphans, unless they got deleted by a cascaded delete.

So, Brian's claim that deleting a value and inserting a new one is somehow really different from updating an existing value works in this model, if you regard ID as somehow "special", and not "part of the value recorded in the table".

Disclaimer: I use auto-generated keys only when there are no natural keys that I trust. And I tend to do the autogeneration in the application, and not in the table itself. I might use a sequence generator built into the database to obtain an unused value for a synthetic key, but I don't cause the DBMS to autogenerate new keys in the table. Received on Fri May 29 2009 - 14:54:53 CEST

Original text of this message