Re: Primary Key and autonumber

From: VincentPTillman <VincentPTillman_at_email.msn.com>
Date: 2000/01/08
Message-ID: <#nOvIOjW$GA.271_at_cpmsnbbsa04>#1/1


  I assume the primary key that unique identifies your 1 million records does not already exist, or is a large character field (or a composite key) that you don't want to carry into child tables as a foreign key. This is called a surrogate key. When you add a surrogate key, still consider adding an alternate key for the row(s) that uniquely identify your data. When you allow duplicates into a large table with a sequentially-assigned surrogate key, it's really fun to remove the dups.

  If you database doesn't support sequences for use in a surrogate key, you can create your own sequence in memory (e.g., Oracle PL/SQL global package variable, but Oracle has sequences) or in a table (cache or pin the table in memory if at all possible). Also, place the logic that assigns the surrogate key in a trigger for extra robustness (otherwise, someone else could easily skip number or fail to read your home-grown sequence, which puts the sequence out of sync). If a trigger sets the surrogate key, the user doesn't know the value that got assigned, and may want to populate dependent tables. In this case, they need to re-select the data (using the real-world primary key, another reason to add it). When incrementing the sequence, each user ideally locks the table so others cannot even read it, and others must be forced to wait on the lock. A SELECT...FOR UPDATE may be sufficient if isolation levels that allow a dirty read are not supported or enabled.

   One note about selecting from "the whole 1 million rows" to get the next number: If you have an index (unique is even better), you won't scan the whole table. You should return the MAX(ID) very quickly, in most databases. However, this is a bad idea, even for small tables, even for indexed columns, even for unique indexes. I want to spell out the following "bad" approach:

  1. SELECT MAX(ID) FROM the table.
  2. Add one to the result in your code.
  3. Insert the new row with the resulting key value.

  I've seen production applications have all kinds of problems with this approach.

  First, I've seen no index on the primary key, or a simple index. If the ID column is not just indexed, but uniquely indexed, so an error returns when a value is re-used, duplicate keys will result. I have seen older databases that simply reject duplicates and proceed, even on a "unique" index, without an error. The uniqueness was simple their way of allowing you to not repeat yourself, rather than detect a problem.

  Second, adding a unique index that returns an error for duplicates doesn't solve the problem unless the programmers understand how to deal with duplicates. If they don't check for an error after the SELECT, or write a function to trap the error, and still proceed to add one to the result, they may continue through their program with a PK or 1, and attempt to insert child records for the wrong parent. This is a real mess to clean up.

  If you're going to design your own mechanism, I suggest you look at tried-and-true solutions before proceeding. And test a lot with programs in various sessions attempting quick simultaneous inserts. Mainframe VSAM systems have had to deal with this before relational databases, and had common problems. A simple service that provides "the next number," and "commits" an increment to the number before responding again, is ideal. The only problem with this approach is that numbers are skipped when the service is asked for a number but the number is never used in the table. If you're simple trying to add a unique identifier, this shouldn't matter.

Vincent Tillman
Senior Oracle DBA, OCP8

Francois Merle <leurmy_at_erols.com> wrote in message news:38735E63.581D1E1C_at_erols.com...
> Let's assume I have a database that has 1 million records, all uniquely
> identified with a Primary Key.
> If my database does not provide an Autonumber for the Primary Key, how
> can I efficiently give each new record a unique key without going
> through the entire database to check if that value has already been
> taken?
>
> Francois
>
>
>
Received on Sat Jan 08 2000 - 00:00:00 CET

Original text of this message