Re: integer id columns for all tables

From: Alexandr Savinov <spam_at_conceptoriented.com>
Date: Tue, 29 Aug 2006 11:01:25 +0200
Message-ID: <ed0vpb$r3o$1_at_f1node01.rhrz.uni-bonn.de>


garhone schrieb:
> Hi,
>
> Someone with greater expertise than I recently told me that it is best
> to have an integer primary-key for all tables in a database, even if
> the table already has some non-integer primary key, or some sort of
> composite primary key. If there is already a non-integer key, then
> create a new sequence column and set that as the primary key.
>
> Does anyone have an opinion on this and why?

Here are a number of principles for designing identifiers:

Assume that there are two tables Companies and Products with many-to-many relationship implemented via CP table. When we create this join table, it automatically gets some identification means for its elements. Or this format for its identifiers has to be provided manually, for example, as some auto-integers (there are not properties yet). *After that* we can add some properties. In this example we want to add two fields the first pointing to a product and the second pointing to a company. (Notice again that we keep identifiers and properties separately.)

Alternatively, we might want to *optimize* this representation and combine two roles: identification and characterization. In this case we remove our identifier column and then say that the two properties <Company, Product> will identify each element. (Notice that the question is not if we need to *add* an integer identifier but rather if we can *remove* the existing identifier and use our properties instead of it.)

We hope that such records will take less space. It is not always so. The thing is that these records can be referenced by other records and then we need to store two fields instead of only one. Thus records themselves will really take less space but other records will need more space for longer identifiers.

Now assume that somebody wants to change one of the two properties. Here we need to answer the main question: are these columns identifiers or characteristics of the entity. Since we combined them, the answer is twofold. So every new developer of the system will give his own answer with the corresponding consequences.

Another problem is that it not an exception when two records with the same pair of <Company, Product> may exist. Indeed, why not? These are only properties and they can take any values. Or assume that in future this join table will also need to reference more records as its characteristics. Should I add them to the existing two-field identifiers or should I keep them separately. If I add them, then I will have to reorganize the whole model. If not, then for some unknown reasons two of three fields will be used for identification while the third will be not. It does not make sense because all the fields have equal role. Here again keeping separately properties and identifiers will prevent you from their misinterpretation.

As I said already, these arguments do not mean that complex identifiers are not needed. The problem is that using them may well result in an inconsistent behaviour due to ambiguous interpretation of properties. Using primitive identifiers is one possible safe way for solving this problem. It guarantees that you will not have serious problems with your system especially in a year or two when some other developers start extending it.

--
http://conceptoriented.com
Received on Tue Aug 29 2006 - 11:01:25 CEST

Original text of this message