Re: integer id columns for all tables

From: JOG <jog_at_cs.nott.ac.uk>
Date: 30 Aug 2006 09:08:12 -0700
Message-ID: <1156954092.180194.95450_at_p79g2000cwp.googlegroups.com>


Alexandr Savinov wrote:
> garhone schrieb:
> > Hi,
> >
> > Someone with greater expertise than I recently told me that it is best
> > to have an integer primary-key for all tables in a database, even if
> > the table already has some non-integer primary key, or some sort of
> > composite primary key. If there is already a non-integer key, then
> > create a new sequence column and set that as the primary key.
> >
> > Does anyone have an opinion on this and why?
>
> Here are a number of principles for designing identifiers:
>
> - An entity may well exist without properties but not without identifiers.

gibberish surely.

>
> - An identifier is the first thing to be created before actually the
> entity itself is created. In other words, we create an identifier and
> only after that (and if it is really needed) the corresponding entity is
> created.

We create identifiers do we? Not observe them? Unghh.

>
> - Identifiers have to be constant (in contrast to properties). If you
> use a kind of system identifier then you are guaranteed then nobody can
> change it or otherwise misuse.

No, we identify items via their properties. It seems a common problem in IT for people to not think about /how we recognise distinctions in the real world/ (this is where our propositions come from after all)

>
> - Identifiers should not be meaningful because they are intended to be
> manipulated by the software rather than by users (with the exception of
> some well known identifiers used for bootstrapping). So identifiers
> should be hidden from the users.

Sigh. Does this count as trolling now?

>
> - Identifiers should be defined before any properties exist. Such a
> structure will have only identifiers for accessing empty entities. After
> that we can add meaningful properties. This discipline can ensure the
> separation between these two concerns.

Double Sigh.

>
> - If you need uniqueness then use the corresponding constraint.
> Frequently the need for uniqueness in a property leads to the conclusion
> that it should be used as an identifier.

Not if one realises that uniqueness is not the only quality a good identifier has. See bob's informative list in his post.

>
> Assume that there are two tables Companies and Products with
> many-to-many relationship implemented via CP table. When we create this
> join table, it automatically gets some identification means for its
> elements. Or this format for its identifiers has to be provided
> manually, for example, as some auto-integers (there are not properties
> yet). *After that* we can add some properties. In this example we want
> to add two fields the first pointing to a product and the second
> pointing to a company. (Notice again that we keep identifiers and
> properties separately.)
>
> Alternatively, we might want to *optimize* this representation and
> combine two roles: identification and characterization. In this case we
> remove our identifier column and then say that the two properties
> <Company, Product> will identify each element. (Notice that the question
> is not if we need to *add* an integer identifier but rather if we can
> *remove* the existing identifier and use our properties instead of it.)
>
> We hope that such records will take less space. It is not always so. The
> thing is that these records can be referenced by other records and then
> we need to store two fields instead of only one. Thus records themselves
> will really take less space but other records will need more space for
> longer identifiers.
>
> Now assume that somebody wants to change one of the two properties. Here
> we need to answer the main question: are these columns identifiers or
> characteristics of the entity. Since we combined them, the answer is
> twofold. So every new developer of the system will give his own answer
> with the corresponding consequences.

more entity/proposition confusion.

>
> Another problem is that it not an exception when two records with the
> same pair of <Company, Product> may exist. Indeed, why not? These are
> only properties and they can take any values. Or assume that in future
> this join table will also need to reference more records as its
> characteristics. Should I add them to the existing two-field identifiers
> or should I keep them separately. If I add them, then I will have to
> reorganize the whole model. If not, then for some unknown reasons two of
> three fields will be used for identification while the third will be
> not. It does not make sense because all the fields have equal role. Here
> again keeping separately properties and identifiers will prevent you
> from their misinterpretation.
>
> As I said already, these arguments do not mean that complex identifiers
> are not needed. The problem is that using them may well result in an
> inconsistent behaviour due to ambiguous interpretation of properties.
> Using primitive identifiers is one possible safe way for solving this
> problem. It guarantees that you will not have serious problems with your
> system especially in a year or two when some other developers start
> extending it.
>
> --
> http://conceptoriented.com

<weeps/> Received on Wed Aug 30 2006 - 18:08:12 CEST

Original text of this message