Re: I am confused as to whether I should use numeric IDs or not

From: Joseph Godfrey <jgodfrey_at_ecologic.net>
Date: 2000/02/03
Message-ID: <389A54CD.415CF4B5_at_ecologic.net>


Harry Chomsky wrote:

> Joseph Godfrey wrote in message <3898E678.A2F6F6B6_at_ecologic.net>...
> >On a practical level I understand and appreciate the need to use of numeric
 ids
> >to guarantee uniqueness. I've done it. On theoretical level, however, I
 think
> >there is a problem. The way ids are being used here is essentially
 equivalent to
> >an object-id, or "oid." I believe C. Date and others have expressed concern
> >about how such identifiers are used in actual object and object-relational
> >database systems.
>
> As I understand Date, his objection to the use of OIDs in object-oriented
> database systems is precisely that they are NOT attributes, not part of the
> visible data, not subject to the usual rules involving dependencies and
> joins, but rather a special new concept that makes the underlying database
> model more complicated.
>
> In this case, though, we're talking about adding an arbitrary unique id that
> IS an attribute, so I don't think Date's objections apply. Here the
> arbitrary id becomes part of the data, just like name and address are part
> of the data. Users can be aware of the id -- for instance, an invoice sent
> to a customer might have the customer's id and order ids printed on it. No
> special OID is needed, and no principles of Codd's mathematical system are
> violated.
>

I don't mean to be argumentative. However, this is almost exactly where I think the issue gets tricky and I'm eager to understand it better. So I'd like to push here a little more.

Just to be clear, I of course recognize the real-world dimensions of the problem and the practical value of arbitrary unique ids. This is often the only way to handle ingesting "dirty" data. To even be more radical, even a person's name is an arbitrary attribute. Whether one creates an id using the alphabet or numbers really doesn't matter here. But I still think there's a problem...

I think the point about an OID not being an explicit attribute is one way to approach the question. But in object-relational databases OIDs are explicit. Furthermore, autonumbering (as in MS Access, Informix, Oracle, etc) is not the same thing as an id that has semantics, e.g., a social security number, order id, customer number, etc. Very often the ID is hidden from view, as in the question that initiated this thread. An autonumber in this case is not an attribute of the data but a bookkeeping deviceof the database system very much akin, I think, to an OID. Furthermore this way of using ids can circumvent referential integrity constraints. If one includes the abstract ID as part of the primary key and keeps the entire key explicit, I understand that this conforms with relational theory and referential integrity can be enforced.. But autonumbering is often not used this way -- and that I think is trouble.

To take a slightly stronger view, I don't see how one can use an id divorced of semantics without that id being equivalent to an OID, whether explicit or not. Of course the id has the semantics of bookkeeping in the given system -- but what does that have to do with modeling the information outside the system. The fact that the id is explicit in the row is no different than an object method that returns the OID. Being able to see and manipulate the abstract ID isn't the point.

I understand that Date has no problem with abstract ids (so long as information is not embedded in them). I've heard him say this. I've also heard his objection to OIDs, to which I'm sympathetic in principle, but value having in practice! What I'm struggling with here is whether OIDs and abstract ids without attached semantics aren't the same thing. And what then? My sense is that you can get into trouble, though the cost of the trouble is often worth it from a pragmatic perspective.

I hope I'm not just splitting hairs here. Its not my intent. I've been on the job and seen folks struggle with what to do when faced with this question.

>
> >The principle at stake, I believe, is that an object (instance
> >of a class; member of a set) should be uniquely identified by some
 combination
> >of its attributes. The introduction of an arbitrary, abstract id should
 only be
> >a convenience. I would worry about a data model and/or business rules that
 could
> >distinguish between two objexts only by virtue of some arbitrary, unique
> id.
>
> I don't think there's much choice in this case, since humans in the real
> world don't have uniquely identifying characteristics that are convenient to
> use as keys. (Digitized fingerprints, perhaps? :-)) The real-world data is
> not suitable for entry into a mathematically-defined model, so we have to
> massage it to make it suitable -- in this case by adding an attribute.
>

Yes, I agree. It's often the only practical way to get the job done. It's not unlike the convenience of weak typing in c to manipulate data, as opposed to the strong typing used in object languages that requires explicit casts. Weak typing may not be "right" but it can be useful.

>
> >What would prevent the same individual from being recorded multiple times
 with
> >inconsistent relationships established with other objects?
>
> Well, if the same customer calls in again and the person who takes the order
> fails to find the customer's existing record, they may well enter a new
> record for the customer and give them a new id number. Now the same
> individual (from a real-world point of view) is represented as two different
> individuals (from the database's point of view). Such is life. That's a
> "messy" real-world problem, and I don't think the relational model attempts
> to offer a way around it.
Received on Thu Feb 03 2000 - 00:00:00 CET

Original text of this message