Re: What is a surrogate identifier

From: Walt <wamitty_at_verizon.net>
Date: Fri, 16 Mar 2007 19:37:42 GMT
Message-ID: <amCKh.3179$dG.1666_at_trndny08>


"Marshall" <marshall.spight_at_gmail.com> wrote in message news:1174060127.110635.173510_at_e65g2000hsc.googlegroups.com...
> On Mar 16, 6:38 am, "Cimode" <cim..._at_hotmail.com> wrote:
> > On Mar 16, 3:32 pm, "JOG" <j..._at_cs.nott.ac.uk> wrote:
> >
> > > I would also add in hindsight that the distinction between something
> > > that identifies a proposition and something that identifies the
> > > entities about which those propositions speak is an extremely
> > > important distinction to make. I think this often gets confused,
> > > generating confusion and miscommunication. Indeed I imagine this is
> > > the very reason why Codd chose the term 'key' instead of 'identifier'
> > > when it came to defining attributes which make a certain proposition
> > > unique.
> > >From Date's relational dictionnary
> >
> > surrogate key
> >
> > A single-attribute key with the property that its values serve
> > solely as surrogates-hence the name-for the entities they stand for
> > (in other words, they serve merely to represent the fact that the
> > corresponding entities exist, and they carry absolutely no additional
> > information or meaning).

>

> Nice.
>

> An example that I keep coming back to is Addresses. In more
> complex systems, there is a need for Addresses : Customers
> being many : one. (Billing address, shipping address, etc. etc.)
>

> So, just considering a very simple version of an address:
>

> addr1
> addr2
> city
> state/province
> postal code
>

> Is there *anything* like a key in there? Nope. The whole bundle
> together is the key. But if you did that, it means you're putting
> all five attributes in any other table that needs to refer to an
> address. This is theoretically sound but painful in practical terms.
> So people create a numeric id. Ideally the mapping is is a bijection,
> but that doesn't tend to happen; it's id -> address only. One time
> I once counted a customer who the same address in his associated
> address appearing 76 times! I mean character-for-character identical!
>

> Anyway, this seems like a mess to me. Anyone have any comments?
>
>

> Marshall

>

Here's a comment:

The "existence" of an address is something we rarely deal with. The actual address space is so large that a table of all "existing" addresses is too large for most of us to want to create it, as a practical matter.

Consider the following Address:

   <addr1>1234 Main St.
   <addr2>Suite 23
   <city>Springfield

    <state/province>IL
    <postal code>62704-9987

Now does such an address exist? Well, for most of us that create "contact data" for mailings, we really don't insert any such checks.

But if you go to the USPS website that looks up ZIP codes for you, they will tell you that
"1234 Main St., Springfield IL is an "undeliverable address". Either there's no Main St. in Springfield, IL, or there's no building with number 1234, or there's no mailbox in that building.

Presuming that the database behind the USPS website is accurate in this case, those people are really asserting the "non existence" of such an address. That's quite different from simply asserting that "such an address doesn't match any address in my database."

On other words, with regard to addresses, most of us are working with an open world assumption. Even if we are working on a closed world assumtion with regard to the rest of our data.

There's another item that's even simpler than that: e-mail address. It's practically impossible to assert any possible e-mail address is, in fact, non existent. The propagation delays for mail are longer than the mean time between creation of new e-mail addresses. Like it or not, e-mail address is an open world attribute in your database. Received on Fri Mar 16 2007 - 20:37:42 CET

Original text of this message