Re: what are keys and surrogates?
Date: Fri, 4 Jan 2008 14:39:25 -0800 (PST)
On Jan 4, 9:37 pm, rp..._at_pcwin518.campus.tue.nl (rpost) wrote:
> JOG wrote:
> >> >1) A 'Key' is a set of attributes, used at the logical layer to
> >> >uniquely identify a proposition.
> >> >2) An 'Identifier' is a set of attributes, used at the conceptual
> >> >layer to uniquely identify an entity.
> >> Why use two different terms?
> >Because they apply to different things, propositions and entities.
> >Don't confuse the two.
> What is the difference?
> For me, an entity is a proposition treated as a thing
> (i.e. used as a reference rather than a description).
I'm starting to think this is a wind up ;) If so props indeed! The rest of what you have posted is just, well, bluster. You intimate I bullied someone (good grief), that I'm pedaling dogma, that predicate logic has some notion of hidden attributes (!?), that I hadn't considered the possibility of...surely not...automatic id generation, etc, etc. Gosh, its all so silly. And what you have written is just so abrasive, well, I have to doubt the intentions behind it.
I don't really understand why you're so entrenched - I've explained the logic behind how using hidden attributes as keys can be deleterious, and I've given you some examples. So until I can find some time to write them up more clearly, I am afraid "Mr. Moss, I have to let you go".
> >> The definition I was taught is different: a key is a constraint on
> >> a relation, namely a set of attributes such that any two different
> >> tuples in the relation differ in at least one of these attributes.
> >Makes sense, but whoever mentioned relations? My definitions were
> >generalized, not specific to E/R or RM.
> E/R and RM is as generalized as it gets, as far as I'm concerned.
> Unless you have some other mathematical formulation in mind.
> >> This is quite different from your notion,
> OK, I must have misunderstood what you wrote.
> >> >* A 'Surrogate Identifier' is an attribute designated at design time
> >> >to stand in place of natural identifiers that are either: (i)
> >> >Impractical to observe, (ii) Impractical to manipulate.
> >> My try: an "artificial" identifier is an relation attribute
> >> that is a key, and on which equality is the only meaningful operation.
> >> (I.e. it only serves to cross-identify the tuples of the relation with
> >> something else, e.g. tuples in another relation, or something outside
> >> the database).
> >> A "surrogate" identifier is a *hidden* artificial
> >> identifier, i.e. even the database queries only use them in equalities.
> >No attributes should ever be hidden. Ever. If they are, they can be
> >deleterious to the system. Myself and Brian had a /long/ debate about
> >this in the past, and one of the reasons I still consider consider his
> >points with diligence and an open mind, is that he showed a lot of
> >balls towards the end when he acceded the logic of that point -
> >although we heartily debate many others still ;)
> So I looked that up: you appear to be referring to
> What I'm seeing in that thread is a single person (Brian Seltzer)
> who simply states the fundamental cornerstone of not only relational
> databases, but predicate logic in general (namely, the fundamental
> "hiddenness" of attribute values, a principle that in many theoretical
> articles isn't just reserved for special "surrogate" attributes, but
> is simply assumed for all attributes), and keeps getting thrashed
> by the rest of you lot until he's so dizzy that "a light bulb comes on"
> and he "concedes". No one in that entire thread appears to understand
> what it means for a surrogate to be hidden: Brian is quite close in his
> descriptions, but doesn't find the switch, while you and Bob Badour are
> outright wrong about what it means for an attribute to be "hidden" or
> "a surrogate", because you overlook the possibility of "perfect auto-id"
> attributes that can only be created by automatic generation and can only
> be compared for equality. These are standard in relational database
> systems and were introduced into database theory by this article:
> You may have arguments against them (I have some of my own)
> and some of what is said in that thread may apply to them, but the thread
> doesn't even mention them. Brian's eventual argument for conceding
> doesn't apply to them.
> >I certainly can't face going through that battle all again I'm afraid,
> >safe to say that as a programmer I understand the intuition behind
> >what you think (after all I believe Codd made the same error in RM/T),
> >but know that it is a mistake and can end up in situations where the
> >database gets screwed up.
> I'm not sure I want to take up the battle axe, but your database is safe.
> >> (So unless we have specific value domains to safeguard this, this is
> >> a property of the queries we use rather than the attribute itself.)
> >> >* Once a 'Surrogate Identifier' has been created it merely becomes an
> >> >unfamiliar 'Natural Identifier' (it too must become an observable
> >> >attribute of the entity it applies to).
> >> This is not true: they can serve as foreign keys without ever being
> >> published (if your queries are careful enouygh). I'd like to use
> >> a separate term for arbitrary but published identifiers.
> >As I previously stated, no attributes should ever be hidden. A tuple
> >is merely an encoded proposition. That proposition orginates prior to
> >entry in the database, and as such all of its contents come from the
> >outside world.
> Yes, that is your dogma. Making that assumption doesn't make it
> a necessity. Making valid arguments for it might: I haven't seen any.
> >In another thread I used a car example to argue against
> >OID's, but one might have used a hidden surrogate instead of a VIN
> >attribute, and ended up with the same broken database.
> Yes, when you confuse hidden attributes with published, assignable ones,
> as you stubbornly insist on doing, things will break horribly.
> >> >* A 'Surrogate key' is merely a key that contains an attribute that
> >> >was a surrogate identifier at the conceptual layer.
> >> For me there is no such thing as a surrogate identifier at the
> >> conceptual layer. A surrogate is a "1NF" counterpart of an
> >> arrow in a conceptual model.
> >Surrogate means in "place of".
> Exactly: to be more exact, it stands for a reference to the whole tuple.
> >Your phone company quite sensibly
> >chooses a customerID for you "in place of" of your naturally
> >identifying dna code. Once the surrogate identifier is created it
> >becomes another statable natural attribute (and indeed the phone
> >company asks for it when you call up to complain about your bill).
> That is why it isn't hidden. Its not being hidden fundamentally breaks
> it as a surrogate: I can now pose as another customer by using his
> customerID, and customerIDs will start to accrue all kinds of additional
> meaning (e.g. if they are issued in increasing order, applications may
> start to rely on that being the case in one way or another).
> >It is unfamiliar compared to your other attributes, an no doubt like me
> >you have to check the top of your bills to find out what it is. And
> >this all has nothing to do with databases, it all happens outside.
> Hidden attributes, on the other hand, never leave the database;
> in queries they are only used in equality tests (joins).
> This is why it makes sense to distinguish hidden attributes
> from published "arbitrary" identifiers.
> I don't blame you for having overlooked this, it wasn't exactly
> obvious to me when I first read about it, but I hope you will
> in the future.
> >Surely that is all straightforward and entirely sensible!
> Yes, but it also completely misses the point.
> >Merry Xmas, J.
> Belated best wishes for 2008 ...
Received on Fri Jan 04 2008 - 23:39:25 CET