Re: candidate keys in abstract parent relations

From: Roy Hann <specially_at_processed.almost.meat>
Date: Thu, 19 Jan 2006 21:10:19 -0000
Message-ID: <SLCdnR-38eMwnU3enZ2dnUVZ8tydnZ2d_at_pipex.net>


"Tony Rogerson" <tonyrogerson_at_sqlserverfaq.com> wrote in message news:dqoi24$505$1$8302bc10_at_news.demon.co.uk...
> > What exactly is wrong with a long value? Who cares how long the value is?
> > (Or more to the point, what does "long" really mean in this context?) If
> > we're talking performance, the time it takes to transmit a 1kb value is an
> > inconsequential fraction of the time it takes to lookup a surrogate key to
> > get the value.
>
> No No - you see thats just the point!
>
> If the value were 1KB and you had 100 items in the list box, I now have to
> transmit 100KB to the client, on the internet thats a big deal; now multiply
> that up to 10, 100, 1000 users and it becomes a really really big deal; even
> saving a couple of KB's on a page really makes a difference in scalability.

Since network transmission time dominates, I don't see how the smallness of the value that gets transmitted back when the user makes a selection really makes much difference. You must already have delivered *all* the text of *all* the possible options to *all* the users so that they know what they are choosing. Whether they send back a small surrogate key or an entire large composite key is really irrelevant by then.
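A back-of-the-envelope calculation makes this concrete (all sizes below are hypothetical, picked only for illustration): the option text dominates the page weight in either design, and the reply is a single key either way.

```python
# Rough payload arithmetic for one 100-item list box.
# All byte sizes are hypothetical, for illustration only.
ITEMS = 100
TEXT_BYTES = 1024          # visible option text, delivered in both designs
SURROGATE_KEY_BYTES = 4    # e.g. value="5217"
NATURAL_KEY_BYTES = 40     # e.g. value="Rogerson, Tony, 10/04/1970"

download_with_surrogate = ITEMS * (TEXT_BYTES + SURROGATE_KEY_BYTES)
download_with_natural = ITEMS * (TEXT_BYTES + NATURAL_KEY_BYTES)

print(download_with_surrogate)  # 102800
print(download_with_natural)    # 106400
# The keys account for a few percent of the page; the option text is
# the bulk of the transfer, and it must be sent in both designs.
```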

Your point is still a valid one; it just has nothing to do with databases. You are describing how a bad user interface is also a badly behaved network client. I sure as heck don't want to be presented with a 100+ item list box even on a LAN, never mind the Internet. I'd say the DBMS would be doing you a favour by making it hard to do stuff like that. I do fully understand what you are saying because I have seen exactly what you describe. It was an Internet interface to a taxation system where accounting companies were presented with a list box to select one of their clients. Some had list boxes of over a thousand choices, weighing in at 500+ KB each. But the bigger problem was that it was humanly unusable no matter how fast it was. (And it wasn't.)

> Now consider a search page with say 10 listboxes of average 100 items in
> each, using the surrogate key approach you can very easily cache these on
> the middle tier and refresh every 'x' minutes or seconds; you don't care
> that the natural key underneath changes say from 'Clarke' to 'Rogerson'
> because the surrogate key is a read only - never changing constant.

(Names are a bad example for us to use here because they are a poor key anyway, but let's ignore that.) Most of the time good keys don't change (unless there is a typo or some such error to be corrected). That is handled quite effectively if the DBMS supports foreign key constraints that allow ON UPDATE CASCADE. Admittedly that can be quite intensive when it is triggered, but it is triggered very rarely in practice (usually), plus you are spared the trouble of building any surrogate key generation machinery and the trouble caused by duplicates that went undetected because of a spuriously unique surrogate key value.

> >> or even worse a
> >> composite key in there - which begs the question, if your primary key is a
> >> composite key which often it is, how would you code that?
> >
> > Don't understand the question.
>
> Say you had a composite primary key, surname, firstname, dob for instance
> just for example sake.
>
> The data would be Rogerson, Tony, 10/04/1970
>
> The list box value would need to be Value="Rogerson, Tony, 10/04/1970"; but
> what if that data contained commas - what delimiter would you use to
> separate the composite keys? You'd probably need to use fixed length.
>
> Whereas, plop a surrogate key on there say id=5 it would just be Value="5".

Again, this is an interface/programming issue intruding (unnecessarily IMO) on a database design. It is quite possible to imagine an API that would handle the problem entirely transparently. For instance, since the composite key is unique one could compute a single hash key *transparently, in the API, at run-time* and use that instead. No one will ever provide such an API of course... :-(
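A sketch of the idea in Python (the length-prefixed encoding is my own choice, not part of any existing API; it sidesteps the delimiter problem entirely, though note a hash is only probabilistically unique, unlike the key it stands in for):

```python
import hashlib

def composite_key_hash(*parts: str) -> str:
    """Collapse a composite key into one opaque handle.

    Length-prefixing each part before hashing means values containing
    commas (or any other delimiter) cannot be confused: ("a,b", "c")
    and ("a", "b,c") encode differently, where a naive comma join
    would make them collide.
    """
    encoded = b"".join(
        len(p.encode("utf-8")).to_bytes(4, "big") + p.encode("utf-8")
        for p in parts
    )
    return hashlib.sha256(encoded).hexdigest()

# The same composite key always yields the same handle:
h = composite_key_hash("Rogerson", "Tony", "10/04/1970")
print(h)

# Comma-containing values cannot collide the way a delimited join would:
assert composite_key_hash("a,b", "c") != composite_key_hash("a", "b,c")
```

Nothing stored in the database changes; the handle exists only in the interface layer, which is exactly where the problem lives.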

> > Please don't imagine that I reject the use of surrogate keys, but I do
> > find
> > there are great benefits if you can avoid them. They should be employed
> > only with the greatest reluctance.
>
> My experience is the complete opposite, perhaps its just i've got used to
> cranking the most performance out of SQL Server in the early days because
> the hardware and disks weren't that powerful. I remember DB2 I didn't care
> because it was on a mainframe :)

Well my experience is with national taxation systems and batch applications. My goal is to have zero code that runs outside the DBMS. I have observed over the years that the less stuff there is in the database that's not naturally occurring business data, the easier it is to achieve that goal and the faster those applications can be made to go (by large factors, e.g. 5, 10, more). When you are dealing with processes that run for days, facing deadlines set in law, that's just as important as a fast on-line interface.

My point (yes, I have one) is that different types of application will put different demands on the DBMS, but restrained use of surrogate keys is universally beneficial.

Roy

Received on Thu Jan 19 2006 - 22:10:19 CET
