Re: does a table always need a PK?
Date: 1 Sep 2003 22:46:20 -0700
Message-ID: <57da7b56.0309012146.4760f01d_at_posting.google.com>
Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com> wrote in message news:<bj0da9$ue8$1_at_nyytiset.pp.htv.fi>...
> Thanks, Paul! Nice summary. However, you did not answer the question
> why some (many?) db-scientists and implementors say that bags pose NO
> problems
> what so ever and even if we came up with an efficient implementation
> standing to
> the test of the real world we would not be any happier.
Well, when you find someone who is willing to say "there are no problems with bags", ask them. They might simply respond that there are other problems which are more pressing, or else that bags on balance pose no more DBMS implementation problems than sets (as distinct from database design and usability problems). They might also respond that none of their customers appears to care, and problems by definition are what customers care about.
[ snip ]
> Well, you could always claim that most of the time users include at
> least one candidate key in the
> select list, so no duplicate elimination would be needed if the system
> knew those keys.
Trouble is, you can't build systems for the most of the time. If you're going to implement something it has to cater to the nasty corner cases: the ones where it isn't clear where to add the duplicate discard operations but where it is clear you're going to need a lot of 'em. (Queries sans keys).
[ snip ]
>
> And, if the user issues, say
>
> select city
> from customers -- table with 10M rows
>
> sure he'll star getting answers immediately, but
> what on earth will he do with the result???
You have two choices in this case. You either sort the { city } bag, discarding duplicates as you go, and then return the result. Sorting means ordering, so the last value might be first. So the result first value won't be returned until the last has been read in. (Most DBMS engines use this approach for UNION, DISTINCT and GROUP BY.)
On the other hand you might build a hash table of results already returned and probe it once for each value (except the first). This means holding the entire set of results for the duration of the query execution. (Some DBMSs use this hashing approach because of its nice 'time to first row' properties.)
The first is slow. The second consumes lots of memory. Pick one.
[ snip ]
> Of course this could be circumvented by adding a dummy column
> just to help the optimiser, e.g.
>
> select 'usa',name
> from usa_companies
> union
> select 'canada',name
> from canada_companies
Or you could go with a bag algebra, get the answer you want, and require nothing of the user writing the query. (I'm just pointing it out, in fairness. . )
[ snip ]
> I'm not talking about current SQL-implementations which have started to
> deal with DISTINCT only fairly
> recently. I''m just wondering if all of those hard problems are now
> solvable, after 20 years of research?
> COULD a system be built that overcomes all the negative implications if
> we really wanted to
> (not taking in accout commercial and/or political issues)?
But I would encourage anyone who wants to have a go.
KR
Pb Received on Tue Sep 02 2003 - 07:46:20 CEST