Re: does a table always need a PK?

From: Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com>
Date: Tue, 02 Sep 2003 00:21:56 +0300
Message-ID: <bj0da9$ue8$1_at_nyytiset.pp.htv.fi>


Paul G. Brown wrote:

>>Of course, BS12 did not have a very long life so I suppose there is not
>>so much data on how
>>it faired in the "real world".
>>
>>
>
> Another system sacrificed on the alter of 'standards'. (Which is a whole
> other rant.)
>
Regarding standards, you might be interested in this: http://www.braithwaite-lee.com/opinions/p75-hoare.pdf

>>Can you help me? Why do so many posters (some of who are
>>db-researchers, some of who are implementers)
>>claim that the bag model poses no problems? Or is this question
>>undecidable?
>>
>>
>
> Well, my perception is that the situation is as follows:
>
> Q: Boris, what's a set?
>
> A: Well Gretchen, a set is a group without duplicate elements.
>
[snip]

Thanks, Paul! Nice summary. However, you did not answer the question why some (many?) db-scientists and implementors say that bags pose NO problems
what so ever and even if we came up with an efficient implementation standing to

the test of the real world we would not be any happier.

> In a nutshell: sets vs. bags is an open question in my mind. Until someone
> builds a system with a set algebra and subjects it to the unhappiness of
> the real world I just feel that the question of whether the potential
> performance difference outweighs the aesthetic advantages is undecided. The
> only evidence I have is the early systems work and the way people complain
> about the performance differences between UNION/UNION ALL and the
> use of DISTINCT.
>
Well, you could always claim that most of the time users include at least one candidate key in the
select list, so no duplicate elimination would be needed if the system knew those keys.

And, if the user issues, say

select city
  from customers -- table with 10M rows

sure he'll star getting answers immediately, but what on earth will he do with the result???

In the question of unions it is often hard for the system to understand that no duplicate elimination
is necessary, something that the human user knows, e.g.

select name
  from usa_companies
union
select name
  from canada_companies

(supposing that names of all companies are unique).

Of course this could be circumvented by adding a dummy column just to help the optimiser, e.g.

select 'usa',name
  from usa_companies
union

select 'canada',name
  from canada_companies

>>>But they have negative implications.
>>> And it isn't reasonable to ignore their dark side.
>>>
>>>
>>>
>>It would be interesting to see a current assesment on the negative
>>implications.
>>
>>
>
> Take a standard query workload. Add a DISTINCT to all of
> the queries. Measure a) the difference in performance/throughput, and
> b) the difference in the results. I'm gonna bet that the DISTINCT is
> a lot slower and that there is little difference between the results.
> This is not a completely fair comparison, though, as there are queries
> where the difference between bag and set over an intermediate result
> is going to yield different answers and cases where a smart system would
> just shrug at the DISTINCT because it knows the results will be distinct
> anyway.
>
>
I'm not talking about current SQL-implementations which have started to deal with DISTINCT only fairly
recently. I''m just wondering if all of those hard problems are now solvable, after 20 years of research?
COULD a system be built that overcomes all the negative implications if we really wanted to

(not taking in accout commercial and/or political issues)?

BTW I can certainly understand that IBM or Oracle is not interested in such an undertaking.
But I thought the universities where there to invent new stuff, something that does not have
to be released the next quarter.

regards,
Lauri Pietarinen Received on Mon Sep 01 2003 - 23:21:56 CEST

Original text of this message