Re: UNION ALL and SELECT DISTINCT

From: dawn <dawnwolthuis_at_gmail.com>
Date: 5 Apr 2006 15:11:26 -0700
Message-ID: <1144275086.093775.27610_at_e56g2000cwe.googlegroups.com>


David Cressey wrote:
> "dawn" <dawnwolthuis_at_gmail.com> wrote in message
> news:1144250458.682759.94210_at_e56g2000cwe.googlegroups.com...
> > David Cressey wrote:
> > > The nice folks who gave us the SQL language adopted two different
> > > conventions regarding duplicates in result tables.
> > >
> > > In the SELECT operator, they chose to have SELECT mean the same as
> SELECT
> > > ALL. If you want SELECT DISTINCT, you have to say so.
> > >
> > > Notice that UNION is just the reverse. If you want UNION ALL, you have
> to
> > > say so. If you just say UNION, you get only distinct rows in the
> result.
> > >
> > > I think it would have been nicer if they had adopted the same convention
> in
> > > both cases.
> >
> > I might not be thinking clearly, but I think the choices they made make
> > some sense to me. I want the selection and projection to be separate.
> > If I add one attribute to a view, for example, I would not want the
> > number of rows to change just because now what used to be a duplicate
> > row is unique. Or am I missing the point?
>
> You misunderstood what I was asking for. I was not asking to eliminate the
> difference between SELECT ALL and SELECT DISTINCT. All I was asking for is
> that SELECT should default to SELECT DISTINCT. When making views you could
> still specify either one.

I was concerned about the default select possibly changing the number of rows if the person writing the statement simply adds a single attribute. Given that SQL is primarily used by professionals, it could obviously be learned either way and I do understand why you would want the default to be "distinct" since that is almost always what is desired. I also understand why the default is as it is, however.

The language I'm accustomed to does not end up with this same problem since you ask every question of an entity (typically a strong entity) and you necessarily only get one instance of each such entity in your result.

> > SQL is a relatively dead language anyway
>
> Ha!

By george, I might have gotten you to laugh ;-) I was thinking that if I say that often enough...

> > I would guess that most would agree that they would expect accurate and
> > optimize on speed.
>
> You might be surprised. There are MANY times when I've suggested a SELECT
> DISTINCT to a neophyte, only to hear him wail ("but that TAKES TOO LONG".)
> They are missing the point.

Ah, yes, I've heard that too. It is one of the painful aspects of trying to get MV folks to learn SQL since one could say that MV queries are optimized with one result per entity. If asked what they would want in an optimizer, I would hope professionals would not suggest that incorrect results were acceptable, however.

> If SELECT DISTINCT gives correct results, but SELECT ALL gives incoreect
> results, it's time to stop and think about the nature of the logical query.

Not to mention the underlying data model, which goes hand in hand with a query language.

> The correct way to optimize is to first come up with a class of queries
> that give the right answer, and then find one that runs fast.

Agreed.

> There a large any numberof neophytes who try to do just the opposite: they
> try to discover all the queries that run fast. Then, among those, they try
> to pick one that gives the right results! This sounds absurd, but I speak
> from direct experience.

I've seen that happen too, but only with relational databases or SQL queries. That wasn't an issue with datatrieve, was it? Cheers! --dawn Received on Thu Apr 06 2006 - 00:11:26 CEST

Original text of this message