Re: Is mysql a RDBMS ?

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 25 Aug 2003 13:39:51 -0400
Message-ID: <qZs2b.659$IM7.79448100_at_mantis.golden.net>


"Pablo Sanchez" <honeypot_at_blueoakdb.com> wrote in message news:Xns93E27B3C0A94Epingottpingottbah_at_192.168.1.1...
> "Heikki Tuuri" <Heikki.Tuuri_at_innodb.com> wrote in
> news:52h2b.18$G37.11_at_read3.inet.fi:
>
> >> DISTINCT often improves performance by allowing the dbms to
> >> ignore
> > potential
> >> duplicates that very often cannot even exist.
> >
> > Can you show statistics about that?
>
> DISTINCT's often require a sort area to squish out dup's, which means
> a performance degradation.

Pablo, you seem to use "often" as a synonym for "seldom", and I think that will confuse people.

Duplicate removal is seldom required.

First, relations have no duplicates to begin with

Second, very few operations have intermediate duplicates. Basically, the issue is limited to project and union. It also applies to summarize, which is a variant of project, but SQL requires duplicate removal in this case anyway. (ie. GROUP BY) Sybase and its variants screw that one up, of course.

Third, only a proper subset of the uses of project and union require duplicate removal. And I strongly suggest only a minority of uses in practice require duplicate removal. Many (very likely most) uses of project include at least one candidate key. Many (very likely most) uses of union combine disjoint sets of tuples.

Finally, you assume that the dbms evaluates the intermediate duplicates in the first place and that no subsequent step would benefit from the sort order. In many cases, the dbms will use an existing index or existing order to skip the evaluation of intermediate duplicates, or the dbms will choose a sort order suitable both for duplicate removal and for improving the performance of a later step. Received on Mon Aug 25 2003 - 19:39:51 CEST

Original text of this message