Re: Teach SELECT DISTINCT first!

From: Paul <paul_at_test.com>
Date: Tue, 27 Apr 2004 18:26:05 +0100
Message-ID: <DGwjc.35459$h44.5223395_at_stones.force9.net>


Laconic2 wrote:
> I claim, without proof, that a really good optimizer can tell, by using
> rules of logic, and perhaps more metadata than is now stored, when "SELECT
> ALL" and "SELECT DISTINCT" will produce the same result. If so, it can
> skip a step, and thereby speed things up.

Suppose I have a relation R that contains just the integers from 0 to 1000. I also have a complicated function f.

Then consider "SELECT f(i) FROM R".

or "SELECT COUNT(*) FROM (SELECT f(i) FROM R)"

There's no way in general an optimizer can tell whether or not there are duplicates without doing the calculation. Even if f were a relatively simple function it would be too much to expect that a relational engine should have knowledge of maths built-in.

Suppose I'm the writer of the query and I know for a fact that f is a "montone increasing" function i.e. f(a) and f(b) will always be different if a != b.

Then the only way the query can be optimised is if I can hint to the DBMS that this is the case. If I store the function's definition in the DBMS it could be done here, but what if I write the function out in full in the query instead?

In a truly relational query language where we have only SELECT, not SELECT DISTINCT or SELECT ALL, maybe we would have two sections to a query: the logical section and the physical section. The physical section would be for optimization hints that the DBMS couldn't reasonably be expected to know.

Paul. Received on Tue Apr 27 2004 - 19:26:05 CEST

Original text of this message