Re: Teach SELECT DISTINCT first!

From: Paul <>
Date: Tue, 27 Apr 2004 22:15:17 +0100
Message-ID: <j7Ajc.36911$>

Mikito Harakiri wrote:

>>There's no way in general an optimizer can tell whether or not there are
>>duplicates without doing the calculation. Even if f were a relatively
>>simple function it would be too much to expect that a relational engine
>>should have knowledge of maths built-in.


> CAS usually assumed to have this kind of knowledge, so why (in principle)
> relational engine couldn't have it? The technical difference might be that
> everything in CAS is an expression, while in SQL we allow function
> programmed in 3GL language. In foreseeable future no relational optimizer is
> expected to look inside procedural code, of course. (Note an interesting
> analogy: it is easy to reason about a function defined decalratively as an
> expression, and its very hard to reason about fuction defined procedurally.)

CAS=Computer Algebra System?

OK here's a pathological example:
Define a function f on the integers by:
f(i) = i (if Fermat's Last Theorem is true

  • 0 (if Fermat's Last Theorem is false

I can't remember offhand whether the proof has been verified officially yet but let's assume it has for the sake of argument.

Now if we know this we can see at a glance that the function values don't coincide for any two different arguments.

But can we expect a CAS or a DBMS to know this? Could a CAS prove Fermat's Last Theorem even in theory? Don't some proofs require you to step into the meta-language - they can't be proved in the language itself? (think Godel).

In practice though I don't think it's going to be practical for a DBMS to have a full computer algebra system on board. It would probably be quicker for the DBMS to do things the simple way anyway.

Now functions are actually special cases of (mathematical) relations. So maybe a relational system could think of them in this way and access them as logical (DBMS) relations. Whether they are physically implemented as an actual table or not would be up to the optimizer. For example say you define a complicated function on the integers. It might be best for the DBMS to store a table that pre-calculates this.

For non-integer domains you might have more problems though. But then you've already got problems with rounding etc. where two decimals may be different theoretically but the same according to the DBMS.

Paul. Received on Tue Apr 27 2004 - 23:15:17 CEST

Original text of this message