Re: Hashing for DISTINCT or GROUP BY in SQL

From: paul c <anonymous_at_not-for-mail.invalid>
Date: Mon, 18 Oct 2010 15:27:36 +0000 (UTC)
Message-ID: <i9hp18$ovm$1_at_tioat.net>


On 18/10/2010 4:35 AM, Erwin wrote:
> On 18 okt, 03:03, paul c<anonym..._at_not-for-mail.invalid> wrote:
>> For example, I've often wondered 'where or what is the implementation
>> theory of constraints?'.
>
> The 1990 Ceri/Widom paper and the more recent Martinenghi phd thesis
> are, as far as I'm aware, the reference.
>
> You also know already that SIRA_PRISE allows declaration of both
> database and assignment constraints of arbitrary complexity. So I
> take it you are aware that there is an implementation for what you
> mention. Now, being the author of that implementation, I would never
> claim that I developed a 'theory' to achieve that. In fact, I wonder
> what the hell a 'theory of implementation' would look like. I'm
> tempted to suggest it would look like a 'theory of brick masoning', or
> a 'theory of mixing flour and water for bread baking'. Sounds odd, I
> must say.

Bakers and masons have used recipes for thousands of years to produce the same wall or brick every day, regardless of temperature, facilities or other factors. This seems analogous to what Cimode calls soundness. In modern times, the theories behind those recipes have been pinpointed, and robots now emulate the non-literal equations and environmental factors the bakers and masons learned to know. (In Switzerland, masons earn four or five times what programmers make; needless to say, there aren't many of them.)

Maybe I distracted the thread with my perhaps peculiar impression of constraints. E.g., isn't a query just a constraint? Really, what else is there besides (asserted and implicit) relations, operators and constraints, none of which has any usefulness without the other two?
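
To make the 'query as constraint' reading concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the EMP and DEPT relations, their attributes, and the function names come from no particular product. It only shows one common reading, in which a constraint is a query whose result must stay empty.

```python
# Relations modelled as sets of tuples (hypothetical example data).
EMP = {("alice", "sales", 50), ("bob", "sales", 40), ("carol", "it", 60)}
DEPT = {("sales",), ("it",)}


def violations(emp, dept):
    """A query: the EMP tuples whose department does not appear in DEPT."""
    dept_names = {name for (name,) in dept}
    return {row for row in emp if row[1] not in dept_names}


def constraint_holds(emp, dept):
    """The same query read as a constraint: its result must be empty."""
    return not violations(emp, dept)
```

With the data above, constraint_holds(EMP, DEPT) is true. Adding a tuple such as ("dave", "hr", 30) to EMP makes the same query return that tuple, which is exactly the constraint being violated.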

When a constraint applies to a relation (or maybe I should say when a relation satisfies a constraint), how and when can we apply the constraint solely to a restriction or projection of the relation and know that the constraint is still satisfied by the relation? There are lots of extant ad hoc re-writings used by various 'optimizers', some of which have more formal underpinnings than others, such as the re-writing of a disjunctive query into a conjunctive form. Traditionally, both human and machine 'optimizers' have used metrics which I would classify as fairly crude, akin to what engineers call 'first approximations', the most common one being the counting of physical I/Os. A typical 'recipe' first looks for a few 'ingredients' such as keys, followed by physical indexes, and some will compare alternate indexes before 'deciding'. Not many will eliminate redundancy in a query or constraint. Optimization progress seems to involve whittling away at special cases. I suspect the re-writing usually consists of re-writing queries, not the relations being queried. Just musing out loud.

Received on Mon Oct 18 2010 - 17:27:36 CEST
