Re: Hashing for DISTINCT or GROUP BY in SQL
Date: Mon, 18 Oct 2010 02:31:53 -0700 (PDT)
Message-ID: <6f1d43ac-6adf-42c5-b55a-18fd9c5a6db1_at_v20g2000yqb.googlegroups.com>
On 18 oct, 03:03, paul c <anonym..._at_not-for-mail.invalid> wrote:
> On 16/10/2010 8:56 AM, Cimode wrote:
>
> > If relational logical theorists had put as much effort into*defining*
> > an relational compatible mathematical model for optimizing relational
> > operations and structure*physical* representations as they did into
> > continuously defining new logical operators of higher abstraction,
> > we'd probably not be in the situation we are today.
>
> My own background probably gives me a bias that wouldn't be typical but
> for me this has to do with what I think of as optimization. But that's
> probably different from the optimization that most other IT people think
> of. My attitude is probably more general than theirs, encompassing
> everything to do with what they might call implementation.
Have you ever considered that *optimization* is a too broad concept to
designate a significant area of fundamental research ?
What is the object of optimization in a relational perspective ? What does it aim at, physically wise ? What are the modalities to defined to achieve such aim ? Don't you have the impression that the definition of *optimization* in such broad perspective is the relational theorists's way sweep the dust under the rug ?
> For example, I've often wondered 'where or what is the implementation
> theory of constraints?'. Is there 'theory' to tell an implementator how
> to recognize something as elementary as a way to decide that a certain
> constraint applies only to a single tuple (or if you like a relation of
> cardinality one) as opposed to one that depends on more? A very common
> example of this would be the presence of a so-called 'insert' of a
> single tuple with a candidate key. When presented with an 'input'
> relation, and 'output' relation and a 'tuple to be inserted', what
> 'theory' is available to the author of an implementation that tells him
> the constraint applies either to the 'output' or to the 'combination' of
> the 'input' and the 'insert tuple'?
The fact of immediately considering the logical relational operation
as a premice for defining an implementation model is in itself an
example of the bias of relational logical theorists. How can we ask
such premature question, as a basis for extending a relational model
to define an implementation model, when we have not formulated
reasonnable axioms to effectively represent relations on a linear
adressing scheme ?
For instance, how should we encode relations in the first place ? Why should we encode in one way and not another ? How do we encode propositions on a linear adressing scheme ? individually ? Should we encode them as individual propositions in the first place ? How do we actually encode the validation of such propositions (truth assignment) ?
Codd opened a *huge* area of research that goes far beyond interrelation definitions and operations. The entire point of RM algebra is also to free our mind into how we conceive mechanized system's encoding schemes.
Example1: We may choose to an encoding scheme for un-ary relation R1 as physically persisting values A, B and C (Encoding scheme E1) but what prevent us from encoding R1 as [ALL ALPHABET] - [ALL BUT A, B, C] (Encoding scheme E2). What would be the consequences of choosing E2 over E1 as to how much logical IO's happen when we operate R1 ?
Example 2: Considering that each attribute A1 in relation R1 is in fact the output of an operation between the relation R1 and the domain D1 from which A1 draw all possible values, why can't we use such algebra to actually encode the information A1 by simply encoding the operation D1 INTERSECT R1 instead of directly encoding A1 tuple values.
Finally, major achievement of Codd was to open up to the possibility that we could operate relations without having order as a prerequisite. Did we exploit such possibility into encoding/operating relations ? Why is order still a major encoding obligation in most indexing schemes ? Is it conceivable that such order is a limitation of relational implementation given the fact that value sets are logically unordered and order is not a prerequisite for operating relations ?
> Of course all old-timers know 'how' to implement this, they test for the
> 'presence' of the key and if it is not 'present', they proceed with the
> 'insert'. But where is the 'theory'? I'm guessing that this is part of
> your point.
Exactly.
As years pass by and database science goes further and further into collective IT industry amnesia, I somehow believe *old timers* (note that I do consider myself an old timer) have simply failed to explore the opportunities openned by Codd, and remained, implementation wise, too bound to the IBM historical context of how relational implementation is to be conceived. Such lack of perpective limited RM expressive power and constrained it(unhappy pun!) to empirism rather than applying it to how we conceive information encoding as a whole. Received on Mon Oct 18 2010 - 11:31:53 CEST
