Re: Hashing for DISTINCT or GROUP BY in SQL
Date: Sat, 16 Oct 2010 08:56:34 -0700 (PDT)
Message-ID: <4ed63777-8d51-4ecf-af4f-c525909c2ac2_at_l8g2000yql.googlegroups.com>
On 16 oct, 02:32, paul c <anonym..._at_not-for-mail.invalid> wrote:
> On 15/10/2010 4:38 PM, Cimode wrote:
>
> > Hi paul
>
> > Thanks for elaborating about the historic context. I believe you have
> > misunderstood the point I am trying to make. This is not about
> > physical vs logical confusion or a database theory vs database
> > implementation confusion. It is about defining the math the physical
> > must satisfy to effectively support the logical.
>
> > I can conceive that VSAM is not significant for somebody observing
> > database science from a purely logical perspective of relation
> > operation and structural definition. But logical database science is
> > not all database science. The history of theory behind physical
> > implementations following the development of database science is just
> > as significant. I suspect that it is the inability of logical
> > theorists to conceive mathematical models that could allow
> > implementations of higher abstraction logical principles on binary
> > current mechanized addressing schemes that explains part of the failure
> > of database science as a whole in contemporary times.
>
> > For instance, I sometimes find logical theorists naive when they
> > assume no math could exist behind the algorithms of database
> > implementation attempts. And I find them even more naive when they
> > believe that the lack of such math cannot bias how logical database
> > science is conceived. As for me, I cannot dissociate one from the
> > other: though the logical (RM) dictates the intent, the physical
> > dictates the method to respect the intent.
>
> > I consider VSAM significant from a physical database implementation
> > perspective, along with the lower-level theory behind it, as compared
> > to previous systems. In previous systems, seek algorithms mostly
> > relied on run-time sorting and linear dichotomic searches. VSAM
> > introduced a sophisticated seeking scheme based on the concept of
> > dichotomic leaf search, aka *register zig zags*, which is widely
> > implemented on direct image systems. The algorithms, probably inspired
> > by fractal theory, allowed an order-of-magnitude reduction in the
> > number of logical I/Os necessary to reach a specific predetermined
> > value. In a sense, the clusters were not *just* a dumb stack of files,
> > but were also the ancestors of today's ordered indexes (known as
> > clustered indexes) frequently used on SQL systems.
>
> > The logic behind the data structure also relies on the concept of
> > pre-order, which presents similarities with the latest
> > transrelational model. I believe this constitutes an evolution
> > that can't be neglected, since it made it conceivable to implement a
> > number of operators, such as NOT EXISTS, that were previously
> > considered resource prohibitive under ISAM systems.
>
> Cimode, to the extent I understand it, no argument. Like most people I
> saw VSAM only from the outside. It did have a couple of good
> theoretical underpinnings, one being B+ trees. IBM would have had some
> thinkers aware of that idea since the originator (forget his name) was
> at Boeing, a big customer of theirs. I'm usually pretty hard on IBM
> even though for much of my life it was indirectly responsible for most
> of my income. To be fair, I'm sure lots of the s/w products that came
> out of it started out as pretty pure theory that got 'adjusted' to be
> downward-compatible with the stuff customers had already been sold.
Exactly. Sure, the theory behind the physical computing side of database
science has not always been what it could have been, but the same is true
of the RM. I cannot count the number of same-object redefinitions made to
the RM since its original inception, not to mention the multitude of
algebras and matching notations defined to characterize relational
operations. Relational theorists focused too much attention on narrow
controversial issues, such as the representation of missing information,
but totally failed to extend the math behind the model. Given the extent
of the discovery and after so many years, the relational model remains
largely an empirical model.
If relational logical theorists had put as much effort into *defining* a relational-compatible mathematical model for optimizing relational operations and the *physical* representation of structures as they did into continuously defining new logical operators of higher abstraction, we'd probably not be in the situation we are in today.
In other words, I *also* attribute today's failure of database science to the inability of relational theorists to extend the math behind the RM beyond the realm of algebra. In that respect, they remind me of the Greek scientists who based their theories largely on empiricism and so did not feel the need to develop theory for experimentation. Later on, the Arabs (Persians), who did define theory for verification and experimentation, could confirm earlier theories, but they *also* refuted some wrong ones. Algebra is one good output of such an approach.
Regards...

Received on Sat Oct 16 2010 - 17:56:34 CEST