Re: Hashing for DISTINCT or GROUP BY in SQL
Date: Fri, 15 Oct 2010 16:38:33 -0700 (PDT)
Message-ID: <fa9f6c6a-faf9-4cd1-8d3c-5eb685db4ed1_at_30g2000yqm.googlegroups.com>
On 15 oct, 23:36, paul c <anonym..._at_not-for-mail.invalid> wrote:
> On 15/10/2010 1:26 PM, Cimode wrote:
>
> > Your comments reminded me of what initially triggered my interest in
> > databases in the first place: the proto-index structure logic behind
> > the early VSAM systems. At that time, the initial direct image data
> > files were called clusters. Not much have changed since, regarding
> > database implementations.
>
> It does seem that physical notions still impede people some or all of
> the time. When the only access methods I knew were Qsam, Bdam and Isam,
> a boss sent me to a course on the then-new Vsam. When I returned to the
> office a week later he asked me about it and I told him it would never last.
>
> Part of the motivation for IBM's vsam had to do with the mess that their
> disk architecture had created. What started as their big OS/360
> (lovingly referred to be some users as 'Obstacle System 360') turned
> into MFT or somesuch and then MVS. It had a 'system catalog' which had
> nothing to do with any database principles whatsover, being just a bunch
> of adhoc conventions. Vsam became the vehicle for a replacement
> catalog. I believe the clusters were really intended as a way to
> aggregate several disks to give a larger 'virtual' disk. But the
> problem remained that people still thought in terms of that disk
> 'device', no matter that is was 'virtual'.
>
> The 'disk architecture' that advented with the introduction of System
> 360 was called 'count-key-data' aka CKD. It was interesting in that one
> could program at the peripheral level, outboard of the mainframe cpu,
> but that focus on individual devices was an insidious impediment to real
> system thinking (which I think is what is needed for real database
> thinking) in the sense that a coherent theory is needed.
>
> The craziest extreme of CKD might have been the 2321 data cell drive.
> The point to the 'K' or 'key' part of the acronym was that 'keys' were a
> distinct portion of a disk's surface and you could write what was called
> a 'channel program' which would apply only to one disk and not tie up
> the channel through which other disks were connected. I don't remember
> tape drives having 'key' support but even though the 2321 used tape, it
> supported CKD and looked to the programmer as a disk drive.
>
> Wang had a superior kind of 'vsam', forget the name, which had
> transactions. I learned it in about two hours, it was that well documented.
>
> In the 1970's IBM brought out a system called the S/38. It was really
> radical, having a linear addressing scheme that merged memory and
> whatever devices were attached, so the fixation on individual devices
> was ignored. Apparently internal politics at IBM limited the size of
> the S/38 so that it wouldn't compete with the big mainframes. Customer
> operations staff used to need to go to IBM courses to learn what
> operating system options needed to be toggled to enable the addition of
> peripherals. Some other companies like Burroughs had machines even
> before then that required no software changes to attach a new disk
> drive, in some cases not even down time so Burroughs didn't make any
> money charging for courses to learn how to do that. Wasn't Burroughs
> stupid!
>
> In recent years there was quite a lot of 'buzz' in the Linux world about
> something called the Reiser FS. It had aspects that confused logical
> with physical, which worried me. Since the principle author has gone to
> jail it seems that development has stalled and maybe I don't need to
> worry anymore about that.
Hi paul
Thanks for elaborating about the historic context. I believe you have
misunderstood the point I am trying to make. This is not about
physical vs logical confusion or a database theory vs database
implementation confusion. It is about defining the math the physical
must satisfy to effectively support the logical.
I can conceive that VSAM is not significant for somebody observing
database science from a purely logical perspective of relation
operation and structural definition. But logical database science is
not all database science. The history of theory behind physical
implementations following the development of database science is just
as significant. I suspect that it is the inability of logical
theorists to conceive mathematical models that could allow
implementations of higher abstraction logical principles on binary
current mechanized addressing schemes that explains a part the failure
of the database science as a whole in contemporary times.
For instance, logical theorists I sometime find naive when they assume
no math could exists behind the algorithms of database implementations
attempts. And I find them even more naive, when they believe that the
lack of such math can not have an impact on the bias of how the
logical database science is conceivable. As for me, I can not
dissociate one from the other: though the logical(RM) dictates the
intent, the physical dictates the method to respect the intent.
I consider VSAM significant in physical database implementation
perspective and the lower level theory behind it, as compared to
previous systems. In previous systems, seek algorithms were mostly
relying on run time sorting and linear dichotomic searches. VSAM
introduced a sophisticated seeking scheme based the concept of
dichotomic leaf search aka *register zig zags* which is widely
implemented on direct image systems. The algorithms probably inspired
by fractal theory allowed an order of magnitude reduction of number of
logical IO's necessary to reach a specific pre-determined value. In a
sense, the clusters were not *just* a dumb stack of files, but were
also the ancestors of today's ordered indexes (known as clustered
indexes) frequently used on SQL systems.
The logic behind the data structure also relies on the concept of pre
order presenting similarities with latest transrelational model. I
believe somehow that this constitutes an evolution that can't be
neglected since it allowed to conceive as possible the implementation
of number of operators such as NOT EXISTS that were previously
considered resource prohibitive under ISAM systems.
Received on Sat Oct 16 2010 - 01:38:33 CEST