Re: Hashing for DISTINCT or GROUP BY in SQL

From: paul c <anonymous_at_not-for-mail.invalid>
Date: Fri, 15 Oct 2010 21:36:05 +0000 (UTC)
Message-ID: <i9ahg4$lld$1_at_tioat.net>


On 15/10/2010 1:26 PM, Cimode wrote:
> Your comments reminded me of what initially triggered my interest in
> databases in the first place: the proto-index structure logic behind
> the early VSAM systems. At that time, the initial direct image data
> files were called clusters. Not much have changed since, regarding
> database implementations.

It does seem that physical notions still impede people some or all of the time. When the only access methods I knew were Qsam, Bdam and Isam, a boss sent me to a course on the then-new Vsam. When I returned to the office a week later he asked me about it and I told him it would never last.

Part of the motivation for IBM's Vsam had to do with the mess that their disk architecture had created. What started as their big OS/360 (lovingly referred to by some users as 'Obstacle System 360') turned into MFT or somesuch and then MVS. It had a 'system catalog' which had nothing to do with any database principles whatsoever, being just a bunch of ad hoc conventions. Vsam became the vehicle for a replacement catalog. I believe the clusters were really intended as a way to aggregate several disks to give a larger 'virtual' disk. But the problem remained that people still thought in terms of that disk 'device', no matter that it was 'virtual'.

The 'disk architecture' that arrived with the introduction of System 360 was called 'count-key-data' aka CKD. It was interesting in that one could program at the peripheral level, outboard of the mainframe cpu, but that focus on individual devices was an insidious impediment to real system thinking (which I think is what is needed for real database thinking), in the sense that a coherent theory is needed.

The craziest extreme of CKD might have been the 2321 data cell drive. The point of the 'K' or 'key' part of the acronym was that 'keys' were a distinct portion of a disk's surface and you could write what was called a 'channel program' which would apply only to one disk and not tie up the channel through which other disks were connected. I don't remember tape drives having 'key' support but even though the 2321 used tape, it supported CKD and looked to the programmer like a disk drive.

Wang had a superior kind of 'vsam', forget the name, which had transactions. I learned it in about two hours, it was that well documented.

In the 1970's IBM brought out a system called the S/38. It was really radical, having a linear addressing scheme that merged memory and whatever devices were attached, so the fixation on individual devices was ignored. Apparently internal politics at IBM limited the size of the S/38 so that it wouldn't compete with the big mainframes. Customer operations staff used to need to go to IBM courses to learn what operating system options needed to be toggled to enable the addition of peripherals. Some other companies like Burroughs had machines even before then that required no software changes to attach a new disk drive, in some cases not even down time, so Burroughs didn't make any money charging for courses to learn how to do that. Wasn't Burroughs stupid!

In recent years there was quite a lot of 'buzz' in the Linux world about something called the Reiser FS. It had aspects that confused logical with physical, which worried me. Since the principal author has gone to jail it seems that development has stalled and maybe I don't need to worry anymore about that.

Received on Fri Oct 15 2010 - 23:36:05 CEST
