Re: Hashing for DISTINCT or GROUP BY in SQL

From: Anne & Lynn Wheeler <lynn_at_garlic.com>
Date: Fri, 15 Oct 2010 15:00:40 -0400
Message-ID: <m3aamfuvqf.fsf_at_garlic.com>


paul c <anonymous_at_not-for-mail.invalid> writes:
> The early 'databases' used by the first wide-reaching online network
> systems such as PARS involved ruthless reduction of 'attributes' as it
> were. Yet it was still possible for the real-time flight and cargo
> databases' of large airlines to be stored in some hundreds of MB's, in
> other words they could be recorded in the main memory of pretty much
> any of today's consumer pc's. Had such memory been available thirty
> or forty years ago, I'd venture that the programming landscape would
> look different today. There remains much bowing and scraping towards
> legacy obstacles. From papers they wrote, it looks like even the
> System R people's thinking was dominated more by past physical history
> than the likely future. Me too, it wasn't until the 1980's when I
> could actually put a computer under my arm that I started to realize
> how much more important the logical side of programming is. I think
> younger people are hide-bound in a different way, there are now so
> many different programming languages and therefore idioms which
> encourage them to think that all that can be invented has already been
> invented.

in the late 70s, there was contention between the 60s "physical" IMS dbms group and the system/r group. The IMS group contending that the implicit index of system/r doubled the physical space needed on disk and there could be 4-5 times increase in disk i/o (processing index). the system/r group countered that the "implicit" index of system/r significantly reduced the human and adminstrative overhead involved in managing a large IMS database (direct record pointers exposed as part of the data a programmer had to handle ... and be updated if the DBMS was re-organized).

starting in the 80s ... there were dramatic increases in the amount of system storage ... allowed significant caching of indexes (mitigating a lot of additional RDBMS I/O overhead) as well as dramatic increase in amount of disk space and reduction in price/megabyte ... minimizing the additional index overhead ... at the same time people expertise was becoming scarce and more expensive (becoming market inhibitor for IMS).

You still find significant IMS use at large (especially financial) institutions
http://en.wikipedia.org/wiki/Information_Management_System

consulting with IMS development group was one of the things Jim palmed off on me when he left for Tandem.

I actually had a project in the mid-90s to look at the ten impossible things related flt/route finding ... i.e. getting from point A to point B (for one of the large airline res systems). the implementation had ondisk database (from late 50s, early 60s).

in the mid-90s, the full OAG raw data for all scheduled commercial flts in the world was a little over 200mbytes. the reservation systems turned that into large gbytes of DBMS with huge index (again design from 60s). I had recently come off a project doing optimal layout for large chip & board design ... and so condensed the raw OAG master file into approx 30mbytes with lots of organization ... that all fit into memory ... and than ran real-time walk thru the data (rather than dbms lookup ... with various optimization ran close to 100 times faster than dbms implementation).

It now easily fits in most present day smartphones.

-- 
virtualization experience starting Jan1968, online at home since Mar1970
Received on Fri Oct 15 2010 - 21:00:40 CEST

Original text of this message