Re: In an RDBMS, what does "Data" mean?

From: Anthony W. Youngman <wol_at_thewolery.demon.co.uk>
Date: Tue, 15 Jun 2004 01:02:07 +0100
Message-ID: <vaucdOJ$xjzAFw3r_at_thewolery.demon.co.uk>


In message <Kq3yc.2602$1v6.1860_at_newssvr15.news.prodigy.com>, Eric Kaun <ekaun_at_yahoo.com> writes
>> Basically, if we assume (reasonable assumption) that everything else is
>> irrelevant when compared to disk access, I can prove that (almost) every
>> attempted disk access actually retrieves data that is relevant to the
>> question.
>
>Oh... easy == fast.
>
>Perhaps the above is true, but that requires your data be structured in the
>same file, which is both boon and bane. In a TRDBMS (remember, this is
>c.d.t), the DBMS would reorganize base relations's storage based on access
>patterns, whereas in Pick you have to decide that in advance, and do a lot
>of work later if it changes. Unless I'm misinterpreting... in any event,
>access optimization and clustering based on common usage (which can change,
>especially as reports and ad hoc queries enter the fray) should be dynamic,
>and analyzed by a computer.

And dynamic re-organisation can be prohibitively expensive as it reorganises the data to optimise your year-end reports, only for you to have run your final report five minutes ago. This is the fallacy of making all reports equally "easy" by imposing unnecessary overhead on the common ones!

If you want to know one thing about an invoice, chances are you want to know several. MV will (if properly designed) return EVERYTHING in a single disk hit. If you then want to know about the company it was sent to, a further SINGLE disk hit will return EVERYTHING you want there.

You're trying to optimise everything. If you access one bit of an invoice, the chances of you accessing another bit of the same invoice are HIGH. The chances of the computer guessing correctly whether you want another invoice, or the company, or any other bit of information unrelated to that invoice, are piss-poor. So why try?

MV optimises retrieval of information about any single real-world object. It will step, with blinding speed, down a list of keys. It doesn't even try to second-guess the user's next random data access - what's the point? Was it Knuth said "premature optimisation is the root of all evil"? The design of MV naturally clusters related data. And ignores unrelated data. And it shows! As I've said, again and again, why does experience say that MV beats relational for speed hands down every time, especially for "large" databases?

Cheers,
Wol

-- 
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999
Received on Tue Jun 15 2004 - 02:02:07 CEST

Original text of this message