Re: In an RDBMS, what does "Data" mean?

From: Anthony W. Youngman <wol_at_thewolery.demon.co.uk>
Date: Fri, 18 Jun 2004 19:05:00 +0100
Message-ID: <L4l1l0HM7y0AFwZd_at_thewolery.demon.co.uk>


In message <NSpyc.2728$pK3.2253_at_newssvr15.news.prodigy.com>, Eric Kaun <ekaun_at_yahoo.com> writes
>"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
>news:rDtNrmL7f6xAFw2a_at_thewolery.demon.co.uk...
>> In message <qJJxc.385$Pt.272_at_newssvr19.news.prodigy.com>, Eric Kaun
>> <ekaun_at_yahoo.com> writes
>> >"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
>> >news:9EJ4q$GofOxAFw1i_at_thewolery.demon.co.uk...
>>
>> In theory, no. In practice, you might choose to split the invoice data
>> across two FILES, where you've promoted sub-attributes of INVOICE to be
>> primary attributes of the secondary file.
>
>Why would you do the split, when the intent seems to be to keep things
>whole? Is this purely for performance optimization? I have a hard time
>keeping up with shifts between logical and physical, and the reasons for the
>splits. I understand you CAN do these things, but why and when? What are
>your heuristics?

The heuristics are probably when it gets too complicated for the brain to comprehend easily.

Okay, I'm getting physical, and messy, but that's the real world. Where do you draw the line between biology and organic chemistry? Between organic and inorganic chemistry? Between chemistry and the physics of atoms?

Okay, there is a pretty clear line between the physics of atoms and atomic physics, but that's an anomaly!

I can understand you want things nice and clear cut, but the real world isn't like that. I use relational theory to help me understand the data down to one or two levels deeper than I need, then I draw the line at whatever level seems appropriate.

Don't forget, I'm a chemist by training. If I'm doing bio-chemistry it's incredibly useful to understand electron orbital theory but I can't WORK at that level. It's just *too* abstract to be meaningful.

By abstracting data down to (and focussing on) the tuple, relational theory has just gone into TOO MUCH detail and lost sight of (indeed, to some extent DESTROYED) any view of the big picture...

What you want to do is present the user with a view of the data at their level, and then analyse it deeper.

As a chemist, I think in molecules. As a businessman, I think people tend to think in terms of customers, invoices, things like that. THAT is the level at which the database should interface with users.

Relational interfaces at the chemical equivalent of atoms - with the tuple. The poor programmer has to think UP to the "business object" level, and then UP AGAIN to the reality equivalent.

With Pick, I can stand at the "business object" interface, and reach DOWN into the data, and UP into reality. It's far easier to stand on the interface reaching in both directions, than to be mired down in the detail, struggling to get out.

I am sorry I can't give you a better answer than that. But the real world is messy. Deal with it!
>
>> >> >4. sub-sub-attributes
>> >>
>> >> Simply nest sub-attributes one level deeper. :-)
>> >
>> >Ah, hierarchical induction. I'll just have one File in my app. :-)
>> >
>> Nah! FILE = noun :-)
>

>Okay; I name my file "MyApplication." :-)
>
>> You completely missed the point here. Where and why would you use a
>> cascading delete? THINK! Be *practical*. What *works* in *reality*
>> (rather than theory, which can think up a thousand impossible scenarios
>> before breakfast (with apologies to "Alice in Wonderland")).
>
>Seldom, due to business desires, but to answer the question you're getting
>at: when there's a foreign-key dependency, and there's one relation that is
>deemed "important" enough to trigger the cascade. There could be multiple,
>though that's rare...

So I would I would seriously consider pulling all the tables into which the cascade went into a single FILE. "Rules are for the guidance of wise men, and obedience of fools" - I would use my intelligence as to whether this made sense.
>
>> >> And then you use common sense to say "I use these fields all the time,
>> >> and these fields only rarely" so you split A into two physical FILEs,
>> >> and make all the colums of A-rarely into virtual columns of
>> >> A-all-the-time, and vice versa. So for retrievals the user notices
>> >> nothing (apart from the speed-up), although it does cost a bit extra
>> >> logic when updating.
>> >
>> >A logical cost is a big cost. Every updating app needs to know that,
>right?
>>
>> Yep ... but relational theory, which imposes mandatory separation of the
>> logical from the physical, imposes that cost on EVERY app, not just
>> those that update the data.
>
>And it enables EVERY app with EVERY optimization. (not close, but you get my
>drift)

But if the small cost of the "bad" code (which by definition is rarely used) makes a big difference to the cost of the "good" code, then I'm laughing all the way to the bank. Who cares if I add an hour to a job that runs once a month, if by doing so I can shave a second off a job that 50 users use several hundred times a day?
>
>You've just successfully argued against code sharing, by the way, since if
>something is coded badly (either slowly, or laden with defects), then every
>app has to suffer, so you're better off recoding it in each app, right?
>
>> Furthermore, by actively hindering the programmer from providing hints
>> to the database, relational forces the programmer to rely on the
>> database's artificial intelligence, which is quite likely to guess wrong
>
>And you've also just argued against compilers, since they're so likely to
>guess wrong about the intention of your code, and therefore will produce
>badly-optimized machine code.
>
>> Yup. Let's assume that the Pick database has been designed properly, and
>> that within the FILEs the data has been normalised. I can now present my
>> apps with a *closed* relational view!
>
>What do you mean by that?

I mean it's like a relational view, but if I've got a "one to many" relationship, the "one" data only appears once, not replicated for every instance of the "many".
>
>> My Pick application has also FORCED, by DEFAULT, my database to store
>> related data close to itself (what relational calls clustering, I
>> believe). It's fairly easy to prove, statistically, that this will
>> optimise data retrieval from disk.
>
>For that one access path.

Here you go again - crippling the race horse so we can have a "fair" race against the crippled old nag ...
>
>> Sod AI optimisation, Pick doesn't
>> have a choice and it works, which is why in any system lacking
>> sufficient ram a Pick app will kick the equivalent relational app's
>> butt!
>
>I can think of several faster alternatives. Using ROWID and stashing
>hierarchies in Oracle tables would at least close some of the gap.
>Performance isn't the only point, but oh well...
>
>> Basically, by not hiding the physical implementation from the user, Pick
>> makes it easy to prove there just IS NO room for improvement.
>
>hahahahaha
>
>Oh - you were serious. My bad.

Yes I was :-)
>
>> By hiding
>> the physical from the user, relational forces you to rely on the AI and
>> you have no way of knowing whether it is efficient or not.
>
>AI? Yes, I'd hate to rely on something like a "computer" or some other
>fancy "automaton" that does "logic" or some such liberal nonsense... :-)
>

I don't. I rely on statistics to tell me the Pick model does a better job than AI.

That crack about the race horse was deliberate. Relational seeks to make all access paths equal. Fair enough. Rather like the UK educational system that sees competition as "unfair" and wants all schoolkids to leave Uni with a first class degree, not caring whether they are a dunce or a genius (sadly, I'm serious about our education :-(

You said "for that one access path". But that access path IS THE MOST COMMON PATH! So. I can prove that it's the most common path. I can prove it's the most efficient path.

Can you prove, that by crippling the most common path, you can improve the "worst path" cases enough to make it worth-while? Was it Knuth that said "premature optimisation is the worst evil"? I couldn't give a damn if the nag trails in last by a racecourse. I want the thoroughbred to win.

Cheers,
Wol

-- 
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999
Received on Fri Jun 18 2004 - 20:05:00 CEST

Original text of this message