Re: In an RDBMS, what does "Data" mean?
Date: Fri, 18 Jun 2004 19:05:00 +0100
Message-ID: <L4l1l0HM7y0AFwZd_at_thewolery.demon.co.uk>
In message <NSpyc.2728$pK3.2253_at_newssvr15.news.prodigy.com>, Eric Kaun
<ekaun_at_yahoo.com> writes
>"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
>news:rDtNrmL7f6xAFw2a_at_thewolery.demon.co.uk...
>> In message <qJJxc.385$Pt.272_at_newssvr19.news.prodigy.com>, Eric Kaun
>> <ekaun_at_yahoo.com> writes
>> >"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
>> >news:9EJ4q$GofOxAFw1i_at_thewolery.demon.co.uk...
>>
>> In theory, no. In practice, you might choose to split the invoice data
>> across two FILES, where you've promoted sub-attributes of INVOICE to be
>> primary attributes of the secondary file.
>
>Why would you do the split, when the intent seems to be to keep things
>whole? Is this purely for performance optimization? I have a hard time
>keeping up with shifts between logical and physical, and the reasons for the
>splits. I understand you CAN do these things, but why and when? What are
>your heuristics?
The heuristic is probably: split when it gets too complicated for the brain to comprehend easily.
I can understand you want things nice and clear cut, but the real world isn't like that. I use relational theory to help me understand the data down to one or two levels deeper than I need, then I draw the line at whatever level seems appropriate.
I am sorry I can't give you a better answer than that. But the real
world is messy. Deal with it!
>
>> >> >4. sub-sub-attributes
>> >>
>> >> Simply nest sub-attributes one level deeper. :-)
>> >
>> >Ah, hierarchical induction. I'll just have one File in my app. :-)
>> >
>> Nah! FILE = noun :-)
>
>Okay; I name my file "MyApplication." :-)
>
>> You completely missed the point here. Where and why would you use a
>> cascading delete? THINK! Be *practical*. What *works* in *reality*
>> (rather than theory, which can think up a thousand impossible scenarios
>> before breakfast (with apologies to "Alice in Wonderland")).
>
>Seldom, due to business desires, but to answer the question you're getting
>at: when there's a foreign-key dependency, and there's one relation that is
>deemed "important" enough to trigger the cascade. There could be multiple,
>though that's rare...
So I would seriously consider pulling all the tables the cascade reaches
into a single FILE. "Rules are for the guidance of wise men, and the
obedience of fools" - I would use my intelligence as to whether this
made sense.
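To make that concrete, here's a minimal sketch (the invoice/line-item names are my own illustration, not from the thread) of why a Pick-style FILE has no need for a cascading delete: the "child" rows live inside the parent record, so deleting the parent removes them by definition.

```python
# Hypothetical sketch: in a Pick-style FILE the dependent data is nested
# inside the parent record as sub-attributes (multivalues), so there is
# nothing to cascade.
invoices = {
    "INV001": {
        "customer": "ACME",
        # line items nested inside the invoice record
        "lines": [
            {"product": "WIDGET", "qty": 10},
            {"product": "SPROCKET", "qty": 3},
        ],
    }
}

# The relational equivalent needs a foreign key plus ON DELETE CASCADE;
# here the whole delete is one operation on one record.
del invoices["INV001"]
assert "INV001" not in invoices
```

The design choice is exactly the one argued above: if a cascade would always fire across a group of tables, that group is behaving as a single record anyway.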
>
>> >> And then you use common sense to say "I use these fields all the time,
>> >> and these fields only rarely" so you split A into two physical FILEs,
>> >> and make all the columns of A-rarely into virtual columns of
>> >> A-all-the-time, and vice versa. So for retrievals the user notices
>> >> nothing (apart from the speed-up), although it does cost a bit of
>> >> extra logic when updating.
>> >
>> >A logical cost is a big cost. Every updating app needs to know that,
>right?
>>
>> Yep ... but relational theory, which imposes mandatory separation of the
>> logical from the physical, imposes that cost on EVERY app, not just
>> those that update the data.
>
>And it enables EVERY app with EVERY optimization. (not close, but you get my
>drift)
But if the small cost of the "bad" code (which by definition is rarely
used) makes a big difference to the cost of the "good" code, then I'm
laughing all the way to the bank. Who cares if I add an hour to a job
that runs once a month, if by doing so I can shave a second off a job
that 50 users use several hundred times a day?
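A quick back-of-envelope check of that trade-off, with assumed numbers (the thread only says "an hour", "once a month", "a second", and "several hundred times a day", so the 500 runs/day and 30 days/month below are my own guesses):

```python
# Assumption: the interactive job runs ~500 times a day across the 50
# users and gets one second faster; the batch job runs once a month and
# gets an hour slower.
runs_per_day = 500                       # "several hundred times a day"
days_per_month = 30
seconds_saved = 1 * runs_per_day * days_per_month
hours_saved = seconds_saved / 3600       # ~4.2 hours saved per month
hours_cost = 1.0                         # the monthly job slows by an hour
print(f"saved {hours_saved:.1f} h/month vs {hours_cost:.1f} h cost")
```

Even with conservative assumptions the saving on the common job dwarfs the cost added to the rare one, which is the whole argument.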
>
>You've just successfully argued against code sharing, by the way, since if
>something is coded badly (either slowly, or laden with defects), then every
>app has to suffer, so you're better off recoding it in each app, right?
>
>> Furthermore, by actively hindering the programmer from providing hints
>> to the database, relational forces the programmer to rely on the
>> database's artificial intelligence, which is quite likely to guess wrong
>
>And you've also just argued against compilers, since they're so likely to
>guess wrong about the intention of your code, and therefore will produce
>badly-optimized machine code.
>
>> Yup. Let's assume that the Pick database has been designed properly, and
>> that within the FILEs the data has been normalised. I can now present my
>> apps with a *closed* relational view!
>
>What do you mean by that?
I mean it's like a relational view, but if I've got a "one to many"
relationship, the "one" data only appears once, not replicated for every
instance of the "many".
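A small sketch of that distinction (the data values are my own illustration): a flat, joined view repeats the "one" side of a one-to-many relationship on every row, while the nested record stores it exactly once.

```python
# Flat (joined) view: the customer value is replicated per line item.
flat_view = [
    {"invoice": "INV001", "customer": "ACME", "product": "WIDGET"},
    {"invoice": "INV001", "customer": "ACME", "product": "SPROCKET"},
]

# Nested Pick-style record: the "one" side appears once.
nested_record = {
    "invoice": "INV001",
    "customer": "ACME",                   # stored exactly once
    "products": ["WIDGET", "SPROCKET"],
}

# Same information either way; only the replication differs.
assert {row["customer"] for row in flat_view} == {nested_record["customer"]}
```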
>
>> My Pick application has also FORCED, by DEFAULT, my database to store
>> related data close to itself (what relational calls clustering, I
>> believe). It's fairly easy to prove, statistically, that this will
>> optimise data retrieval from disk.
>
>For that one access path.
Here you go again - crippling the race horse so we can have a "fair"
race against the crippled old nag ...
>
>> Sod AI optimisation, Pick doesn't
>> have a choice and it works, which is why in any system lacking
>> sufficient ram a Pick app will kick the equivalent relational app's
>> butt!
>
>I can think of several faster alternatives. Using ROWID and stashing
>hierarchies in Oracle tables would at least close some of the gap.
>Performance isn't the only point, but oh well...
>
>> Basically, by not hiding the physical implementation from the user, Pick
>> makes it easy to prove there just IS NO room for improvement.
>
>hahahahaha
>
>Oh - you were serious. My bad.
Yes I was :-)
>
>> By hiding
>> the physical from the user, relational forces you to rely on the AI and
>> you have no way of knowing whether it is efficient or not.
>
>AI? Yes, I'd hate to rely on something like a "computer" or some other
>fancy "automaton" that does "logic" or some such liberal nonsense... :-)
>
I don't. I rely on statistics to tell me the Pick model does a better
job than AI.
You said "for that one access path". But that access path IS THE MOST COMMON PATH! So: I can prove that it's the most common path, and I can prove it's the most efficient path.
Can you prove that, by crippling the most common path, you can improve the "worst path" cases enough to make it worthwhile? Was it Knuth who said "premature optimisation is the root of all evil"? I couldn't give a damn if the nag trails in last by a racecourse. I want the thoroughbred to win.
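A toy illustration of the claim about the common path (under the simplifying assumption, mine not the thread's, that one stored record costs one logical read): a nested record answers "show this invoice with its lines" in a single read, while the normalised layout pays one read for the header plus one per line.

```python
def reads_nested(num_lines: int) -> int:
    # Header and all line items share one record: one read regardless.
    return 1

def reads_normalised(num_lines: int) -> int:
    # One read for the header row, plus one per line-item row.
    return 1 + num_lines

for n in (2, 10, 50):
    print(n, reads_nested(n), reads_normalised(n))
```

Real systems complicate this with caching, index lookups, and page sizes, so this only sketches the shape of the argument, not a benchmark.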
Cheers,
Wol
--
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not
be a good idea to burden them with too much input. Hex always thought
of his reports as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Received on Fri Jun 18 2004 - 20:05:00 CEST