Re: In an RDBMS, what does "Data" mean?

From: Anthony W. Youngman <wol_at_thewolery.demon.co.uk>
Date: Thu, 10 Jun 2004 01:14:51 +0100
Message-ID: <rDtNrmL7f6xAFw2a_at_thewolery.demon.co.uk>


In message <qJJxc.385$Pt.272_at_newssvr19.news.prodigy.com>, Eric Kaun <ekaun_at_yahoo.com> writes
>"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
>news:9EJ4q$GofOxAFw1i_at_thewolery.demon.co.uk...
>> >1. files
>>
>> This represents a "physical" object. A house. A car. A company. A
>> building. An invoice.
>
>What about relationships between any of those things - e.g. cars and houses
>owned by companies, and the invoices for their purchases? At what point does
>something move from being a file to an attribute or vice versa?

Just because I own a car, doesn't make the car part of me ... think language, and think nouns and adjectives (and gerunds).

In Britain, a car's registration plate is assigned on first sale, and "deleted" when the car is crushed. Actually, that's not completely true, but near enough.

So my car's registration plate is an attribute of me, and of my car. So in the "car" FILE it would be the primary key, and in the "person" FILE it would be a foreign key (to use relational terminology). You do not put two different "nouns" in the same FILE - you use a foreign key.
>
>> >2. attributes
>>
>> This describes the object. A house has an address. A car has a colour,
>> and an owner. A company may have several buildings (so here we have a
>> "foreign key"). A building has an address. An invoice may have several
>> addresses, and several lines.
>
>So if an invoice has a Parts attribute, it then needs additional attributes
>corresponding to each "attribute" of its use of those parts? At what point
>does the relationship between the two acquire enough "meaning" or enough
>attributes of its own to warrant being in its own file? Imagine line items
>on an invoice became very complex, with shipment information and payment
>information... is there a point at which you say "Enough!" and stop adding
>those things as sub-attributes to the Parts attribute of the Invoice?
>

In theory, no. In practice, you might choose to split the invoice data across two FILES, where you've promoted sub-attributes of INVOICE to be primary attributes of the secondary file.

>> >3. sub-attributes
>>
>> A building has an address - which may have multiple lines (actually,
>> this is a bad example, but it's a common mistake). An invoice has
>> multiple lines, each of which contains several different types of data.
>
>Yes, but how far do you go? Certainly at some point some of those attributes
>refer to other things properly categorized as Files. Do you find yourself
>yanking out attributes or sub-attributes, and moving the lot to Files? At
>least to me, normalization offers a much clearer view on how do make those
>data design decisions. Maybe I'm being alarmist, but when I've had to make
>changes in a SQL data model it's been due to actual changes in the
>requirements (external predicates), not just acquiring one attribute too
>many. Granted that I don't know Pick, so haven't walked a mile in your
>shoes... these criteria just seem very dicey.

Changes in Pick are almost invariably due to changes in requirements, too :-)
>
>> >4. sub-sub-attributes
>>
>> Simply nest sub-attributes one level deeper. :-)
>
>Ah, hierarchical induction. I'll just have one File in my app. :-)
>
Nah! FILE = noun :-)

Now if your app consists solely of invoices, then your approach might work :-)

>> Basically, to describe it in relational terms, if you link table A to
>> table B, such that deleting a record in A causes a cascading delete of
>> one or more records in B, then I'd make each column of B a column of A,
>> and each row of B into a sub-attribute row of A.
>
>You're describing parent-child relationships. Surely you run into
>multi-parent and many-to-many scenarios?

Think about what you've just said. I said "deleting a record in A triggers a cascading delete into B (and C (and D (...)))". Do you want to try that in relational? Deleting one record in relational will cascade and delete your entire database ... ?

You completely missed the point here. Where and why would you use a cascading delete? THINK! Be *practical*. What *works* in *reality* (rather than theory, which can think up a thousand impossible scenarios before breakfast (with apologies to "Alice in Wonderland")).
>
>> And then you use common sense to say "I use these fields all the time,
>> and these fields only rarely" so you split A into two physical FILEs,
>> and make all the colums of A-rarely into virtual columns of
>> A-all-the-time, and vice versa. So for retrievals the user notices
>> nothing (apart from the speed-up), although it does cost a bit extra
>> logic when updating.
>
>A logical cost is a big cost. Every updating app needs to know that, right?

Yep ... but relational theory, which imposes mandatory separation of the logical from the physical, imposes that cost on EVERY app, not just those that update the data.

Furthermore, by actively hindering the programmer from providing hints to the database, relational forces the programmer to rely on the database's artificial intelligence, which is quite likely to guess wrong ...
>
>So you're definitely describing some physical redesign which sits below the
>logical view available to users. I think in relational terms, that's what
>the DBMS vendors should offer, since they can more accurately (and easily)
>split relations based on usage, and have that happen dynamically. But you do
>seem to be describing a user- or application-level view of the data, which
>is layered atop something that is, or leans toward, or could be "more
>relational." At least that's the way it seems...
>
Yup. Let's assume that the Pick database has been designed properly, and that within the FILEs the data has been normalised. I can now present my apps with a *closed* relational view!

My Pick application has also FORCED, by DEFAULT, my database to store related data close to itself (what relational calls clustering, I believe). It's fairly easy to prove, statistically, that this will optimise data retrieval from disk. Sod AI optimisation, Pick doesn't have a choice and it works, which is why in any system lacking sufficient ram a Pick app will kick the equivalent relational app's butt!

Basically, by not hiding the physical implementation from the user, Pick makes it easy to prove there just IS NO room for improvement. By hiding the physical from the user, relational forces you to rely on the AI and you have no way of knowing whether it is efficient or not.

Cheers,
Wol

-- 
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999
Received on Thu Jun 10 2004 - 02:14:51 CEST

Original text of this message