Re: In an RDBMS, what does "Data" mean?

From: Anthony W. Youngman <wol_at_thewolery.demon.co.uk>
Date: Thu, 10 Jun 2004 02:20:42 +0100
Message-ID: <0UZ6uTSqd7xAFwl0_at_thewolery.demon.co.uk>


In message <MbKxc.395$Pt.155_at_newssvr19.news.prodigy.com>, Eric Kaun <ekaun_at_yahoo.com> writes
>"Anthony W. Youngman" <wol_at_thewolery.demon.co.uk> wrote in message
>news:bXlBuYMZQPxAFwBs_at_thewolery.demon.co.uk...
>> When I analyse our MV system at work, I think in terms of physical
>> objects. I then decompose each physical object into normal form. I DON'T
>> NEED to think about other objects while I'm decomposing the one in front
>> of me.
>
>Physical objects - how quaint. :-)

No comment :-)
>
>In my experience, even entities like invoices and orders become unmanageable
>as "physical objects" - specifically when someone placing an order needs to
>know some fairly complex interrelationships between the parts in that order,
>the warehouses and their parts, and parts previously ordered by the same
>customer. And in a paint formula database, the "physical objects" are both
>far from clear, and far different than even the users suppose they are.

Except that if you think of it as "noun or adjective", it does actually become a lot clearer ... if the same adjective describes several nouns then you need multiple instances of it :-)
>
>> We're in the progress of porting to MS SQL-Server. The data diagram is
>> an ABSOLUTE NIGHTMARE! When I look at the table diagram it's an absolute
>> spaghetti of links EVERYWHERE! I don't have a clue which tables model
>> which physical object, the meaning of links isn't intuitive.
>
>Several points:
>1. Diagrams can be messy, and they can be nicely-organized
>2. If you were to draw a diagram that captures the functions you present to
>users in the dictionary, the files you use, and the files split for
>efficiency, you might see something similarly ugly
>3. The meaning of links should be fairly straightforward if you're thinking
>in terms of predicates. Then again, if the database designer didn't think
>that way, you could be in trouble. Surely you've seen bad Pick data models?

Have I seen bad models? I work with them :-(
>
>> Trying to juggle hundreds of tables is far harder than keeping track of
>> several tens of physical objects to which I can relate, even if each of
>> those objects is then broken down logically into normal form.
>
>Agreed - good modeling is difficult, and it's certainly very useful to think
>of things in groups. ER/win always did a good job for me - I could keep a
>100-table logical data model segregated into domains [sic]. I never looked
>at the entire thing all at once.

Pick never expects me to :-)
>
>> The MV database layout imposes a grouping which helps me grasp the
>> system complexity.
>
>That's a very good thing. What's not a good thing is selecting a grouping
>that makes sense to Function X to push its way into the definition of the
>data, since Function Y might have a very different idea what groupings
>should be imposed.

Here again, the noun/adjective paradigm just seems to work a treat ...
>
>> While I can easily view the MV structure as equal to
>> the relational structure, a true relational database does not give me
>> the MV structure which appears much less complicated by virtue of
>> appealing to the way I naturally view the world.
>
>Again, if you only ever need to "view the objects" in one way, I envy you.
>Experience hasn't been that kind to me.
>
>> Take the invoice. From the MV point of view, I see it as a SINGLE
>> object. It is *TRIVIAL* for the database itself to decompose that and
>> present it, via ODBC, to a relational programmer who wouldn't even
>> realise that the MV back-end viewed it as a single object.
>
>I think you're backwards on this:
>
>1. Seeing it as a single objects implies some degree of encapsulation
>
>2. Many queries need to cross encapsulation boundaries - hence the need for
>the Pick/MV query mechanism to directly support list attributes and
>sub-attributes and sub-sub-attributes... if it were really one object, you
>could only "get at" those pieces via the Invoice's defined operations.
>Encapsulation is a red herring (direct quote from Date) - every "persistence
>mapping" tool violates it.
>
>3. It's not at all trivial. Say your Invoice has 4 attributes: customer,
>parts, ship dates, and payments. Part[N] must have a corresponding Date[N]
>which is the date on which that part was received; Payments is completely
>separate. How would the mapping "know" that that correspondence exists?
>Certainly there is "meaning" there? The aggregation has traded one general
>form of meaning for a very app-specific form; in examples other than the
>somewhat-hierarchical Invoice/Order, the value of the trade diminishes even
>further.

Very easily. Pick metadata actually provides a very simple mechanism for saying which fields are linked, and which are not.

For example, some of our Pick FILES are broken up into three or four tables for export to ODBC. It's not a problem at all.
>
>4. Given that you do various queries, and that the database answers lots of
>questions, what value is it for the database to "know" it's a single
>objects? You don't see the "lowest level" of a data model being something
>"egalitatian", which can support several views? It seems to me much more
>powerful and general to supply predicates as the foundation, and layer a
>"path expression" on top which can present any arbitrary view of the data.
>Example: atop a relational model, I can present the Invoice, as well as a
>Warehouse which "contains" the parts it shipped and which customers bought
>them? Reporting and GUI generation, here I come...
>
Statistics says that if you want to know one thing about an object, then the chances are high that you want to know several things about that object.

Okay, if the query is "please list all customers who bought part X", then Pick gains nothing over relational. But if the query is "please list all parts invoiced on date Y" then Pick gains big time - merely by asking "what invoices are dated Y" I get all the data I want for free as a side effect. If there are ten invoices, I need ten data accesses to get all the invoice details. Relational needs ten data accesses to get the invoice numbers from the date, then loads more accesses to get the actual items from the invoice numbers.

Relational needs an optimiser - Pick gets it for free ... and the stats say that on average it pays off handsomely :-)

Cheers,
Wol

-- 
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999
Received on Thu Jun 10 2004 - 03:20:42 CEST

Original text of this message