Re: foundations of relational theory? - some references for the truly starving

From: Mike Preece <michael_at_preece.net>
Date: 22 Oct 2003 21:23:58 -0700
Message-ID: <1b0b566c.0310222023.73e4ba5d_at_posting.google.com>


> Costin Cozianu wrote:

>> Mike Preece wrote: 
>>> Costin Cozianu wrote: 
>>>> Mike Preece wrote: 
>>>>> Costin Cozianu <c_cozianu_at_hotmail.com> wrote in message news:<bn3nbs$seug1$1_at_ID-152540.news.uni-berlin.de>...
>>>>> [snip]
>>>>> 
>>>>>You could compensate if you wanted by a careful thought out
scientific
>>>>>work to prove exactly what your model buys the user.
>>
>> It *is* a damn shame no one has done this already - afaik. Any
authors out
>> there able to spot an opportunity?
>>
>>>>>Actually, the work 
>>>>>was kind of done for you, theorists  have analyzed the model and
reached
>>>>>the logical conclusion that there's not much to it.
>>>> 
>>>> References please? Don't bother pointing me to Date's
inconclusive
>>>> "research done on the web". I'm more interested in the complete >>>> analysis with the logical conclusions. I suspect (strongly) that it
>>>> doesn't exist. I concede that there are, however, a vast number of
>>>> theorists that *assume* incorrectly (and illogically) that there's not
>>>> much to it. I'd ask each and every database theorist reading this >>>> thread to ask themselves "what is the basis in fact for this belief?".
>>>>
>>> The basis in facts are very simple: Nested Relations offer no extra
>>> expressive power, and plenty of complications.
>>
>> Let's just pause a moment.
>>
>> You are talking about "Nested Relations" - with capitals no less.
You
>> could have said "Pick" or "MultiValued" - but you chose not to. I think we
>> have to be careful about what's meant by that exactly.
>>

>
> Because somebody from your side said that Pick/MV is pretty much the
> same as nested relation, on which there is a handful of scientific
> literature, whereas on Pick/MV there's no basic article detailing the model.
>

OK. I guess that's because we're talking about fundamentally different things and attempting to use the same terminology - so it's kinda sorta the same thing but fundamentally very different. I really must provide an example.  

>> Oracle has "Nested Tables"
>> (http://www.dba-oracle.com/art_9i_data_model.htm). They have *pointers* to

>> the location of the nested rows. This is *not* Pick/MV.
>>

>
> I think the reference is not the greatest. In any case nested table can
> be implemented "in situ" also (see Oracle 8i/9i complete reference,
> either in book or online).
>

Could you please provide a link for this?

> There was a conscious decision of the guy who wrote the article to store
> it otherwise:
>
> store as nested_prev_address return as locator;
>
> Which means, store in a separate table extent, put a table of pointers
> in the master table. If he hadn't specified this option the default
> would be to store them in situ.
>
> The guy mentions in the article that he did it for performance
> consideration.
>
> In any case in Oracle you INSERT/QUERY the records in nested table
> declaratively, i.e. they have the keyword named THE, which is roughly
> equivalent to the UNNEST operator in nested relations theory.
>
> The same applies to Pick or any MV database, when you increase the size
> of a record you decrease the performance of all basic operations on the
> table.

This is sometimes true - although not necessarily. The determining factor is whether or not the read or update involves a "frame fault". If no "frame fault" in involved then performance is the same whether the item being accessed is 15 bytes long or 1500. Performance is also equal regardless of the number of items on the file or total file size. It will take no longer to read one 1500 byte item from a 10 gig file containing 30m items than it would to read a 15 byte item from a 10k file containing 100 items.

It might be worth mentioning at this stage that the amount of data you can store in a Pick item will require far less physical disk space than on any other (uncompressed) database. There are two reasons for this:
1) because every field is of variable length - every datum is just as long as it needs to be and no more and empty fields occupy no space at all (other than single character system delimiters) - and 2) because there is often no need for pointers to related data (the related data is within the item itself).

> So if the master table is accessed/updated more frequently then
> accesses for the information in the two of them combined

What "master" table? Which two? In Pick/MV there need only be one item. One access/update. One head movement. READ INVOICE FROM INVOICES,'12345' need not require any other reads to get the "line item details" - they can be in the INVOICE item as multivalues (and possibly subvalues for finer detail within each "line item").

> there's not a
> good case to cluster the parent/children records (or whatever your
> nomenclature is) together.
>
>> In Pick the nested rows don't need pointers - they are in-situ. The item,
>> or record, contains all of the related data - hence we get it all in a
>> single read. No need to "dereference" anything. Not such a "plenty of

>> complications".
>>

>
> The "complications are with regards to integrity constraints,
> normalization theory and query optimization.
>

Data integrity is simplified, normalisation is greatly simplified and there's no need to optimise anything. I know. Sounds incredible doesn't it? I'll try to remember to include an example at the end of this post.  

>> Fans of Pick (like me) state time and again that Pick is simple. Don't
>> take our word for it. Look at the real world. Look at the IT departments,
>> their expenditure, the staff levels - and all the rest - and compare. It's
>> simplicity is why there are a great many businesses out there running on
>> Pick systems with far fewer resources than would be required to run >> comparable SQL-relational database systems. I doubt you'd believe how
>> "lean and mean" many Pick systems run. Might be interesting to find out.
>> There's another opportunity folks - this time for a survey as well as a
>> publication. I promise I'll shut up forever if I'm wrong about what you'll

>> find - if you look properly.
>>
>>> There is nothing to be had by adding nested relations to the
relational
>>> model.
>>
>> Not if it's implemented the way Oracle have - I completely agree.
It needs
>> a more fundamental change to the *SQL-* relational model. Wayhey! That's
>> kinda similar to the title of the thread! Important difference though
>> don't you think - that *SQL-* bit?
>>
>>> Adding nested relations to it it's like adding an operator Y to 
>>> classical arithmetic a Y b = a + b - a * b, it is totally
unnecessary.,
>>
>> If my assumption is corrrect - and you are talking about "nested tables"
>> in the Oracle sense - then I agree completely. As the article I provided a
>> link to states: "However, it sometimes takes longer to dereference the OID
>> to access the nested table entries as opposed to ordinary SQL tables join
>> operations". No great surprise there then. Pick - as I've already said -
>> doesn't need to dereference anything, the related data is in-situ.
>>
>>>>>Even more, the model 
>>>>>is already available one way or the other in existing SQL DBMSes,
>>>>> 
>>
>> Don't you think "one way or the other" is a little loose?
>>
>>>> Yes and no. Yes - the SQL "relational" vendors are slowly coming
>>>> 'round - although the way they implement roughly equivalent
capability
>>>> is nowhere near as slick as traditional "Pick". No - because what >>>> they've come up with ("nested tables") is a "bodge job", very poorly
>>>> implemented, and is therefore rarely used.
>>>> 
>>>>
>>> And may we know *exactly* why it is very poorly implemented in
your
>>> opinion ?
>>
>> I refer the honourable gentleman the the answer I gave some moments
ago.
>>
>>>>>but 
>>>>>typically I don't use it and most people don't use it, for very 
>>>>>practical reasons.
>>>> 
>>>> Yes. Because "nested tables" is a very poorly done bolt-on.
>>>> 
>>> Not at all. But because nested tables or however you want to call
them
>>> are pretty much useless.
>>
>> Yes. Pick, on the other hand, is fundamentally different and far
more
>> useful.
>>
>>>>>You can't blame the "theorists"
>>>>> 
>>>> Sure I can - when they make incorrect assumptions, criticise what
they
>>>> don't understand, and are prepared to believe others that have done
>>>> likewise.
>>>> 
>>> Oh, yes, please. The grand conspiracy of database theorists.
>>
>> Hmmm. Nope. Not a clue what you're on about. Sorry..
>>
>>>>>when you have very practical and serious 
>>>>>problems, like for example your clients investing in an obsoleted
>>>>>technology with problematic future. And it is ridiculous for you
guys to
>>>>>whine that theorists disregard your model, actually they don't, it's
>>>>>described in all theory books (now that I know it's actually about
>>>>>nested relations), they just are not so crazy about its virtues.
>>>>> 
>>>> Again - references please?
>>>
>>> Abiteboul, Hull & Vianu, they are very much neutral on the
subject.
>>
>> I promise I will follow these references up some time - if they really are
>> regarding and describing the Pick model, even if they choose to call it
>> something else (although I don't know why they would choose to call it
>> something other than Pick or MultiValued - they should realise that >> they'll miss much of their target audience). Hmmm. I'm not feeling very
>> confident... I'm afraid it's going to have nothing at all to do with
>> Pick... What do you reckon folks?
>>
[snip]
>
> So I'm still curious for the exact difference that nesting tables buys me.
>
> What the hell, give us a simple example, with CUSTOMER, ORDER
> ORDERLINES, PRODUCTS. I believe you'd nest ORDERLINES into ORDERS, so
> what's the big deal about it ? What's the big practical advantage ?
>
> I can do that in Oracle as well, as a practical example, in a
> theoretical relational model as well since columns can have any types,
> including relation types.
>
> Or any example at all that is significant.
>
> Best,
> Costin

Example:

We have a sales ledger requirement.

We need to be able to record, against each invoice:

There are other requirements - but they are not known/defined at this stage. We were lucky to get this much detail. The CEO for the company we're writing this application for is a very hard man to pin down. He wants it now but he doesn't know what it is exactly. We had a five minute chat with him and, based on that, we come up with:-

Pick file layout description:
File: INVOICES
ID: Invoice Number

001: Product codes (multivalued)
002: Quantity ordered (multivalued)
003: Requested Ship-by Dates (multivalued)
004: Quantity despatched - for each partial shipment (subvalued)
005: Despatched Date for each partial shipment (subvalued)

We create the file and write a program to allow them to start using it. They spend the morning entering data they had recorded on various bits of paper.

Later that day we look at some of the data. Here's what it looks like:-

Actual Pick file contents:
INVOICES
ID: 12345

001: ABC*123*RED]ABC*456*RED]DEF*123*BLUE]DEF*789*WHITE
002: 100]400]200]300
003: 13085]13090]13090]13087
004: 50\50]230]200]100\120
005: 13079\13080]13080]13079]13078\13080

The boss now wants a printout of this particular invoice.

We create some dictionary items:

ED DICT INVOICES Product Size Colour OrderQty ReqShipDate ShipQty ActShipDate

Product
001 S
002 1
008 G*1
009 L
010 7

Size
001 S
002 1
008 G1*1
009 L
010 4

Colour
001 S
002 1
008 G2*1
009 L
010 6

OrderQty
001 S
002 2
003 Quantity]Ordered
009 R
010 8

ReqShipDate
001 S

002 3
003 Ship By
007 D
009 R
010 11

ShipQty
001 S
002 4
003 Quantity]Despatched
009 R
010 10

ActShipDate
001 S
002 5
003 Despatched On
007 D
009 R
010 13

...and produce a listing by entering:

:list invoices '12345' product size colour total orderqty reqshipdate total shipqty actshipdate

Page   1     invoices                                    18:12:14 23
Oct 2003

invoices.. product size colour Quantity Ship By.... Quantity.. Despatched On

                               Ordered              Despatched

12345      ABC     123  RED         100 28 Oct 2003         50   22
Oct 2003
                                                            50   23
Oct 2003
           ABC     456  RED         400 02 Nov 2003        230   23
Oct 2003
           DEF     123  BLUE        200 02 Nov 2003        200   22
Oct 2003
           DEF     789  WHITE       300 30 Oct 2003        100   21
Oct 2003
                                                            20   23
Oct 2003
***                                1000                    650

[405] 1 items listed out of 1 items.

Question: How many disk reads do you think it took to get all of the information shown above relating to the invoice?

Feel free to ask any questions or request any changes. You be the boss.

Regards
Mike. Received on Thu Oct 23 2003 - 06:23:58 CEST

Original text of this message