Re: foundations of relational theory? - some references for the truly starving

From: Costin Cozianu <c_cozianu_at_hotmail.com>
Date: Wed, 22 Oct 2003 23:09:48 -0700
Message-ID: <bn7rd0$tul6k$1_at_ID-152540.news.uni-berlin.de>


>>I think the reference is not the greatest. In any case nested tables can
>>be implemented "in situ" also (see Oracle 8i/9i complete reference,
>>either in book or online).
>>
>
>
> Could you please provide a link for this?
>
>

Register at OTN, then go to the 8i or 9i docs and search for NESTED TABLES and VARRAYS. There are tons of options and descriptions to follow from there.
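
To save you a trip, here is a minimal sketch of the kind of thing documented there (type and table names are mine, syntax as I remember it from the 8i docs; the THE keyword, TABLE() in 9i, is what unnests the nested rows declaratively):

    CREATE TYPE invoice_line_t AS OBJECT (
      product_code VARCHAR2(20),
      qty_ordered  NUMBER,
      req_ship     DATE
    );
    /
    CREATE TYPE invoice_lines_t AS TABLE OF invoice_line_t;
    /
    CREATE TABLE invoices (
      invno NUMBER PRIMARY KEY,
      lines invoice_lines_t
    ) NESTED TABLE lines STORE AS invoice_lines_stor;

    -- flatten the nested rows of one invoice, declaratively
    SELECT l.product_code, l.qty_ordered, l.req_ship
      FROM THE (SELECT lines FROM invoices WHERE invno = 12345) l;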

Nested tables aside, many people prefer CREATE CLUSTER, as it is cleaner.
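
A sketch of the cluster alternative (again, names are mine): parent and child rows sharing the cluster key are stored in the same blocks, which buys the same physical co-location while keeping plain tables on top:

    CREATE CLUSTER invoice_cluster (invno NUMBER);
    CREATE INDEX invoice_cluster_idx ON CLUSTER invoice_cluster;

    CREATE TABLE orders (
      invno    NUMBER PRIMARY KEY,
      customer VARCHAR2(40)
    ) CLUSTER invoice_cluster (invno);

    CREATE TABLE orderlines (
      invno   NUMBER REFERENCES orders,
      product VARCHAR2(20),
      qty     NUMBER CHECK (qty > 0)
    ) CLUSTER invoice_cluster (invno);
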
>>In any case in Oracle you INSERT/QUERY the records in nested table
>>declaratively, i.e. they have the keyword named THE, which is roughly
>>equivalent to the UNNEST operator in nested relations theory.
>>
>>The same applies to Pick or any MV database, when you increase the size
>>of a record you decrease the performance of all basic operations on the
>>table.
>
>
> This is sometimes true - although not necessarily. The determining
> factor is whether or not the read or update involves a "frame fault".
> If no "frame fault" is involved then performance is the same whether
> the item being accessed is 15 bytes long or 1500. Performance is also
> equal regardless of the number of items on the file or total file
> size. It will take no longer to read one 1500 byte item from a 10 gig
> file containing 30m items than it would to read a 15 byte item from a
> 10k file containing 100 items.
>

BS. Or do you mean you don't write to log files?

You also have to take into account that increasing the record size decreases the number of rows stored in a single page (frame).
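
To put rough numbers on it (assuming, purely for illustration, a 2KB frame): at 15 bytes per item, about 130 items share one frame; at 1500 bytes, each item fills a frame by itself. The same workload then touches up to two orders of magnitude more frames, and every write among them hits the transaction log as well.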

> It might be worth mentioning at this stage that the amount of data you
> can store in a Pick item will require far less physical disk space
> than on any other (uncompressed) database.

Another big bullshit of the month. You guys live under the impression that you have invented the bicycle and that you are technologically advanced.

> There are two reasons for
> this:
> 1) because every field is of variable length - every datum is just as
> long as it needs to be and no more and empty fields occupy no space at
> all (other than single character system delimiters) -

Sure. Every DBMS and their grandma's have this feature; actually, it's probably implemented better, because numbers are stored in binary, there's room left for in-place updates, etc., etc.
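
For instance, if I remember the Oracle internals docs correctly (treat the details as from-memory): a NUMBER is stored as roughly one byte per two digits plus a one-byte exponent, a VARCHAR2 takes only its actual length plus a length byte, and trailing NULL columns in a row take no space at all.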

> and
> 2) because there is often no need for pointers to related data (the
> related data is within the item itself).
>
>

Oh, yes. Very big deal. Four bytes for a pointer is a tremendous disaster. In any case, clusters and VARRAYs and nested tables are there.

>>So if the master table is accessed/updated more frequently than
>>accesses for the information in the two of them combined
>
>
> What "master" table? Which two? In Pick/MV there need only be one
> item. One access/update. One head movement. READ INVOICE FROM
> INVOICES,'12345' need not require any other reads to get the "line
> item details" - they can be in the INVOICE item as multivalues (and
> possibly subvalues for finer detail within each "line item").
>
>

One access? Does it write byte by byte, or does it write whole frames to disk?

How about transaction logs?

And spare me the head movements per update. This calculus is about as valid as asking how many clock cycles a Pentium 4 Xeon will spend on an ADD AX, BX. Guess what: more often than not, it is less than one.

>>there's not a
>>good case to cluster the parent/children records (or whatever your
>>nomenclature is) together.
>>
>>
>>>In Pick the nested rows don't need pointers - they are in-situ. The
>>>item, or record, contains all of the related data - hence we get it
>>>all in a single read. No need to "dereference" anything. Not such a
>>>"plenty of complications".
>>>
>>
>>The "complications are with regards to integrity constraints,
>>normalization theory and query optimization.
>>
>
>
> Data integrity is simplified, normalisation is greatly simplified and
> there's no need to optimise anything.

No, because you pre-program moving your cursors, just like in FoxPro, right?

Oh, gosh. This is getting boring. You guys ought to rewrite database theory, because you essentially claim to have invented the equivalent of a perpetuum mobile.

So can you give me an example of an MV table that is in 4NF but not in 5NF?
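
(For reference, the textbook case is the ternary SPJ relation over supplier, part, and project, where the cyclic join dependency *{ {S,P}, {P,J}, {J,S} } holds but no nontrivial MVD does: the relation is all-key and in 4NF, yet not in 5NF, because the join dependency is not implied by the candidate keys.)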

> I know. Sounds incredible
> doesn't it? I'll try to remember to include an example at the end of
> this post.
>

Sounds delusional.

OK, it's not so delusional as long as you manage to deliver working applications to the users, but it is delusional to think that because you have a bicycle you have a big deal.

Remember, to this day, thousands of people have written tons of lines of code accessing COBOL indexed files, and delivered applications.

>>So I'm still curious for the exact difference that nesting tables buys me.
>>
>>What the hell, give us a simple example, with CUSTOMER, ORDERS,
>>ORDERLINES, PRODUCTS. I believe you'd nest ORDERLINES into ORDERS, so
>>what's the big deal about it ? What's the big practical advantage ?
>>
>>I can do that in Oracle as well, as a practical example, in a
>>theoretical relational model as well since columns can have any types,
>>including relation types.
>>
>>Or any example at all that is significant.
>>
>>Best,
>>Costin
>
>
> Example:
>
> We have a sales ledger requirement.
>
> We need to be able to record, against each invoice:
>
> * The Products ordered; (The product codes are composite keys
> comprising a generic Product Number concatenated with a Size code and
> a Colour code - only we don't know whether they are numeric or
> alphabetic or alphanumeric, or how many characters long they are. All
> we know for sure is that they must exist on the Product file, that the
> Product File is keyed on Product Number, and that there are lists of
> allowable Sizes and Colours for each Product item)
> * The Quantity of each Product ordered; (Must be a whole number of
> units greater than 0)
> * The Date on which each Product ordered is required to be despatched
> by; (we know they want to be able to despatch products in partial
> shipments as and when they're available when the truck leaves the
> depot on its delivery rounds)
> * The Quantity of each partial shipment;
> * The Date of each partial shipment.
>
> There are other requirements - but they are not known/defined at this
> stage. We were lucky to get this much detail. The CEO for the company
> we're writing this application for is a very hard man to pin down. He
> wants it now but he doesn't know what it is exactly. We had a five
> minute chat with him and, based on that, we come up with:-
>
> Pick file layout description:
> File: INVOICES
> ID: Invoice Number
> 001: Product codes (multivalued)
> 002: Quantity ordered (multivalued)
> 003: Requested Ship-by Dates (multivalued)
> 004: Quantity despatched - for each partial shipment (subvalued)
> 005: Despatched Date for each partial shipment (subvalued)
>

This is already nonsense.

You mean you can't have a record type

    type InvoiceLine = record
        ProductCode:        int;
        QuantityOrdered:    float;
        QuantityDispatched: float;
        RequestDeadline:    date;
    end

and have a single multivalued column of those records? So how does the DBMS *enforce* the needed integrity constraints?
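
For contrast, in the nested table sketch I gave earlier the DBMS does enforce such things: you declare constraints directly on the storage table (again from memory of the 8i docs; whether a foreign key from the storage table to a PRODUCTS table is allowed depends on the version, so check before relying on it):

    -- quantities must be positive
    ALTER TABLE invoice_lines_stor
      ADD CONSTRAINT qty_ordered_positive CHECK (qty_ordered > 0);

    -- every line must name a product
    ALTER TABLE invoice_lines_stor
      MODIFY (product_code NOT NULL);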

> We create the file and write a program to allow them to start using
> it. They spend the morning entering data they had recorded on various
> bits of paper.
>

Oh sure. With no integrity constraints. Remember the good old days, when we had a "Data Quality" department with data operators to double-check the nonsense.

> Later that day we look at some of the data. Here's what it looks
> like:-
>
> Actual Pick file contents:
> INVOICES
> ID: 12345
> 001: ABC*123*RED]ABC*456*RED]DEF*123*BLUE]DEF*789*WHITE
> 002: 100]400]200]300
> 003: 13085]13090]13090]13087
> 004: 50\50]230]200]100\20
> 005: 13079\13080]13080]13079]13078\13080
>

That looks great.

> The boss now wants a printout of this particular invoice.
>

Hey, I was doing that when I was a kid.

> We create some dictionary items:
>
> ED DICT INVOICES Product Size Colour OrderQty ReqShipDate ShipQty
> ActShipDate
>
> Product
> 001 S
> 002 1
> 008 G*1
> 009 L
> 010 7
>
> Size
> 001 S
> 002 1
> 008 G1*1
> 009 L
> 010 4
>
> Colour
> 001 S
> 002 1
> 008 G2*1
> 009 L
> 010 6
>
> OrderQty
> 001 S
> 002 2
> 003 Quantity]Ordered
> 009 R
> 010 8
>
> ReqShipDate
> 001 S
> 002 3
> 003 Ship By
> 007 D
> 009 R
> 010 11
>
> ShipQty
> 001 S
> 002 4
> 003 Quantity]Despatched
> 009 R
> 010 10
>
> ActShipDate
> 001 S
> 002 5
> 003 Despatched On
> 007 D
> 009 R
> 010 13
>

Is this a binary file dump, or is this code, or what's this mumbo jumbo?

> ...and produce a listing by entering:
>
> :list invoices '12345' product size colour total orderqty reqshipdate
> total shipqty actshipdate
>
> Page 1                       invoices                 18:12:14  23 Oct 2003
>
> invoices.. product size colour Quantity Ship By.... Quantity.. Despatched On
>                                Ordered              Despatched
>
> 12345      ABC     123  RED        100  28 Oct 2003        50  22 Oct 2003
>                                                            50  23 Oct 2003
>            ABC     456  RED        400  02 Nov 2003       230  23 Oct 2003
>            DEF     123  BLUE       200  02 Nov 2003       200  22 Oct 2003
>            DEF     789  WHITE      300  30 Oct 2003       100  21 Oct 2003
>                                                            20  23 Oct 2003
>        ***                        1000                    650
>
> [405] 1 items listed out of 1 items.
>

Now, that's quite a feat.

> Question: How many disk reads do you think it took to get all of the
> information shown above relating to the invoice?
>

Approximately the same as in any modern DBMS.

Guys, just as modern processors have SIMD, pipelines, prefetch, speculative execution, and so on and so forth, modern DBMSes use SCSI arrays, partition data, cluster data, and amortize head movements across tables, to the point that your insistence on head movements is just about ridiculous.

Now look at
http://www.tpc.org/tpcc/results/tpcc_price_perf_results.asp?resulttype=noncluster

and see that roughly $40K of hardware plus software, for server and clients, buys 20K tpmC (transactions per minute), a fraction of which is enough to run 99.999% of businesses. Ten years ago, bank mainframes didn't have that throughput in batch mode, not to mention in concurrent mode.

If at this performance point you are still under the delusion that Pick's head movements count, well, you are very much beyond reasonable argument.

In the life of a typical business system doing OLTP (OLAP is a slightly different ball game), head movements to the disks where the main data is stored are well amortized, to maybe under one per record query/update/insert/delete, just as a typical integer operation costs one clock cycle, or even less, in highly optimized C/Fortran code. If you go to Intel you will notice there is no longer documentation of how many clock cycles each instruction takes, as there used to be for the 386. The same happened to DBMSes: head movements stopped being a significant factor a long time ago (maybe more than five years).

> Feel free to ask any questions or request any changes. You be the
> boss.
>
> Regards
> Mike.

Best regards,
Costin