Re: DB clasical structure violation

From: Anthony W. Youngman <thewolery_at_nospam.demon.co.uk>
Date: Thu, 4 Jul 2002 23:38:57 +0100
Message-ID: <nXrOODBB6MJ9EwAj_at_thewolery.demon.co.uk>


In article <3d20b9d4.1898800_at_oak.cise.ufl.edu>, Alfredo Novoa <alfredo_at_nospam_ncs.es> writes
>On 25 Jun 2002 06:42:28 -0700, anthony.youngman_at_eca-international.com
>(Anthony Youngman) wrote:
>
>
>>Why, then, are Oracle and Microsoft busy copying the features that
>>we've had for thirty years, and adding them to their databases? Are
>>you saying that they're busy adding square wheels?
>
>Which features?
>
>Can you give references about the multivalued data model?
>
>I have not heared very much about multivalued databases.
>
Basically, chaptered rowsets (as I believe MS call it), or "nested tables" as it's simpler to refer as.

Let's take a very basic database. I want to store a list of countries, along with their next-level administrative subdivisions. America has states, as does Canada. Switzerland has cantons, Germany has Lande, I'm not sure what you'd do about my country (politically, it's the UK, which is subdivided into four ethnic countries... - never mind ...)

I would store this in a single table. The key would be the ISO country code, Field 1 would be the country name, Field 2 would be the subcode (for America, eg WA, FL, etc...), and Field 3 would be the subname (eg Washington, Florida). The point about MultiValue is that each cell can contain a list, and fields can be associated.

So although it costs me a bit more effort in programming, I now have what you would see as a second table, keyed on the above table's key and field 2, and with its value as field 3. Within the cells, the ordinality of the values is important - the first value in field 2 matches the first value in field 3, etc etc.

This breaches 3NF, but as you can see, I don't have any redundant data so I don't have integrity problems there. Deleting the country will take the subcodes with it, so I don't have integrity problems there. I can't add a free-standing subcode because there is no country row to put it in so I don't have integrity problems there ...

And one only has to look at MS's "the database is the file system" (where they are planning to use SQL-Server as the disk format) to say "Pick was there 30 years ago". It's only fairly recently that you could add fields to most SQL database tables without having to rebuild the table - Pick had that when it was born 30 years ago. It's only recently most SQL databases have added variable length fields where they only take up the storage they need - that's the only sort of length field Pick has ever had...

Don't forget I said the Pick language is a 3.5GL. So you're a lot "closer to the metal" than you are with the abstractions of SQL. The only sort of file Pick has ever used is the hash file. And for the last twenty years, I've never had the need to manually rehash a file - the system does it for me, on the fly, and so fast I don't realise it's doing it. MS and Oracle are now copying our techniques for the internal management of their data inside the database. If you gave me a *GIG*sworth  of data, I bet I could start with hashed file size 0 modulo 0, and fill that file with that data faster than someone else could fill a btree starting at size 0. And my hashed file would be optimally hashed as the fill finished... Don't forget. Given a known key, an optimally hashed file will ALWAYS beat any other file format for retrieving a data record. And given a relational database, almost all your data access will be based on known keys. So if you program in a relational manner, on top of a Pick/MultiValue database, then as soon as your program starts using disk at all heavily, you will blast any SQL-based database into a cocked hat for speed. Plus, with the nested-table capability, you can step outside SQL restrictions with ease. For example, I said I never use a table to define relationships - I always store foreign keys with the entity that refers to them. I can store a list of invoices in the company record (actually, that would be inefficient design, and pragmatism says this is a case where maybe I should have a relationship table). But I could create an inverted index instead on the invoice file, and ask the invoice file "which invoices refer to company X". Here, each invoice would only refer to one company, but I could equally easily ask "which invoices contain product code Z". And because the index itself is a hashed file, it will have records keyed on Z, and a single field containing a list of the invoice numbers that contain Z as an item on them.

Download jBASE (www.jbase.com), go the newsgroup news://comp.databases.pick, and play. It'll broaden your mind immensely!

I won't say download MaVerick... it's coming along nicely, and if you want to pile in and help we'd love it, but it's not really ready for newbies to MultiValue yet, I don't think.

Cheers,
Wol

-- 
Anthony W. Youngman <pixie_at_thewolery.demon.co.uk>
'Yings, yow graley yin! Suz ae rikt dheu,' said the blue man, taking the 
thimble. 'What *is* he?' said Magrat. 'They're gnomes,' said Nanny. The man 
lowered the thimble. 'Pictsies!' Carpe Jugulum, Terry Pratchett 1998
Visit the MaVerick web-site - <http://www.maverick-dbms.org> Open Source Pick
Received on Fri Jul 05 2002 - 00:38:57 CEST

Original text of this message