Re: Normalization Question

From: Marshall Lucas <mlucas_at_liapartners.com>
Date: Thu, 03 Feb 2005 09:31:07 -0600
Message-ID: <%srMd.9$zs.7_at_okepread04>


Neo wrote:

>>>Notice the field just above the phone number, it's a flag th­at
>>>tells what the type of phone number is.

>
>
> I see, so "H]C]C" would allow one to have one home, two cell and no
> international #s.
>
> Is there a db level mechanism that ensures changes to a thing will be
> reflected or synchronized with other representations of the same thing
> in the db even if they are one of mulitiple values in various fields?
>
> Also if some phone# have suparts (ie extentions, hours of operation,
> alternate number, etc), how would the sub parts be stored without
> redundancy?
>

No, the DB does not insure any integrity within the data itself. It's only job is to manage the data to and from the disk in an efficient manner and to interpret data according to the dictionary items. All data level integrity is handled by the programmer.

I can carry the multi-value idea out ad-infinitum. So I could add another field called INFO and one called INFO.TYPE and have a flag in the type field like E\H\A and the data INFO could be 123\3200-18000\2341231234. This could be inside one of the values above.

H]C]C
1231231234]1231234]1231231234

E\H]E\A]E
123\3800-18000]123\1231231234]123

so now we have 3 dimensional data, FIELDS, VALUES, SUB-VALUES. By the way, the ]\ are actually high order characters ] = char(253) and \ = char(252), there's also a text mark that is char(251). But of course, I could keep going as long as I didn't use a character that was used within my data. ^ = char(254) is the field mark and _ = char(255) is a record mark.

A physical record therefore looks like:

_123^Marshall]Lucas^123 W.
Somewhere^73001^H]C]C^1231231234]1231234]1231231234^E\H]E\A]E^123\3800-18000]123\1231231234]123_

That's all the DB physical stores. It's an extremely efficient string parser so it pulls the marks apart quickly. Plus the ID 123 is hashed and stored in GROUPS within the physical file. So a single read of a FRAME (smallest unit within a file) usually garners the record, but sometimes in a very full file, a few frames will be read to find it. Many of the newer MVDBMSs store a header FRAME first in each GROUP that contains a list of all IDs and which FRAME each is in, so it's always exactly 2 reads for any piece of data. Because of the hashing algorithm, this is much faster than one might thing because it takes into account the rotational geometry of the disk and minimizes head movement.

As far as distribution, fulfillment and the like, it's really the best system going. Anheiser-Busch is a long time MVDB shop. As are many large distribution companies and insurance companies too.

Marshall Received on Thu Feb 03 2005 - 16:31:07 CET

Original text of this message