Re: A Normalization Question

From: D Guntermann <guntermann_at_hotmail.com>
Date: Thu, 8 Jul 2004 20:13:44 GMT
Message-ID: <I0Juuw.Ip7_at_news.boeing.com>


"Neo" <neo55592_at_hotmail.com> wrote in message news:4b45d3ad.0407081047.564ca36b_at_posting.google.com... [snip]
> > Redundancy in terms of databases is about removing semantic duplicates,
> > not about removing syntactic duplicates. i.e. about the logical level,
> > not the physical level.
>
> The above is an example of a limited form of normalization. The
> general form of normalization applies to everything being represented
> (stored, not merely implied) within a db.

Neo,

Since you supplant all instances of a specific encoding with references to a single character value, wouldn't the references, which are pieces of information themselves, be redundant?

For example

Value Storage Address (16 bits)    Value (8 bits)
1                                  'B'
2                                  'R'
3                                  'O'
4                                  'W'
5                                  'N'
6                                  '1'
7                                  '2'
8                                  '3'

Fact in Neo's database:
Thing   Person   Color   Street
1       brown    brown   brown

Each attribute would contain redundancy in terms of the references (by your logic):

Person(1,2,3,4,5)
Color (1,2,3,4,5) <--- oh no! the most generalized form has been broken! 1,2,3,4,5 are repeated.
Street (1,2,3,4,5) <--- oh no! the most generalized form has been broken! 1,2,3,4,5 are repeated.

Not only do you needlessly denormalize information (shame!) but you don't meet your own criteria for the most generalized normal form. You have replicated references in this case at least three times, and imagine the chaos with non-related data that use the same references.
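To make the repetition concrete, here is a minimal sketch in Python. It assumes the value table from the example above; the names VALUE_TABLE, encode, and decode are hypothetical and are not taken from Neo's system.

```python
from collections import Counter

# Hypothetical value table from the example: storage address -> character.
VALUE_TABLE = {1: "B", 2: "R", 3: "O", 4: "W", 5: "N", 6: "1", 7: "2", 8: "3"}
ADDRESS_OF = {char: address for address, char in VALUE_TABLE.items()}


def encode(text):
    """Replace each character with the address of its single stored copy."""
    return tuple(ADDRESS_OF[char] for char in text)


def decode(references):
    """Follow each reference back to the character it points to."""
    return "".join(VALUE_TABLE[address] for address in references)


# The single fact from the example: Person, Color and Street are all "BROWN".
fact = {attribute: encode("BROWN") for attribute in ("Person", "Color", "Street")}

# Count how often each address is stored across the three attributes.
usage = Counter(address for references in fact.values() for address in references)

for attribute, references in fact.items():
    print(attribute, references, "->", decode(references))
print("times each address is stored:", dict(usage))
```

Each character is stored once, but every address from 1 to 5 is stored three times, so the duplication has only moved from the values to the references.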

Moreover, each reference requires 16 bits, while the character value itself is probably only 8 bits. Thus, you needlessly design your system as inefficiently as possible. If information theory were your guide, you'd be doing the exact opposite of one of its objectives -- conveying information in the most efficient form possible.
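The arithmetic can be sketched as follows. This is a back-of-the-envelope estimate that assumes 16-bit addresses and 8-bit characters as in the example, and that ignores any other overhead a real system would add; the function names are made up for illustration.

```python
ADDRESS_BITS = 16  # assumed size of one reference, from the example
CHAR_BITS = 8      # assumed size of one character value


def direct_bits(values):
    """Bits needed when every character is stored in place."""
    return sum(len(value) for value in values) * CHAR_BITS


def referenced_bits(values):
    """Bits needed when each distinct character is stored once
    and every occurrence becomes a 16-bit reference to it."""
    distinct_chars = set("".join(values))
    table_bits = len(distinct_chars) * CHAR_BITS
    reference_bits = sum(len(value) for value in values) * ADDRESS_BITS
    return table_bits + reference_bits


values = ["BROWN", "BROWN", "BROWN"]  # Person, Color, Street
print("stored directly:     ", direct_bits(values), "bits")      # 120
print("stored via references:", referenced_bits(values), "bits")  # 280
```

Under these assumptions the "normalized" form takes 280 bits against 120 for the plain strings, because every reference is twice the size of the value it replaces.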

> If syntactic is stored
> (meaning it has a location within the db), they are also candidates
> for normalization.

[snip]

> The lowest (most general) level will be applicable to all higher (more
> specific) levels.

I submit, and I hope you will listen, that you are in fact focused on a lower-level model that is generalized only to the extent that we encode meaning in languages. If you really want to pursue this form of generalization, and you are really honest about finding the purest and most generalized form, then you had better start with the bit and make your references to single bit values from there. In one sense, you are working towards a true Turing machine, a powerful concept, but not at all practical.
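A rough sketch of where bit-level references lead, again in Python. It assumes 8-bit characters and reuses the 16-bit address size from the earlier example; the function name is hypothetical.

```python
def bit_level_reference_cost(text, char_bits=8, address_bits=16):
    """Bits needed if only the values 0 and 1 are stored once each
    and every bit of every character becomes a reference to one of them."""
    stored_value_bits = 2  # one stored 0 and one stored 1, one bit each
    reference_count = len(text) * char_bits
    return stored_value_bits + reference_count * address_bits


plain_bits = len("BROWN") * 8
print("plain characters:   ", plain_bits, "bits")
print("16-bit references:  ", bit_level_reference_cost("BROWN"), "bits")
print("1-bit references:   ", bit_level_reference_cost("BROWN", address_bits=1), "bits")
```

With only two stored values, the smallest workable reference is a single bit, which is exactly as large as the value it points to. At that level the reference and the value are indistinguishable, and nothing has been saved.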

Regards,

Dan

Received on Thu Jul 08 2004 - 22:13:44 CEST
