Re: A Normalization Question

From: Chris Hoess <choess_at_stwing.upenn.edu>
Date: Fri, 9 Jul 2004 20:47:42 +0000 (UTC)
Message-ID: <slrnceu13d.g5q.choess_at_force.stwing.upenn.edu>


In article <20040709081455.29796.00001156_at_mb-m18.aol.com>, VHarris001 wrote:
>
>
> Normalizing the data this finely might be necessary in some cases. Wouldn't a
> database program that allowed, without requiring, this degree of normalization,
> be useful?

Not if the pointers, etc. are logically exposed; see TTM, etc. for explanation of why pointers are bad at the logical level. However, the physical benefits might be achieved without exposing the tangle of pointers. Consider our old friend, the enumerated type COLOR, which for our purposes is a list of red, blue, yellow, green, brown...etc. Storing these types as strings really isn't the most efficient way to do so, for most non-trivial databases (that is, where the number of tuples with a COLOR-valued attribute is significantly larger than the number of values in COLOR). Rather, it would make sense to assign, say, 000 to red, 001 to blue, 010 to yellow...and so forth, and maintain some sort of lookup function for the enumerated type which translates between the binary representation and the name of the color. This approach emphasizes semantic nature of the COLOR type: it isn't a bunch of strings, it's a bunch of *colors*, which we may prefer to represent as a string, as a swatch of the color, as the RGB values of the color, and so forth.

Note that the boundaries of the type are respected. Mr. Brown is not confused with brown hair, because we recognize that the first is, semantically, a NAME, and the second a COLOR, rather than trying to manipulate everything with strings and integers. At the physical level, COLOR can be thought of as being stored as pointers to some table or tables of enumerated representations, but the user of the database only ever manipulates the desired representation (color swatches, strings, or what have you), and the physical optimizations never interfere with the user's semantic manipulation.

> It seems that if neo's database REQUIRES normalization to the nth degree, but
> if it allows it, it seems that would be a useful feature. No?

For enumerated types, one is certainly tempted to perscribe it *in a concealed form*. For cases when one genuinely needs a free-form string, rather than enumeration, it seems to me that the gains in physical optimization are outweighed by the confusion incurred in trying to change Brown St. to Browne St. without interfering with Mr. Brown's surname.

-- 
Chris Hoess
Received on Fri Jul 09 2004 - 22:47:42 CEST

Original text of this message