Re: Normalization question

From: Steve Kass <skass_at_drew.edu>
Date: Thu, 27 Jun 2002 01:20:45 -0400
Message-ID: <3D1AA0AD.353B354B_at_drew.edu>


Jan,

  Good point about what normalization does and doesn't deal with. I would add that the issue here is more in the realm of information theory (i.e. complexity), and we're now thinking about how to describe the information with the fewest bytes.

  Another example might be a database of Australians with "preferred language" as an attribute. It may well be that 95% of the entries list English for this attribute. If so, one could save space by eliminating the attribute and maintaining a separate table with one row for each non-English speaker, containing his or her preferred language. The same trick saves space for any attribute with a sufficiently prevalent mode.
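  To make the idea concrete, here is a minimal sketch in Python (the names, the 95%-English figure's stand-in data, and the dict-based "tables" are all illustrative assumptions, not anything from the original post). The exceptions table records only non-default speakers, and a lookup falls back to the default:

```python
# Sketch: store only the exceptions to a dominant default value.
DEFAULT_LANGUAGE = "English"

# Uncompressed form: the attribute stored for every person.
preferred = {"Alice": "English", "Bruce": "English", "Mei": "Mandarin"}

# Compressed form: one row per non-default speaker only.
exceptions = {k: v for k, v in preferred.items() if v != DEFAULT_LANGUAGE}

def lookup(person):
    """Reconstruct the attribute: check the exceptions table, else the default."""
    return exceptions.get(person, DEFAULT_LANGUAGE)
```

  With 95% of entries at the default, the exceptions table holds only 5% of the rows, at the cost of an extra lookup (and the assumption that anyone absent from it speaks English).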

  The example here is in fact very similar to some popular schemes for image compression. Recording just the changes is a lossless compression scheme. And the parallel goes a bit further, in that both for a .gif image and for the table in the compact form Erland suggested (no End_dt), there are similar disadvantages. If a row of the table is corrupted or lost, a potentially large amount of information can become irretrievable. And even without corruption, the information becomes harder to retrieve (witness Erland's use of both compressed and uncompressed tables, depending on the situation). Is it worth compressing the data if the savings will be undone by the increased work needed to uncompress it?
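  The "record just the changes" scheme can be sketched as follows (a hypothetical illustration, assuming a single ID whose monthly value is stored as (Begin_dt, value) rows with no End_dt; the months and values are made up). Each row implicitly runs until the next row's Begin_dt, so decompression means finding the last change at or before the requested month:

```python
import bisect

# Compressed form: one row per change of value, no End_dt.
# (yyyymm, value) pairs, kept sorted by month.
changes = [(200201, "A"), (200204, "B"), (200209, "A")]

def value_at(yyyymm):
    """Decompress: the value in effect is the last change at or before yyyymm."""
    months = [m for m, _ in changes]
    i = bisect.bisect_right(months, yyyymm) - 1
    if i < 0:
        return None  # before the first recorded change: no value known
    return changes[i][1]
```

  Note how the fragility shows up here: dropping the (200204, "B") row silently extends "A" over the months 200204-200208, which is exactly the "one lost row corrupts a whole range" disadvantage described above.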

    Great stuff to think about!

Steve Kass
Drew University

"Jan.Hidders" wrote:

> In article <a386bae7.0206261051.6c87a148_at_posting.google.com>,
> leebert <leebertarian_at_yahoo.com> wrote:
> >
> >Old Data Model (key is ID + Year + Month) :
> >
> >New Data Model (key is ID + Begin_dt + End_dt, requires trigger for
> >complete enforcement)
>
> Actually you have two smaller candidate keys: ID + Begin_dt and ID + End_dt.
> But even if you enforce both you will still need triggers to prevent
> overlapping periods for the same ID.
>
> >I know if I went back to 1st principles and read some weighty tome on
> >normalization I'd find an answer, but I'm taking a shot & see what
> >more experienced folks have to say, save some time narrowing down
> >what's best practice.
>
> Classical normalization doesn't really cover this type of redundancy. It
> only covers redundancy that can be solved by splitting a table into smaller
> projections. The old and the new data model could very well be in 5NF.
>
> -- Jan Hidders
