Re: A searchable data structure for representing attributes?

From: <no_spam_for_me_at_thankyou.very_much>
Date: Thu, 28 Feb 2002 18:30:48 -0500
Message-ID: <3C7EBDA8.75833AFE_at_thankyou.very_much>


--CELKO--,

Thanks, that's a good point. Do you know whether Oracle or SQL Server supports compressing data that way?

Dre

--CELKO-- wrote:

> >> Again, since cars seem to be a popular example, imagine you have a
> database of cars in a town where almost 99% of people like exactly the
> same sort of car, a blue Toyota. If you store 5 million rows like:
>
> id       make    color
> 1        toyota  blue
> 2        toyota  blue
> 3        toyota  blue
> .....
> 5000000  toyota  blue
>
> Yes, it is very good and normalized in the Dr. Codd sense, but you still
> store 100 times more information than you need to store with what I
> described in approach_1. That is, the absence of a key indicates a
> default of make=toyota and color=blue. Yes, I can see how this is
> anathema to relational design, but it is a real-world problem. <<
>
> You are confusing PHYSICAL storage with the LOGICAL model. Yes, many
> file-oriented versions of SQL (SQL Server, DB2, Oracle) tend to
> actually PHYSICALLY repeat values in storage. But if I were using
> the Nucleus SQL engine from Sand Technology, there would be a bit
> vector with a 1 for (make = 'toyota') and a 1 for (color = 'blue') at
> the appropriate positions. This bit vector is then compressed and all
> queries are done on the compressed form. The original data is
> reconstructed one column at a time on output.
>
> The more repetition in the data, the smaller the Nucleus database
> gets. The Nucleus engine invites you to split telephone numbers into
> (area code, exchange, phone number) columns to save space because
> area codes and exchanges repeat.
>
> In fact, a good rule of thumb for this product is that the size of the
> entire database will be 80% or less of the size of the original data.
> Your data could well be less than 20% of the original size.
>
> Obviously, this is a data warehouse tool.
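The bit-vector scheme Celko describes can be sketched roughly as follows. This is a minimal illustration of the general technique (one bit vector per distinct column value, compressed with run-length encoding, queried without decompressing), not Sand Technology's actual Nucleus storage format, which is not described in this thread. The function names and the 1,000-row sample table are hypothetical.

```python
from itertools import groupby

# Hypothetical helpers illustrating bitmap storage with run-length encoding (RLE).
# A bit vector is stored as a list of (bit, run_length) pairs.

def rle_encode(bits):
    """Compress a list of 0/1 values into (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into a list of 0/1 values."""
    return [bit for bit, length in runs for _ in range(length)]


def build_bitmaps(column):
    """Build one RLE-compressed bit vector per distinct value in a column."""
    return {
        value: rle_encode([1 if v == value else 0 for v in column])
        for value in set(column)
    }


def rle_and(a, b):
    """AND two equal-length RLE bit vectors without decompressing them."""
    result = []
    i = j = 0
    bit_a = bit_b = rem_a = rem_b = 0
    while True:
        # Fetch the next run from whichever input has used up its current run.
        if rem_a == 0:
            if i == len(a):
                break
            bit_a, rem_a = a[i]
            i += 1
        if rem_b == 0:
            if j == len(b):
                break
            bit_b, rem_b = b[j]
            j += 1
        # Over the next `step` positions both inputs are constant, so AND once.
        step = min(rem_a, rem_b)
        bit = bit_a & bit_b
        if result and result[-1][0] == bit:
            result[-1] = (bit, result[-1][1] + step)  # extend the previous run
        else:
            result.append((bit, step))
        rem_a -= step
        rem_b -= step
    return result


def count_ones(runs):
    """Count matching rows directly from the compressed form."""
    return sum(length for bit, length in runs if bit)


def reconstruct(bitmaps, n_rows):
    """Rebuild one column on output by decoding each value's bit vector."""
    column = [None] * n_rows
    for value, runs in bitmaps.items():
        for pos, bit in enumerate(rle_decode(runs)):
            if bit:
                column[pos] = value
    return column


# Hypothetical sample: 1,000 cars, all blue Toyotas except two rows.
n = 1000
make = ["toyota"] * n
color = ["blue"] * n
make[10] = "honda"
color[500] = "red"

make_bm = build_bitmaps(make)
color_bm = build_bitmaps(color)

# WHERE make = 'toyota' AND color = 'blue', evaluated on the runs.
both = rle_and(make_bm["toyota"], color_bm["blue"])
print(count_ones(both))                 # 998
print(make_bm["toyota"])                # [(1, 10), (0, 1), (1, 989)]
print(reconstruct(make_bm, n) == make)  # True
```

Because each bit vector is stored as runs, a column where nearly every row holds the same value costs only a handful of runs whether the table has 1,000 rows or 5 million, which is the sense in which more repetition makes this kind of database smaller.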
Received on Fri Mar 01 2002 - 00:30:48 CET