
Re: How to optimize a table for segment data compression?

From: shridned <shrniad_at_yahoo.com>
Date: Sun, 24 Nov 2002 13:16:26 +0100
Message-ID: <arqfme$vo0$03$1@news.t-online.com>


We have done this very successfully.

Ordering the rows will improve compression when there are large repeating groups, because duplicate values then end up together in the same blocks.
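
To choose the sort columns, you can rank them by the ratio suggested in the quoted post below. A sketch (t and the column names are placeholders; vsize() returns the stored size of a value in bytes):

select count(distinct col1) / avg(vsize(col1)) as ratio_col1,
       count(distinct col2) / avg(vsize(col2)) as ratio_col2
  from t;

Sort starting with the column that has the smallest ratio.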

create table new_tab compress as
  select * from old_tab order by sort_cols;

or

insert /*+ append */ into new_tab
  select * from old_tab order by sort_cols;

(new_tab, old_tab, and sort_cols are placeholders; the target table must be created with COMPRESS, and the append hint gives the direct-path insert that segment compression needs.)

The ORDER BY can also improve the clustering factor of an index.

Build the indexes with PCTFREE 0. Since the old rows never change, there is no need to reserve free space in the index blocks.
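
For example (a sketch; t, its columns, and the index name are placeholders):

create index t_ix on t (col1, col2) pctfree 0 nologging;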

For tables which contain old data and keep receiving new data:

Partition the table into an old and a new part (range partition on the primary key, for example). The old partition is compressed, of course; the new partition is not.
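
A minimal sketch of such a layout (all names and the boundary value are placeholders; id stands for the primary key column):

create table cdr_hist (
  id        number,          -- the primary key column
  call_data varchar2(200)
)
partition by range (id) (
  partition p_old values less than (1000000) compress,
  partition p_new values less than (maxvalue) nocompress
);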

Use local indexes!
This speeds up inserts greatly, because the index partitions on the old data are left untouched by new inserts.
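
For example, continuing the placeholder names from the sketch above:

create index cdr_hist_ix on cdr_hist (call_data) local pctfree 0;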

From time to time you have to recreate the old partition, adding in the rows from the new partition. Your compression will get better and better over time, because the repeating groups grow larger as more sorted data accumulates.
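
One possible maintenance cycle, as an untested sketch using the placeholder names from above (the split value must lie above the current maximum id, so that all existing rows end up in the compressed part):

-- fold the rows of the new partition into the old one
alter table cdr_hist merge partitions p_old, p_new
  into partition p_old;

-- re-create an empty "new" partition above the existing rows
alter table cdr_hist split partition p_old at (2000000)
  into (partition p_old compress, partition p_new nocompress);

-- rebuild the old partition sorted and compressed via a staging table
create table cdr_stage compress as
  select * from cdr_hist partition (p_old)
  order by call_data;

alter table cdr_hist exchange partition p_old with table cdr_stage;
drop table cdr_stage;

-- local index partitions are unusable after these operations
alter index cdr_hist_ix rebuild partition p_old pctfree 0;
alter index cdr_hist_ix rebuild partition p_new;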

"Vsevolod Afanassiev" <vafanassiev_at_aapt.com.au> schrieb im Newsbeitrag news:4f7d504c.0211220233.1255e5f_at_posting.google.com...
> First, thanks to the people who responded to my previous post
> regarding data compression.
>
> * * *
>
> We have a database with a large historic table, partitioned by day.
> New records are constantly loaded "at the top" and the N-days-old
> partition is truncated "at the bottom". Once loaded, the records
> never change. It looks like a perfect candidate for 9.2-style data
> compression.
>
> Our tests show a compression ratio of about 2 - 2.5, with a 16K
> block (the biggest block size for Solaris) and a row size of
> approx. 200 bytes. We thought the compression ratio could be
> improved by loading pre-sorted records, to increase the probability
> that records in the same block have similar values. It is possible
> to dump a whole partition into a file, sort it, and then load it
> back into the database.
>
> However, which columns should we use for the sort? Probably we
> should start with the columns with the smallest number of distinct
> values (like Yes/No)? Do we need to take the column size into
> account as well? It may be more important to group together similar
> VARCHAR(100) values, even if there are 1000 of them per partition,
> than to group CHAR(1) values with 10 distinct values.
>
> So my suggestion is: rank all columns by the ratio
>
> Ratio = (number of distinct values) / (average size of one value)
>
> and then sort, starting from the column with the smallest Ratio,
> then the second smallest, and so on.
> Does it make sense?
>
>
> * * *
>
> The White Paper on Data Segment Compression mentions that the
> highest compression ratio they achieved was 12, for Call Data
> Records (CDRs). Well, my table also contains CDRs, but so far I
> couldn't get a compression ratio of more than 2.5.
>
> Thanks,
> Sev
Received on Sun Nov 24 2002 - 06:16:26 CST
