Re: Index compression vs. table compression

From: Rick Denoire <100.17706_at_germanynet.de>
Date: Wed, 29 Dec 2004 02:18:53 +0100
Message-ID: <ngt3t0pv2mnd28ke7c2fa27h8g90ghnpb8@4ax.com>


"Howard J. Rogers" <hjr_at_dizwell.com> wrote:

>Unless you twiddled things when you were creating your file systems,
>then that's wrong: it should be 4K (assuming you're using bog-standard
>ext2, for example). You'll have I/O problems as a result of having an
>Oracle block size that's bigger than your file system buffer size.

This is a question I still have to investigate. Do you have numbers? Benchmarks? Records of I/Os per second? One has to MEASURE in order to know - there are too many myths out there, and this is a complicated matter. I would expect that several read requests for contiguous blocks would be gathered into one request by the OS before actually being issued. Isn't that a classic feature of Unix-like file system buffers? (I won't go into the read-ahead done by the storage.)

This is similar to using the DB_FILE_MULTIBLOCK_READ_COUNT parameter to coalesce adjacent I/Os, I think (but for full table scans).
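
Just to make that concrete, something like the following is what I have in mind (a sketch only; the value 16 is arbitrary, and with an 8K block it would mean 128K per multiblock read):

   -- check the current setting (SQL*Plus)
   SHOW PARAMETER db_file_multiblock_read_count

   -- 16 blocks * 8K = 128K requested per multiblock read during a full scan
   ALTER SYSTEM SET db_file_multiblock_read_count = 16;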

The file systems can be mounted with the sync option, avoiding file system buffering. Would that alleviate the mismatch between the Oracle block size and the file system block size?

The kernel (2.4.e-37) of the RedHat Advanced Server version is tweaked in several ways, and Oracle was relinked with the sync I/O capability at install time. This version is quite *expensive*, by the way.

>Adopting 16K blocks is going to make matters worse. And bear in mind
>that there are no recycle or keep caches for your 16K buffer cache...
>meaning physical I/O is likely to be higher than it would have been if
>you'd just left things alone.

I don't understand the point of a recycle cache. Since buffers for blocks read via full table scans are put at the least recently used end of the list anyway, the main cache already "recycles" block buffers with low "life expectancy". Reserving a recycle cache at the expense of the main cache for just this kind of operation could be a waste of memory: that memory could still be used to hold "hot" blocks instead of being exclusively reserved for "wasted" ones.
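
For completeness, this is how I understand a recycle pool would be set up if one wanted it anyway, and only for the standard block size (a sketch; the cache size and the table name are made up):

   ALTER SYSTEM SET db_recycle_cache_size = 64M;

   -- direct the segment's buffers into the recycle pool
   ALTER TABLE big_fts_table STORAGE (BUFFER_POOL RECYCLE);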

In the case of the keep cache, I admit you are completely right. I am aware of that, but I suspect that the advantage of using 16K blocks would more than compensate for the lack of a 16K keep cache.

>Remember always that Oracle invented multiple block size support in 9i
>for one reason, and one reason only: to make transportable tablespaces
>work properly between databases with different db_block_sizes. Certain
>well-known authors claiming 'bung everything into 32K blocks' have just
>muddied the waters... but don't be taken in.

On the other hand, it is generally a sound notion that larger blocks are better suited to DB operations that span large amounts of data and use analytical functions - independently of why multiple block size support was invented. Larger blocks are also more favorable for compression.
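
Just to make concrete what I mean by the 16K setup (a sketch; the cache size, file path and file size are invented, only the tablespace name DWH_16C comes from my actual plan):

   -- a buffer cache for the non-default block size must exist first
   ALTER SYSTEM SET db_16k_cache_size = 256M;

   CREATE TABLESPACE dwh_16c
     DATAFILE '/u02/oradata/dwh/dwh_16c_01.dbf' SIZE 4G
     BLOCKSIZE 16K;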

>> That is quite simple to do. A simple "alter table xxx move compress
>> tablespace DWH_16C" should do it.
>
>But not necessarily particularly well. Table compression works because
>rows with duplicate data co-reside in the same block. Therefore, if you
>have a random ordering of data in the table, simply switching on
>compression in the way you describe is unlikely to achieve anything very
>much. If you're going to do this, take the opportunity to physically
>order your table rows such that low-cardinality (and hence
>high-duplicating) columns are sorted together. And try not to use ASSM
>when you do this, otherwise it won't work (because ASSM scatters the
>rows regardless of your ordering request)!

That is good advice. Unfortunately, I am trying to use the time window between Christmas and New Year, and I can't investigate the tables thoroughly enough to tweak the operations manually. It is simply a matter of time. Rebuilding the indexes alone (they will be left unusable) will take at least one day. Some tables have more than a dozen columns; I would have to determine the cardinality of each, along with the distribution of the values and their length - the longer the values, the more one can compress.
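
If I had the time, I suppose something like this would give me the candidate ordering columns (a sketch; the owner and table name are invented, and it relies on reasonably fresh statistics):

   SELECT column_name, num_distinct, avg_col_len
     FROM dba_tab_columns
    WHERE owner = 'DWH' AND table_name = 'SOME_FACT_TABLE'
    ORDER BY num_distinct;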

>In other words, a better approach is:
>
>create table X as select * from Y;
>truncate table X;
>alter table X move compress;
>insert into X select * from Y order by col4,col2,col7;

Then of course one has to take care to regrant privileges, recreate the indexes, and so on... which makes the process error-prone when many tables are involved.
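
At best one could script it with DBMS_METADATA before dropping anything, along these lines (a sketch only; the table and schema names are invented, and I haven't verified the output covers every object type):

   SET LONG 100000
   SELECT dbms_metadata.get_dependent_ddl('OBJECT_GRANT', 'SOME_FACT_TABLE', 'DWH') FROM dual;
   SELECT dbms_metadata.get_dependent_ddl('INDEX', 'SOME_FACT_TABLE', 'DWH') FROM dual;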

>Just as a quick example, and to show you the importance of correct row
>ordering to achieve decent compression:

Your example with the different compression results is impressive, in the style of Tom Kyte. Yes, in general it makes sense to order before compressing. I very much miss something like "ALTER TABLE xxx MOVE *ORDER BY yyy*". What order is actually in effect when a table is moved?
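
Lacking that, the closest thing I can think of is a sorted CTAS into a compressed copy, followed by a rename (a sketch; the table and column names are invented):

   CREATE TABLE sales_sorted COMPRESS AS
     SELECT * FROM sales ORDER BY region, product;
   -- then drop/rename, regrant, recreate indexes...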

>> But with indexes, there is a
>> difference I don't quite understand. Doing "ALTER INDEX xxx REBUILD
>> COMPRESS y TABLESPACE DWH_16_IDX" would work, but I don't understand
>> the meaning of "y" in this case.
>
>If you have multi-column indexes, how many columns do you wish to be
>considered for duplicate suppression? For example, if you had an index
>on department, name and salary, would you want:

OK, thanks for your explanation. I suppose the number of columns is syntactically optional and then defaults to the maximum.
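
So with your department/name/salary example, compressing on the two leading columns would be spelled out explicitly like this (a sketch; the index name is invented, the tablespace name is the one from my earlier statement):

   ALTER INDEX dept_name_sal_idx REBUILD COMPRESS 2 TABLESPACE dwh_16_idx;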

By the way, one disadvantage of compressed tables you didn't mention: columns can't be added or dropped!

If you have tangible information about the block size discrepancy and its impact on performance, I would REALLY appreciate a pointer to it.

Regards
Rick Denoire

Received on Tue Dec 28 2004 - 19:18:53 CST
