
Re: Index compression vs. table compression

From: Howard J. Rogers <hjr_at_dizwell.com>
Date: Wed, 29 Dec 2004 13:17:13 +1100
Message-ID: <41d213aa$0$5112$afc38c87@news.optusnet.com.au>


Rick Denoire wrote:
> "Howard J. Rogers" <hjr_at_dizwell.com> wrote:
>
>

>>Unless you twiddled things when you were creating your file systems, 
>>then that's wrong: it should be 4K (assuming you're using bog-standard 
>>ext2, for example). You'll have I/O problems as a result of having an 
>>Oracle block size that's bigger than your file system buffer size. 

>
>
> This is a question I still have to investigate. Do you have numbers?
> Benchmarks? Records of I/O per sec? One has to MEASURE in order to
> know - too many myths out there, and this is a complicated matter.

Of course. That's why Steve Adams has articles on it, and why I do.

> I
> would expect that several read requests of contiguous blocks be
> gathered into one request by the OS before being actually issued.

Let's not expect things, eh?!

> Isn't this a classical feature of Unix-like filesystem buffers? (I
> won't go into read-ahead details done by the storage).

Go read what Steve Adams has to say on the matter. http://www.ixora.com.au.

>
> This is similar to using the DB_FILE_MULTIBLOCK_READ_COUNT parameter
> to coalesce adjacent I/Os, I think (but for FTS).
> The file systems can be mounted with the sync option, avoiding file
> system buffering.

Good. If you can do direct I/O, then that's fine.

> Would that then alleviate the mismatch between Oracle
> blocksize and filesystem blocksize?

This is not about "filesystem blocksize" but about "file system buffer size". There's a difference. And it should hopefully then be self-evident that if you bypass the file system buffer, then of course there is no mismatch between the Oracle block size and the file system buffer size - because you don't actually have one of the latter if you're doing direct I/O!
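(For what it's worth, on 9i and up the usual way of asking for direct and/or asynchronous I/O on a file system is the FILESYSTEMIO_OPTIONS parameter. Just a sketch - it's a static parameter, so it needs an spfile change and a bounce, and whether the OS and file system actually honour it is another matter:

-- request both direct and asynchronous I/O where the platform supports it
alter system set filesystemio_options = 'SETALL' scope=spfile;
-- or direct I/O only:
-- alter system set filesystemio_options = 'DIRECTIO' scope=spfile;
-- after the restart, check what's in effect (SQL*Plus):
show parameter filesystemio_options
)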

> The kernel (2.4.e-37) of the Advanced Server Version of RedHat is
> tweaked in several ways, Oracle was relinked with the Sync-I/O
> capability at install time. This version is quite *expensive*, by the
> way.
>
>

>>Adopting 16K blocks is going to make matters worse. And bear in mind 
>>that there is no recycle or keep caches for your 16K buffer cache... 
>>meaning physical I/O is likely to be higher than it would have been if 
>>you'd just left things alone.

>
>
> I don't understand the sense of a recycle cache.

Ouch. You should do, let's put it that way.

> Since buffers for
> blocks read via FTS are put at the end of the MRU list anyway, the
> main cache already "recycles" block buffers with low "life
> expectancy", and reserving a recycle cache at the expense of the main
> cache for just this kind of operation could be a waste of memory in
> situations where the memory could still be used to hold "hot" blocks
> instead of being otherwise exclusively reserved for "wasted" blocks.

If it's a FTS, you won't want the blocks cached AT ALL, probably. And if that's the case, holding them in cache, even for the time it takes them to age out, is *definitely* a waste of memory.

It's also the case that blocks can slip into the cold half of the LRU list which you'd rather not have flushed out. By directing FTS to their own portion of the cache, you can give such blocks a bit more time to get 're-warmed' up to the MRU end of the LRU list.

A recycle cache can be, in short, extremely useful.
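Setting one up is trivial, incidentally. Just as a sketch - the sizes and table names here are made up, obviously, and growing a cache on the fly needs spare room under sga_max_size:

alter system set db_recycle_cache_size = 32m;
alter system set db_keep_cache_size = 16m;
-- send a table whose blocks you don't want polluting the default cache to the recycle pool
alter table big_audit_trail storage (buffer_pool recycle);
-- and a small, hot lookup table to the keep pool
alter table currency_codes storage (buffer_pool keep);

And remember: those pools only exist for the standard block size, which is the whole point I was making about odd-sized caches.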

> In the case of the keep cache, I admit that you are completely right.
> I am aware of that, but I suspect that the advantage of using 16K
> blocks would more than compensate for the lack of the 16K keep cache.

As you say: test that. With direct I/O, it won't be a problem as such. But the lack of the keep/recycle caches is still a potential nasty.

>>Remember always that Oracle invented multiple block size support in 9i 
>>for one reason, and one reason only: to make transportable tablespaces 
>>work properly between databases with different db_block_sizes. Certain 
>>well-known authors claiming 'bung everything into 32K blocks' has just 
>>muddied the waters... but don't be taken in.

>
>
> On the other hand, it is generally a sound conception that for DB
> operations spanning large amounts of data using analytical functions,
> larger blocks are better suited - independently of the reason why
> multiple block size support was invented. Larger blocks are also more
> favorable for compression.

But you are now discussing whether to create a database using large or small blocks, for which your comments are entirely appropriate. That's a rather different issue, though, from whether or not it makes sense to have a mix of block sizes in the ONE database. As I've written many times in the past, *IF* you have direct I/O, *AND* you don't have potential contention issues (i.e., OLTP-type environments), then it is at least worth experimenting. But the lack of keep and recycle caches can still mean it is not worth doing.
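If you do experiment, remember the order of operations: the matching cache has to exist before a tablespace of that block size can be used. Something along these lines - file name and sizes are hypothetical, and this assumes your standard block size isn't already 16K:

alter system set db_16k_cache_size = 64m;
create tablespace dwh_16k
  datafile '/u05/oradata/prod/dwh_16k01.dbf' size 512m
  blocksize 16k;

And note once more: that 64M cache is a single default pool. No keep, no recycle.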

>>>That is quite simple to do. A simple "alter table xxx move compress
>>>tablespace DWH_16C" should do it. 
>>
>>But not necessarily particularly well. Table compression works because 
>>rows with duplicate data co-reside in the same block. Therefore, if you 
>>have a random ordering of data in the table, simply switching on 
>>compression in the way you describe is unlikely to achieve anything very 
>>much. If you're going to do this, take the opportunity to physically 
>>order your table rows such that low-cardinality (and hence 
>>high-duplicating) columns are sorted together. And try not to use ASSM 
>>when you do this, otherwise it won't work (because ASSM scatters the 
>>rows regardless of your ordering request)!

>
>
> That is good advice. Unfortunately, I am trying to use the time window
> between Christmas and New Year and I can't investigate the tables
> thoroughly enough to manually tweak the operations.

Then my advice is not to do it at all. If you can't evaluate something like this properly, beforehand, you're simply asking for trouble.
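At the very least, check whether the target tablespace is ASSM before you even think about it - a one-liner against the data dictionary:

select tablespace_name, segment_space_management
from   dba_tablespaces
where  tablespace_name = 'DWH_16C';

If that comes back AUTO, your careful ORDER BY is largely going to be wasted anyway.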

> It is a simple
> matter of time. Rebuilding indexes (they will be left invalid) alone
> will take at least one day. Some tables have more than a dozen
> columns; I would have to determine the cardinality along with the
> distribution of values, and also their length - the longer the
> values, the more one can compress.

Precisely.

>>In other words, a better approach is:
>>
>>create table X as select * from Y
>>truncate table X;
>>alter table X move compress;
>>insert into X select * from Y order by col4,col2,col7;

>
>
> Then of course one should take care to regrant privileges, redefine
> indexes... making the process for many tables prone to error.

My apologies. What I wrote was just full of silly mistakes. Let me try it again ("Y" is the EXISTING table you wish to compress):

create table X as select * from Y;
truncate table Y;
alter table Y move compress;
insert into Y select * from X order by col4,col2,col7;

Make sense now? And no, because of the truncate rather than a drop, there would be no regranting of privileges needed, nor any index rebuilds necessary.
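And once it's done, checking whether the exercise was worth it is a one-minute job - something like this, assuming you still have the copy X lying around for comparison:

select segment_name, blocks, round(bytes/1024/1024) as mb
from   user_segments
where  segment_name in ('X', 'Y');

select table_name, compression
from   user_tables
where  table_name = 'Y';

Then drop X, of course.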

>>Just as a quick example, and to show you the importance of correct row 
>>ordering to achieve decent compression:

>
>
> Your example about different compression results is impressive, in the
> style of Tom Kyte. Yes, it makes sense in general to
> order-before-compress. I very much miss something like "ALTER TABLE
> xxx MOVE *ORDER BY yyy*". What is actually the order in effect when
> moving a table?

None. Well, OK, it's the same order any FTS would yield. Which means you cannot know, since without an explicit ORDER BY clause, a FTS cannot be guaranteed to return results in any specific order.

>>>But with indexes, there is a
>>>difference I don't quite understand. Doing "ALTER INDEX xxx REBUILD
>>>COMPRESS y TABLESPACE DWH_16_IDX" would work, but I don't understand
>>>the meaning of "y" in this case.
>>
>>If you have multi-column indexes, how many columns do you wish to be 
>>considered for duplicate suppression? For example, if you had an index 
>>on department, name and salary, would you want:

>
>
> OK, thanks for your explanation. I suppose that the number of columns
> is syntactically optional, letting it be the maximum then.

Provided you don't have unique indexes (which you generally shouldn't, anyway), yes. For a unique index, the maximum is n-1, where n is the number of columns in the index. For obvious reasons, I hope!
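To make that concrete with the department/name/salary example (index and table names invented, naturally):

-- suppress duplicates of the leading two columns only
alter index emp_dept_name_sal_idx rebuild compress 2 tablespace dwh_16_idx;
-- check what you ended up with
select index_name, compression, prefix_length
from   user_indexes
where  index_name = 'EMP_DEPT_NAME_SAL_IDX';

Leave the number off and Oracle picks the maximum for you, as you surmised.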

> By the way, one disadvantage of compressed tables you forgot: columns
> can't be added or dropped!

I didn't realise I was cataloguing the pros and cons of compressed tables! I certainly wasn't intending to do so. But if I had been, this wouldn't have been one of the cons. It strikes me as generally extremely bad practice to add or drop columns from a table in production. Losing the ability to do so would not cause me to lose sleep, therefore.

> If you have tangible information about the blocksize discrepancy and
> its impact on performance, I would REALLY appreciate a pointer to it.

I do have tangible pointers to Steve Adams' website where it's discussed. It's silly putting out benchmarks on the matter, and neither Steve nor I do. For me, at least, that's because the benchmarks would be true for my machine, my configuration, my hard disks, and would be out of date the minute they were produced. I can tell you I did some tests about 18 months ago, and a full table scan of a million-row table took approximately 25% longer with the "wrong" block size than with the "right" one (RHES3, ext3, 4K file system buffer size), but make of that what you will. The point is that the theory is sound, it's well argued by Steve amongst others, and mucking about with block sizes is therefore something you would have to benchmark and test on your *own* hardware before indulging in it. But, in any case, the little nugget of information that you have direct I/O means that specific benchmarking argument is moot. You are left merely wondering whether the lack of keep and recycle caches isn't going to be more of an issue... and that would require a separate set of benchmarking tests.

The real point here is that under NO circumstances does one quickly 'knock up' an odd-sized tablespace, because the costs could well be bigger than the benefits, and you can only know whether that's true or not after careful, patient testing and evaluation. Unless you like living dangerously, or get lucky with the State Lottery on a regular basis, of course - in which case, be my guest.

Of course, if you're not using keep or recycle caches at the moment anyway, then the lack of them for an odd-sized tablespace isn't going to be of great concern. But then I'd suggest that not using those caches with your standard block size should itself be a source of great concern!
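Checking takes seconds, by the way:

select name, block_size, current_size
from   v$buffer_pool;

Anything other than a lone DEFAULT pool at your standard block size will show up there.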

Regards
HJR
>
> Regards
> Rick Denoire
>
Received on Tue Dec 28 2004 - 20:17:13 CST