Re: Index compression vs. table compression

From: Rick Denoire <100.17706_at_germanynet.de>
Date: Sun, 02 Jan 2005 02:28:26 +0100
Message-ID: <4p9et09oav479knok3n93oaq8i7monbm7d@4ax.com>


"Richard Foote" <richard.foote_at_bigpond.nospam.com> wrote:

>...discriminates against them. However blocks that are placed in the KEEP pool
>can remain cached with some assurance, *if* the size of the KEEP pool is
>larger than the sum of the size of all KEEP objects. The danger with the
>KEEP pool though is that you "keep" and allocate resources to an object that
>could be better utilised elsewhere.

Your last sentence is worth a couple of comments.

When I was attending Oracle classes some time ago and learning about Oracle's cache types, I was concerned by the way all the other participants accepted the straightforward explanation and simple guidelines given for the KEEP cache without question, because I couldn't. If Oracle implemented such an elaborate method to hold popular blocks in the default cache for as long as possible, and if that method works, why should I try to circumvent the underlying logic and artificially force my own rules? Why should I decide for myself which tables/indexes should go into this special KEEP cache if the hot objects (or better, hot blocks) would end up being kept in the default cache automatically anyway? The instructor (it was at Oracle's own training location) could not name the *special* reasons I was asking for.

Any decision I made in choosing a supposedly hot table could be wrong in principle, would not discriminate between more and less popular blocks, and would remain statically in effect until the next big performance review, disregarding changing conditions. The Oracle method, on the other hand, would always yield the best results by virtue of its mechanics, dynamically tracking block use; it would use available memory more effectively.

But since I was forced to accept that using the KEEP cache is a fundamental part of performance tuning in the memory area, I "invented" my own reasons for the KEEP cache:

  1. Oracle does not need to track the rank of these blocks on the LRU list, avoiding unnecessary CPU cycles and shortening the code path, because the blocks are already marked "privileged" by the DBA who put them in the KEEP pool. If the blocks deserve this privilege anyway, there is no point in Oracle checking them and tracking access frequency. That saves time.
  2. Execution time can be guaranteed to be constant and small enough for queries involving objects/blocks in the KEEP cache, because in some situations that is simply required by policy. The DEFAULT cache would let certain otherwise hot blocks age out if, say, the whole staff of a specific department went on holiday; on their return to work, they would perhaps complain about vital processes being too slow until the DEFAULT cache was repopulated according to use. This unexpected "latency" could be unacceptable for the business.
  3. The internal memory organisation of the KEEP pool is implemented more efficiently, based on the fact that its content is predictable, aligned and linear. There is no need to traverse B*-trees or linked lists to find a block, since the KEEP pool essentially mirrors real storage without the overhead incurred by limitations in access to peripheral filesystems (*highly hypothetical*).
  4. There are some other *special* advantages to using the KEEP instead of the DEFAULT cache that I am not aware of, at the expense of wasted memory; but at times it is as simple as deciding that you can afford to waste memory in order to gain certain advantages.

Perhaps these arguments are not completely unreasonable, but please note that they are my own INVENTION, since I had to convince myself of the use of the KEEP cache somehow. Of course, if you put some not-too-large tables suffering from FTS in the KEEP cache, you will be very pleased by the execution of certain queries that need them. The disadvantage caused by a lot of other queries not being able to use cached blocks is diffuse and not so easy to quantify.
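To make this concrete, here is roughly what pinning an object into the KEEP pool looks like in practice (the object names and pool size are made up for illustration). The final query is the sanity check implied by Richard's point above: the pool should be at least as large as the sum of all KEEP segments.

  -- Size the KEEP pool (dynamic SGA parameter since 9i); 200M is a made-up figure.
  ALTER SYSTEM SET db_keep_cache_size = 200M;

  -- Assign a hypothetical lookup table and its index to the KEEP pool.
  ALTER TABLE lookup_codes    STORAGE (BUFFER_POOL KEEP);
  ALTER INDEX lookup_codes_pk STORAGE (BUFFER_POOL KEEP);

  -- Sanity check: the total size of all KEEP segments should fit inside the pool.
  SELECT SUM(bytes)/1024/1024 AS keep_segments_mb
  FROM   dba_segments
  WHERE  buffer_pool = 'KEEP';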

>And Rick, just to clarify another error of Howard's, the purpose of the
>RECYCLE pool is not necessarily to deal with FTSs. Large FTS are already
>dealt with rather efficiently in the DEFAULT pool by placing blocks on the
>least (cold) recently used end of the LRU list to be quickly reused. As
>such, they cause minimal disruption.

At this very moment I would like to go back and reread everything about the use of the RECYCLE pool by FTS operations... I understood that blocks read by FTS go to the RECYCLE cache automatically (if it is set up), but now I am not sure anymore. Should I set the RECYCLE attribute for tables explicitly, based on my assumption that they are going to be read via FTS?

Assuming that Oracle uses the RECYCLE cache for FTS by default, could a block occupy a buffer in the RECYCLE pool (due to FTS) and, at the same time, another buffer in a different pool? (Surely not; I just want to raise the question.)

And if I set a table to use the RECYCLE cache, will its blocks get read into this pool even if they are accessed via indexes?
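For reference, setting the attribute explicitly is a one-liner per segment (the table name is hypothetical); as far as I know, BUFFER_POOL is a per-segment storage attribute, so it should apply however the blocks are reached:

  -- Explicitly direct a large, randomly accessed table to the RECYCLE pool.
  ALTER TABLE audit_log STORAGE (BUFFER_POOL RECYCLE);

  -- Afterwards, check which pool each table is assigned to.
  SELECT table_name, buffer_pool
  FROM   user_tables
  ORDER  BY buffer_pool, table_name;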

>Also remember a table that's read via
>FTSs could also be read via an index by another process, accessing blocks
>that could benefit from caching. However, if you have very large, *randomly*
>accessed tables (via index access), where the likelihood of two separate
>processes requiring the same block is extremely minimal, then here we have a
>scenario where a block has been inserted in the middle of the LRU list and
>will "waste" a memory buffer or two while it slowly migrates and floats
>itself to the end of the LRU list to be finally reused. Multiply such blocks
>out by an (un)healthy sum, and we may prematurely push out other more useful
>blocks before their touch counts can increment sufficiently.

Exactly! I have been considering how to reorganize the memory of several DBs these days, and came to the conclusion that I can do my best based on estimates only. It seems very difficult to assess how large the portion of data is that can be considered hot, and how hot it is compared to the rest. A large difference in "temperature" supports the use of a KEEP cache. A uniform distribution of block visits over a significant portion of the data would make it very difficult to size the different pools properly.

After some analysis I came to the conclusion that the SYSTEM tables are among the best candidates for caching. Is it good practice to change the pool attributes of SYSTEM objects?
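One crude way to put numbers behind the "temperature" question is to count how many buffers each segment currently occupies in the cache; a sketch, assuming DBA privileges (v$bh has one row per buffer, and this is only a snapshot, so it would need to be sampled over time):

  -- Count cached buffers per segment as a rough measure of current heat.
  SELECT o.owner, o.object_name, COUNT(*) AS cached_buffers
  FROM   v$bh b
  JOIN   dba_objects o ON o.data_object_id = b.objd
  WHERE  b.status <> 'free'
  GROUP  BY o.owner, o.object_name
  ORDER  BY cached_buffers DESC;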

>
>For the RECYCLE pool therefore, the best candidates are large, randomly
>access tables where blocks are unlikely to be reused by other processes,
>*not* simply FTS tables as is often stated by Howard and others.

Thanks for confirming my own humble conclusions, which I would not dare to state publicly in view of the overwhelming contrary opinions.

I have realized how complex this matter can get and began to investigate the pertinent literature. To my surprise, I could not find enough reference cases that would allow me to develop a notion of a sound distribution of available memory between the different cache types. I was somewhat naive and assumed that the RECYCLE pool should be far smaller than the DEFAULT pool, and that the size of the KEEP pool would mainly depend on how much spare memory is available without sacrificing the DEFAULT pool, which is generally not much. With 6 GB RAM and a 250 GB DB (sum of all file sizes) I am not suffering from a lack of memory, but of course it is always possible to break things if you configure them badly enough.
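Just to make that naive starting point concrete, a hypothetical carve-up of the cache might look like this (all figures invented for illustration, not recommendations):

  -- Hypothetical split of ~2 GB of buffer cache across the three pools.
  ALTER SYSTEM SET db_cache_size         = 1600M;  -- DEFAULT pool takes the bulk
  ALTER SYSTEM SET db_keep_cache_size    = 300M;   -- at least the sum of KEEP segments
  ALTER SYSTEM SET db_recycle_cache_size = 100M;   -- deliberately small; buffers reused fast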

Regards
Rick Denoire
