
Re: Index compression vs. table compression

From: Richard Foote <richard.foote_at_bigpond.nospam.com>
Date: Tue, 04 Jan 2005 10:52:33 GMT
Message-ID: <RzuCd.103215$K7.90520@news-server.bigpond.net.au>


Hi Rick

Some comments embedded.

"Rick Denoire" <100.17706_at_germanynet.de> wrote in message news:4p9et09oav479knok3n93oaq8i7monbm7d_at_4ax.com...
> "Richard Foote" <richard.foote_at_bigpond.nospam.com> wrote:
>
>>...discriminates against them. However blocks that are placed in the KEEP
>>pool
>>can remain cached with some assurance, *if* the size of the KEEP pool is
>>larger than the sum of the size of all KEEP objects. The danger with the
>>KEEP pool though is that you "keep" and allocate resources to an object
>>that
>>could be better utilised elsewhere.
>
> Your last sentence is worth a couple of comments.
>
> When I was assisting Oracle classes sometime ago and was learning
> about Oracle cache types, I was concerned by the way all other
> participants accepted the straight explanation and simple guidelines
> given for the KEEP cache unquestioned, because I couldn't. If Oracle
> implemented such an elaborate method to hold popular blocks as long as
> possible in the default cache, and if that method works, why should I
> try to circumvent the underlying logics and artificially force my own
> rules? Why should I decide by myself which tables/indexes should go
> into this special KEEP cache if the hot objects (or better hot blocks)
> would end up being kept in the default cache automatically anyway? The
> instructor (it was at Oracle's own training location) could not
> name the *special* reasons I needed.

The reasoning behind the KEEP cache is simply to better control the caching characteristics of objects deemed important enough. Like I said, by sizing the KEEP cache larger than the sum of all KEEP objects, you can kinda guarantee the objects stay cached no matter the load, whereas objects in the DEFAULT pool *could* be aged out (perhaps by a specific load that puts undesired or unexpected stress on the DEFAULT pool, although given the manner in which new blocks are introduced and how the touch-count algorithm works, the truly hot blocks remain comparatively safe). So the KEEP objects remain safe and cuddly and warm. As the DBA, you have the call and the control, although like you say, the possibility of "abuse" is certainly there for those with a limited understanding of how it all works.
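
Just as a rough illustration (not tested here, and assuming the 9i-style db_keep_cache_size parameter), you can sanity check that sizing by comparing the sum of the KEEP-assigned segments against the configured KEEP cache:

  -- Total size of all segments currently assigned to the KEEP pool
  SELECT SUM(bytes)/1024/1024 AS keep_mb
  FROM   dba_segments
  WHERE  buffer_pool = 'KEEP';

  -- Compare against the configured KEEP cache (SQL*Plus)
  SHOW PARAMETER db_keep_cache_size

Remember segments grow, so the comparison needs to be revisited from time to time.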

>
> Any decision I would take choosing a supposedly hot table could be
> wrong in principle, would not discriminate between more and less
> popular blocks, and would stay statically in effect until the next big
> performance check, disregarding changing conditions. The Oracle
> method, on the other hand, would always yield best results by virtue
> of its mechanics, dynamically tracking block use; it would use
> available memory more effectively.

All true. So it comes down to understanding why you specifically want to keep an object cached, and ensuring that the object is referenced sufficiently to make the allocation of resources worthwhile. The benefit: a guarantee that said objects remain cached no matter "what else" might be happening.
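
One rough way to check the objects are actually earning their keep (a sketch only, requiring 9i+ segment-level statistics and appropriate privileges):

  -- Activity against segments currently assigned to the KEEP pool
  SELECT s.owner, s.object_name, s.statistic_name, s.value
  FROM   v$segment_statistics s
  WHERE  s.statistic_name IN ('logical reads', 'physical reads')
  AND    (s.owner, s.object_name) IN
         (SELECT owner, segment_name
          FROM   dba_segments
          WHERE  buffer_pool = 'KEEP')
  ORDER  BY s.value DESC;

If the logical reads are negligible, that memory is probably better spent on the DEFAULT pool.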

>
> But then as I was forced to accept that using KEEP cache is a
> fundamental feature of performance tuning in the memory area, I
> "invented" my own reasons for the KEEP cache:
>
> 1. Oracle does not need to take care about the rank of blocks in the
> MRU, avoiding unnecessary CPU cycles and shortening the code length,
> when the blocks are already marked "privileged" by the DBA who puts
> them in the KEEP pool. If the blocks deserve this privilege anyway,
> then there is no sense for Oracle in trying to check them and track
> access frequency. That saves time.

The way each of the various buffer caches behaves is *fundamentally the same*. What differs (hopefully) is the characteristics of how each object is referenced, specifically with regard to the likelihood of it being re-acquired in the buffer cache. If something is likely to be reused, ideally we want it to remain in memory. If something is unlikely to be reused, ideally we want to get rid of it from memory ASAP so it doesn't simply sit there wasting resources. The DEFAULT pool in most scenarios does a perfectly good job of ensuring this. For those who may want to refine things a tad, the KEEP and RECYCLE pools are there.
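
For the record, assigning an object to a pool is just a storage attribute (the table and index names below are made up):

  ALTER TABLE hot_lookup          STORAGE (BUFFER_POOL KEEP);
  ALTER INDEX hot_lookup_pk       STORAGE (BUFFER_POOL KEEP);
  ALTER TABLE big_random_history  STORAGE (BUFFER_POOL RECYCLE);

Note that only newly read blocks go to the new pool; anything already cached stays where it is until it ages out (more on that below).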

>
> 2. Execution time can be guaranteed to be constant and small enough
> for queries involving objects/blocks in the KEEP cache, because in
> some situations that is just necessary by mere decision. The DEFAULT
> cache would let certain otherwise hot blocks die out if, say, a whole
> staff of a specific department goes on holiday; on their return to work,
> they will perhaps complain about vital processes being too slow, until
> the DEFAULT cache is repopulated according to the use. This unexpected
> "latency" could be unacceptable for the business.

Exactly, one possible scenario...

>
> 3. The internal memory organisation of the KEEP pool is implemented in
> a more effective way, based on the fact that its content is
> predictable, aligned and linear. There is no need to traverse B*trees
> or chained lists to find a block, since the KEEP pool essentially
> mirrors real storage without the overhead incurred by limitations in
> access to peripheral filesystems (*highly hypothetical*).

Not really; the LRU list in the KEEP pool works in fundamentally the same way as in the DEFAULT pool. Put too much stuff in KEEP and blocks will be forced to age out just like in the DEFAULT pool. One specific difference between the pools, though, is that the CACHE/NOCACHE attribute for objects is ignored; objects are effectively CACHE regardless of the attribute.
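
One crude way to spot a KEEP pool that's been overstuffed (a sketch only, and the counters are cumulative since startup) is to watch it doing physical reads:

  -- A KEEP pool that keeps doing physical reads no longer holds all its objects
  SELECT name, physical_reads, db_block_gets, consistent_gets
  FROM   v$buffer_pool_statistics;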

>
> 4. There are some other *special* advantages from using the KEEP
> instead of the DEFAULT cache that I am not aware of, at the expense of
> memory waste, but at times it is as simple as deciding that you can
> afford to waste memory in order to get certain advantages

Not really, except for those already mentioned.

>
> Perhaps these arguments are not completely unreasonable, but please
> pay attention that they are my own INVENTION, since I had to convince
> myself about the use of the KEEP cache somehow. Of course, if you put
> some not too large tables suffering from FTS in the KEEP cache you
> will be very pleased by the execution of certain queries needing them.
> The disadvantage caused by a lot of other queries not being able to
> use cached blocks is diffuse and not so easy to quantify.
>
>>And Rick, just to clarify another error of Howard's, the purpose of the
>>RECYCLE pool is not necessarily to deal with FTSs. Large FTS are already
>>dealt with rather efficiently in the DEFAULT pool by placing blocks on the
>>least (cold) recently used end of the LRU list to be quickly reused. As
>>such, they cause minimal disruption.
>
> At this very moment I would like to go back and reread the whole stuff
> about the use of the RECYCLE by FTS operations... I understood that
> blocks read by FTS go to the RECYCLE cache automatically (if it is
> setup), but now I am not sure anymore. Should I set the RECYCLE
> attribute for tables explicitly, based on my assumption that they are
> going to be read via FTS?

No, FTSs do not go to the RECYCLE pool; they simply go to the pool the object is assigned to (except perhaps parallel FTSs, which aren't read into the buffer cache at all).

RECYCLE is usually "smaller" because we specifically want objects to age out and not sit around wasting resources. Remember, it's a mechanism to allow a more rapid aging of blocks unlikely to be reused, usually as a result of random index access on large objects, because such blocks would otherwise typically have half the LRU list to navigate before aging out. However, it shouldn't be so small that it can't cope with the load on the cache (causing the various "buffer" waits), nor so small that a transaction which references the same block several times doesn't have time to do so before the block is aged out (and of course not so large that you waste memory and negate what you're trying to achieve with the RECYCLE pool in the first place).
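
By way of illustration only (9i-style dynamic parameters, the size and table name plucked out of the air, and subject to having SGA headroom available):

  -- A deliberately modest RECYCLE pool alongside the DEFAULT cache
  ALTER SYSTEM SET db_recycle_cache_size = 32M;

  -- A large, randomly accessed table whose blocks are unlikely to be re-read
  ALTER TABLE big_random_history STORAGE (BUFFER_POOL RECYCLE);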

>
> Assuming that Oracle uses the RECYCLE cache for FTS by default, could
> a block occupy a buffer in the RECYCLE pool (due to FTS) and some
> other buffer additionally in a different pool at the same time?
> (Surely not, just want to raise the question).

No, blocks do not go to the RECYCLE cache for FTS by default. Only when the buffer pool is changed for an object could an object have blocks in multiple pools, as existing cached blocks remain in the current cache and only newly read blocks are loaded into the new cache. Howard and others might think the RECYCLE pool is designed for FTS, but this generally isn't the case because the DEFAULT pool deals with them so efficiently (or they're processed in the PGA if read in parallel). "Small" FTSs get cached regardless and are inserted into the top end of the "cool" half of the LRU list; "large" FTSs get sent to the LRU end of the LRU list and (generally) get overwritten once unpinned, so despite what Howard suggests, they don't "waste" resources.
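
If you're ever curious what an object currently has sitting in the buffer cache (regardless of pool), the old V$BH query gives a point-in-time picture (run with DBA privileges; the object name is hypothetical):

  SELECT o.object_name, COUNT(*) AS cached_blocks
  FROM   v$bh b, dba_objects o
  WHERE  b.objd = o.data_object_id
  AND    b.status != 'free'
  AND    o.object_name = 'BIG_RANDOM_HISTORY'
  GROUP  BY o.object_name;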

>
> And if I set a table to use the RECYCLE cache, will its blocks get
> read into this pool even if they are accessed via indexes?

Yes. Like I said, a RECYCLE pool object normally *should* be read via an index. Try and forget the notion that the RECYCLE pool is somehow designed specifically for FTS. It's not true.

>
>>Also remember a table that's read via
>>FTSs could also be read via an index by another process, accessing blocks
>>that could benefit from caching. However, if you have very large,
>>*randomly*
>>accessed tables (via index access), where the likelihood of two separate
>>processes requiring the same block is extremely minimal, then here we have
>>a
>>scenario where a block has been inserted in the middle of the LRU list and
>>will "waste" a memory buffer or two while it slowly migrates and floats
>>itself to the end of the LRU list to be finally reused. Multiply such
>>blocks
>>out by an (un)healthy sum, and we may prematurely push out other more
>>useful
>>blocks before their touch counts can increment sufficiently.
>
> Exactly! I have been considering how to reorganize memory of several
> DBs these days, and came to the conclusion that I can do my best
> based on estimations only. It seems very difficult to assess how large
> the relative portion of data is that is considered to be hot, and how
> hot it is compared to the rest. A high difference in "temperature"
> supports the use of a KEEP cache. A uniform distribution of block
> visits over a significant portion of data would make it very difficult
> to dimension the different pools properly.

The KEEP/RECYCLE pools are generally overrated and hyped up a bit too much IMHO. In most cases, they're simply not required. In specific cases, they can be useful for fine-tuning how memory is assigned to objects and for controlling their caching characteristics. When used properly, they can marginally improve the performance and manageability of the buffer caches. When abused, they can significantly hinder that performance.

>
> After some analysis I came to the conclusion that SYSTEM tables belong
> to the best candidates of objects to be cached. Is it a good practice
> to change pool attributes of SYSTEM objects?

If by SYSTEM you actually mean SYS, I wouldn't say so. I remember there used to be a restriction that SYS objects had to live in the DEFAULT pool; I'm not sure whether that has changed in current versions.

>
>>
>>For the RECYCLE pool therefore, the best candidates are large, randomly
>>accessed tables where blocks are unlikely to be reused by other processes,
>>*not* simply FTS tables as is often stated by Howard and others.
>
> Thanks for confirming my own humble conclusions which I would not dare
> to say publicly in view of the overwhelming contrary opinions.
>
> I have realized how complex this matter can get and began to
> investigate the pertinent literature. To my surprise, I could not find
> enough reference cases that would allow me to develop a notion about a
> sound distribution of available memory between the different cache
> types. I was somewhat naive and assumed that the RECYCLE pool should
> be far smaller than the DEFAULT pool, and that the size of the KEEP
> pool would mainly depend on how much spare memory is available without
> sacrificing the DEFAULT pool, which is not much in general. With 6 GB
> RAM and a 250 GB DB (sum of all file sizes) I am not suffering from
> lack of memory, but of course it is always possible to break things if
> you get them wrong enough.

It's not necessarily how much you've got but what you do with it :) I think your cautious approach and keenness to understand *what* and *why* you need to do something are encouraging. Like I said, in many cases these other pools are unnecessary and users will experience no real difference in response times even after days spent putting this object into that pool. If you have a specific issue and you think either the KEEP or RECYCLE pool will help address it, then great, that's the flexibility one has.

Cheers

Richard
