Re: Oracle's Myth: keep tables and indexes in separate tablespaces

From: Richard Foote <richard.foote_at_bigpond.com>
Date: Sat, 11 Oct 2003 13:45:20 GMT
Message-ID: <QPThb.146517$bo1.128474@news-server.bigpond.net.au>

"Volker Hetzer" <volker.hetzer_at_ieee.org> wrote in message news:bm6l3t$fdg$1_at_news.fujitsu-siemens.com...
>
> "Richard Foote" <richard.foote_at_bigpond.com> schrieb im Newsbeitrag
news:Wczhb.145191$bo1.144265_at_news-server.bigpond.net.au...
> > There is no doubt that Oracle itself is the biggest myth spreader of
them
> > all. Just look at it's own OTN site where you can still read why
> > indexes/tablespaces should be split for performance, that Oracle indexes
> > become unbalanced over time and need frequent rebuilding
> They don't become unbalanced???
> Could you tell me more about it?

Hi Volker,

This has been done to death recently but (deep breath) here goes ...

Oracle indexes are always balanced, always. By that I mean all leaf nodes are on the same level within the B-Tree structure, that's all leaf nodes, exactly the same level or "distance" from the root node. Always.

Indexes can become "skewed" or have blocks less densely populated in some parts of the index than in other parts. This is what some consider to be "unbalanced" and to which I give some the benefit of the doubt. However, skewed indexes deserving of an index rebuild are not common despite myths to the contrary.

Lets examine the text you quote carefully.

> Because the Documentation on my 9i CD says
> "Serializing within Indexes
> Use of sequences, or timestamps, to generate key values that are indexed
> themselves can lead to database hotspot problems, which affect response
time and
> throughput. This is usually the result of a monotonically growing key that
results in
> a right-growing index.

The above describes indexes in which the "right most" leaf node is always accessed during insert operations because, as the B-Tree is a logically ordered structure, it's this leaf node that contains the current maximum value. Hence it's the leaf node into which the new maximum value in monotonically growing keys will always want to insert it's index entry. This leads to a "hotspot" in high concurrent insert applications as everyone wants this poor leaf node leading to an excess of Buffer Busy Waits and related performance issues.

>To avoid this problem, try to generate keys that insert over
> the full range of the index.

And yes, a solution is to instead create a reverse key index that will spread logical reads across the entire index structure, thereby reducing the BBW. There are of course issues with this, not least the impact on index range scans but yes, so far so good. Note though we have not addressed or touched on the subject of "balanced" indexes yet.

> This results in a well-balanced index that is more
> scalable and space efficient.

Now the "bull-shit factor" rears it's ugly head. Firstly as mentioned previously, it does not effect the balance on the index one bit, nor does it necessarily effect the skewness of the index, nor does it necessarily make it more space efficient.

Two key issues (or Oracle myths)

Firstly, there is a misconception that by always inserting into the "right-most" leaf node, you somehow either make the index unbalanced (for those who have absolutely no idea how indexes work) or skewed (for those who only have a vague understanding how indexes work). It all comes down to what happens when the last leaf node gets full.

Those that have no idea think that the node "spawns" two new leaf nodes into which the index entries go and that the current index leaf node now becomes a branch node to the new leaf nodes. The height of the index thereby increases at this section of the index structure and we have an unbalanced index on our hands. This goes on and on and so the unbalanced index problem worsens.

Crap.

Those with only a vague idea think that the leaf block splits, that 50% of the entries remain in the current block and the maximum 50% of entries go into the new block. The associated branch block is then updated to reflect this new leaf node. This new block then fills, the 50-50 split happens again and so on. We now have an index that is trailing behind itself half empty blocks that never get filled because of the monotonically growing key and so we have a skewed index on our hands.

Crap.

Of course what happens is that when Oracle attempts to insert an entry into a node that would logically be the *maximum* index entry for the node a 100-0 split (or 99-1 or 90-10 depending on your terminology) occurs, instead of the "usual" 50-50 split, whereby only the new maximum entry goes into the new node. The "previous" last leaf node remains full resulting in no index skewness whatsoever as a result of the inserts. The associated branch node is then updated to reflect the new leaf node. If it fills, it then splits (in the same manner as the leaf nodes) and it's parent node has the new corresponding node entry added. If the "root" node were to fill in this manner (the root node really only being a branch node that sits "on top" of the index structure), then it splits into two branch nodes and an additional new root node is also allocated to point to the two branch nodes. We have now increased the height of the index BUT it has resulted in *all* leaf nodes now being one level further away from the root.

Our index is still fully balanced and it has no data skewness at all. None. It's as space efficient as can be.

Issues could only arise if we perform deletes. And that's only a could. I'll ignore the issue of updates which are logically index deletes anyway because updating the monotonically increasing index entries makes no sense. In my experience, row deletion is not so common nowadays, we always seem to want to keep collecting our data, storage is relatively cheap, etc.

However, if we were to delete rows from this table, then the deleted entries within the *index structure* can not be reused. That's obviously because all new entires are always inserted into the leaf node containing the maximum value. Note that deleted index space within the *index structure* generally can be reused, that's another Oracle myth I'll leave for now. This could result in data skewness but even here, that's generally not a problem.

Why ?

Because generally when we delete from these sorts of tables (for historical reasons, etc.) we delete whole batches of data (eg. all data from 5 years ago, etc.). And when we delete whole chunks of data, that data resides in the same index leaf nodes, it's monotonically increasing remember. So we are left with areas of our index with *fully* emptied index nodes. And these nodes *are* reusable, they get placed on the freelist and are reused when a split occurs at the insertion end of the index. The index logically "walks" with old emptied nodes on the "left" of the index structure being passed across to the "right".

The index remains both balanced and nonskewed automatically without a rebuild in sight.

*IF* data is deleted "sparsely" such that many nodes are not fully emptied, then we have a situation where data skewness could be an issue with monotonically increasing values, because unlike the usual scenario of reusing deleted entries, we have no chance of reusing this deleted space.

However, this is not a common scenario with most indexes.

>You can achieve this by using a reverse key index or
> using a cycling sequence to prefix and sequence values."

This may help contention and buffer busy wait scenarios but this will definitely not help to balance your index and it will probably not help to prevent data skewness either.

>
> So, presumably there are unbalanced indexes too.

No. Never.

In some rare cases you can have data skewness but these are exceptions, not the norm. Oracle's self balancing, 50-50 and 90-10 node splitting, node reuse and deleted space reuse mechanisms ensure that index rebuilds are very rare requirements.

And even then, index coalescing rather than index rebuilds are generally sufficient.

But there are always exceptions ...

> I mean, I can accept that there are all
> sorts of docs on the otn but I always assumed that the delivered doc
always has the final
> word.

This might be old advice but it's still good advice.

"Never Assume" !!

Cheers

Richard Received on Sat Oct 11 2003 - 08:45:20 CDT