Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> RE: Index rebuilding

RE: Index rebuilding

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Sun, 14 Nov 2004 14:00:40 -0500
Message-ID: <KNEIIDHFLNJDHOOCFCDKEEFDFKAA.mwf@rsiz.com>


Well, actually, I believe there is clearly is no argument about the definition of a balanced B-tree. I think the guy who defined it in 1972, Rudolf Bayer, gets to keep his word.

"A balanced search tree in which every node has between m/2 and m
children, where m>1 is a fixed integer. m is the order. The root may have as few as 2 children. This is a good structure if much of the tree is in slow memory (disk), since the height, and hence the number of accesses, can be kept small, say one or two, by picking a large m."

See: http://www.nist.gov/dads/HTML/btree.html (National Institute of Standards and Technology) for the full information, or if you want the full glory of the ancient paper, try Rudolf Bayer and Edward M. McCreight, Organization and Maintenance of Large Ordered Indices, Acta Informatica, 1:173-189, 1972.

So balance, by official definition, includes both the number of levels from the root AND the distribution amongst the nodes. Clearly Oracle has implemented something a bit different. IF you do not define your terms, it is common usage to assume reference to discrepancy in height only. However, the m/2 rule has to do with the property of maintaining minimal height. Now Oracle, by implementing grow by root split, uniformly increases the number of levels to all levels. This indeed allows for divergence from the minimal possible height to a node or set of nodes, but it saves a heck of a lot of shuffling time.
(For the record, B*-trees are a variant of B-trees where only some component of the value required to navigate to the next layer is kept except for the final level "leaf nodes" where the full key and pointer(s) to data rows must be available.) Oracle also added in forward and backward leaf node pointers, which isn't strictly a feature of balanced B-trees, but I'm sure glad they did.

Anyway, by height Oracle B-tree indexes are always balanced by nature of implementation. By data distribution they have clearly decided it is better to defer all shuffling work except the most opportunistic cheap things such as noticing that a node is completely empty. Without the code, you'd have to engage in accidental discovery to find other cases where they might tune up the distribution on the fly. I always thought that one of the points of Richard's very fine paper was that Oracle made a very good choice in this regard. It is in tune with their general implementation philosophy that if you can defer work, defer it. That makes sense because there is a very good chance you'll never need to do the work at all. They diverge from that philosophy, where I've had the chance to talk to developers about the algorithms, only when doing the "cleanup" or deferable work is as cheap or nearly as cheap as not doing it.

So I'm completely in agreement that bastardization of carefully defined terms should be avoided.

Since there is a bias of usage toward only considering "balance" to mean number of levels, I believe I have personally always been careful to note when I refer to lack of balance as meaning bad distribution of keys in the leaf nodes or violation of the m/2 rule. In the late seventies there was actually a bit of a spat in the industry regarding whether you were allowed to call your indexing scheme a balanced B-tree. My recollection is that the rule was relaxed to m/4, and the maximum difference in levels traversed from root to leaf was settled on as something like 2 or 3 (where strictly it should be 1) but I can't find anything in print about that. Anyway, for a long time now we've called things B-trees that are a lot like B-trees but which don't follow the rules which even now still persist in the NIST definitions. Without the m/2 rule, by reductio absurdum, one row per leaf node qualifies as a balanced B-tree, and is in fact perfectly balanced. Even with the m/2 rule, if you choose 2 for m you get a binary tree. Clearly that is not what we want for optimal performance.

Maybe we should talk about Oracle index trees and unequal fullness of leaf nodes, but folks mostly refer to what Oracle has implemented as balanced B*trees.
Since they are implemented height balanced by artifact of spliting the root to grow in height, I guess folks that talk about an Oracle index being unbalanced are either talking about distribution of keys in the leaf nodes or they are ignorant, but I hope folks think about it a bit before they make an assumption which.

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org]On Behalf Of Cary Millsap Sent: Sunday, November 14, 2004 11:46 AM To: oracle-l_at_freelists.org
Subject: RE: Index rebuilding

Doesn't it come down to making sure you've defined your terms? A lot of = the
argument seems to be an implicit disagreement over what the word =
"balanced"

means. In Knuth and other computer science texts that discuss indexes, I believe the definition of "balanced" is "an index is balanced iff (if = andSt
only if) all leaf nodes have the same distance to the root." By this definition, Oracle B*-tree indexes are ALWAYS balanced, and NEVER un-balanced. This point is not in contention, correct?

I think what's happening is that people who are complaining about un-balanced-ness are redefining the word "balance" to mean something completely different.

In general, I think it's sloppy to take change the meaning of a = scientific
word in a discussion or "white paper." When I say "scientific word," I = mean
one that has been carefully defined and used in a specific context = for--in
this case--decades. It's one of the things that drives me nuts about the Oracle culture, this bastardization of carefully defined, = well-established
terms for the convenience of some Oracle author who writes more than he reads. :)

I guess the problem is analogous to the one being solved in the XML = world by
the implementation of XML namespaces. Maybe instead of the term =
"balanced",

we should use the term "knuth:balanced" or "choose-an-author:balanced". = In
this case, I would suggest that the default namespace should be set to
"knuth".

Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com
* Nullius in verba *

Upcoming events:

- Performance Diagnosis 101: 1/4 Calgary, 2/2 Sydney
- SQL Optimization 101: 11/8 Dallas, 12/13 Atlanta, 2/7 Sydney
- Hotsos Symposium 2005: March 6-10 Dallas
- Visit www.hotsos.com for schedule details...


-----Original Message-----
From: oracle-l-bounce_at_freelists.org =
[mailto:oracle-l-bounce_at_freelists.org]
On Behalf Of Alex
Sent: Friday, November 12, 2004 5:44 PM
To: DGoulet_at_vicr.com; oraclel_at_weikop.com; oracle-l_at_freelists.org Subject: RE: Index rebuilding

I agree with Dick! Always and never are to be used in cases like "the sun always rises in the east: or "I've never enjoyed working with Oracle more than I do now" :)

Regards!

> Looked at Richard Foote's paper.  Don't know about
> that.  I did prove to
> OTS several years ago that a block could get "lost"
> in an index due to
> deletion/updates that left it empty.  I believe that
> got finally fixed
> in Oracle 8i.  I've still seen cases of index's
> becoming unbalanced, I
> know the docs day it's impossible, but it does
> happen without the index
> height increasing. And I still believe that index
> deletes don't get
> flushed so efficiently, as Richard suggests.  If
> that was the case then
> I can't explain why an index rebuild can cause an
> index to shrink by 30%
> or more.  And recent experience still shows that a
> rebuild can cause
> significant performance improvement.  And Oracle has
> provided the
> capability to rebuild indexes which is not trivial.=20
> Therefore, NEVER
> use the word "never" unless your absolutely certain
> that under all
> circumstances it will be absolutely true.  And in
> the current context,
> that is the truth, that is, never can never be an
> absolute.
>=20
> BTW: Since we've a few "myth busters" in the group.=20
> I appreciate the
> effort these people put into "myth busting", even if
> they are later
> proven to have erred.  At a very minimum they start
> discussion and
> re-examination of commonly held beliefs that can
> have changed or lost
> significance over the years(like it's best to have
> all of a tables data
> in the first extent).  Such discussion, although
> sometimes the start of
> "Holy Wars", is healthy (not the Holy War though)
> and a necessary part
> of all of us growing.  That being said, let it be
> noted that I agree to
> disagree, in part, with Mr Foote.
>=20
>=20
> Dick Goulet
> Senior Oracle DBA
> Oracle Certified 8i DBA
> -----Original Message-----
> From: Jared Still [mailto:jkstill_at_gmail.com]=3D20
> Sent: Friday, November 12, 2004 12:44 PM
> To: oraclel_at_weikop.com
> Cc: oracle-l_at_freelists.org; steve_at_trolltec.co.uk
> Subject: Re: Index rebuilding
>=20
> On Fri, 12 Nov 2004 11:49:46 +0100, Karsten Weikop
> <oraclel_at_weikop.com>
> wrote:
> > Please read the execellent paper from Richard
> Foote (which can be
> > downloaded from Miracle's site):
> >
>

http://www.miracleas.dk/images/upload/Docs/Richard%20Foote.pdf
> > Conclusion form this paper: Never Rebuild, but
> find the course to the
> > problem.
>=20
> Never?
>=20
> I think you will find that statement as difficult to
> support as
> 'always rebuild'.
>=20
> --=3D20
> Jared Still
> Certifiable Oracle DBA and Part Time Perl Evangelist
> --
> http://www.freelists.org/webpage/oracle-l
> --
> http://www.freelists.org/webpage/oracle-l
>=20



	=09
__________________________________=20

Do you Yahoo!?=20
Check out the new Yahoo! Front Page.=20
www.yahoo.com=20
=20
--
http://www.freelists.org/webpage/oracle-l

--
http://www.freelists.org/webpage/oracle-l


--
http://www.freelists.org/webpage/oracle-l
Received on Sun Nov 14 2004 - 12:59:54 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US