Path: news.easynews.com!newsfeed1.easynews.com!easynews.com!easynews!newsfeed-east.nntpserver.com!nntpserver.com!news.sprintnetops.net!intgwlon.nntp.telstra.net!news.telstra.net!newsfeeds.bigpond.com!not-for-mail
From: "Richard Foote" <richard.foote@bigpond.com>
Newsgroups: comp.databases.oracle.server
References: <o%WZ9.61302$c41.1415158@news2.telusplanet.net> <v3ghe8qj8vstf4@corp.supernews.com> <3EXZ9.53855$GX4.2199677@news2.east.cox.net> <m%e_9.36861$jM5.93877@newsfeeds.bigpond.com> <3E39A86C.3143C919@exesolutions.com> <69f6c1c8.0301301921.143657e6@posting.google.com> <998d28f7.0301310721.3c724ec@posting.google.com> <jxy_9.37585$jM5.95701@newsfeeds.bigpond.com> <998d28f7.0301312045.7ca1d667@posting.google.com> <89Q_9.38296$jM5.97173@newsfeeds.bigpond.com> <998d28f7.0302011414.1f9b5c40@posting.google.com>
Subject: Re: index rebuilding...
Lines: 361
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2720.3000
X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Message-ID: <2z2%9.38692$jM5.97931@newsfeeds.bigpond.com>
Date: Sun, 2 Feb 2003 17:25:04 +1000
NNTP-Posting-Host: 203.54.191.155
X-Trace: newsfeeds.bigpond.com 1044165950 203.54.191.155 (Sun, 02 Feb 2003 17:05:50 EST)
NNTP-Posting-Date: Sun, 02 Feb 2003 17:05:50 EST
Organization: Telstra BigPond Internet Services (http://www.bigpond.com)
Xref: newsfeed1.easynews.com comp.databases.oracle.server:174487
X-Received-Date: Sat, 01 Feb 2003 23:15:30 MST (news.easynews.com)

Hi again Don

Comments embedded (warning, a bit long in places ...)

"Don Burleson" <don@burleson.cc> wrote in message
news:998d28f7.0302011414.1f9b5c40@posting.google.com...
> Hi Richard,
>
> At last, a polite response!
>
> Is it possible that the index algorithms have changed since I did that
> article in 1995?  Maybe I am obsolete!

I'm not sure as I was only a baby in nappies back them ;) However, I'm
pretty sure that at least version 7 of Oracle had a similar algorithm.

>
> My understanding of Oracle indexing was based on "borrowed" 7.3.3
> source code (donated by a disgruntled ex-Oracle employee), and the
> article that the arrogant, insulting, and self-aggrandizing Howard
> calls "rubbish" was published in back in 1995, when 7.3.4 was just
> released.
>
> At that time, (and I have the source code filed away somewhere) Oracle
> index nodes were organized by data blocks, where each data block was
> an index node.  That is why you cannot specify PCTUSED for an index,
> because Oracle must govern the freelist re-link during splitting and
> spawning operations.

The reason why you can't specify a PCTUSED is that indexes behave quite
differently from (conventional) tables in that if there's space in a node,
Oracle will simply use it, if there isn't then Oracle simply splits the
block. The concept of having a block *within the index structure* not being
available for insert and having to await reaching a PCTUSED mark is
therefore not applicable. In summary, if an index block is currently
allocated to the index structure, it must by definition be off the freelist.

>
> Please correct me if any of the following is incorrect:
>
> Each data block within the index contains "nodes" in the index tree,
> with the bottom nodes (leaf blocks), containing pairs of symbolic keys
> and ROWID values.  As an Oracle tree grows (via inserting rows into
> the table), Oracle fills the block, and when the block is full, it
> splits, creating new index nodes (data blocks) to manage the symbolic
> keys within the index.  Hence, an Oracle index block may contain
> pointers to other index nodes or ROWID/Symbolic-key pairs.

This is true expect that a particular "node" is either a "leaf node" which
means it contains an range of index entires along with their associated
rowids or is a "branch node" which means it contains a set of range pointers
that point to other branch nodes or leaf nodes within the index structure.
These "pointers" are represented by a "less than" value from which the
required path through the index structure can be derived. Note a particular
block within the index structure can only be a leaf or a branch node.

Like I mentioned earlier, the best way to see all this is via a few block
dumps. Following is an extract from a dump of a "branch" block. I don't want
to necessarily describe the whole structure but the odd comments with a ***
are mine:

Start dump data blocks tsn: 9 file#: 9 minblk 1420 maxblk 1420
buffer tsn: 9 rdba: 0x0240058c (9/1420)
scn: 0x0000.019233ab seq: 0x01 flg: 0x00 tail: 0x33ab0601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x0240058c
 Object id on Block? Y
 seg/obj: 0x79a7  csc: 0x00.192339f  itc: 1  flg: E  typ: 2 - INDEX  ***
index object
     brn: 0  bdba: 0x2400589 ver: 0x01
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn
0x0000.0192339f

Branch block dump     *** Block dump highlights this block as a branch node
=================
header address 49221708=0x2ef104c
kdxcolev 1
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 69
kdxcofbo 166=0xa6
kdxcofeo 6907=0x1afb
kdxcoavs 6741
kdxbrlmc 37750157=0x240058d
kdxbrsno 0
kdxbrbksz 8060
row#0[8043] dba: 37750158=0x240058e  *** first branch pointer
col 0; len 5; (5):  4d 44 53 59 53                 *** go this way for
values less than this value (MDSYS)
col 1; len 6; (6):  02 40 04 ba 00 38            *** location of "pointed
to" node (in this case a leaf node)
row#1[8028] dba: 37750159=0x240058f   *** second branch pointer, etc ...
col 0; len 3; (3):  4f 44 4d
col 1; len 6; (6):  02 40 04 bf 00 2a
row#2[8007] dba: 37750160=0x2400590
col 0; len 9; (9):  4f 45 4d 5f 5a 49 47 47 59
col 1; len 6; (6):  02 40 04 ac 00 42
row#3[7986] dba: 37750161=0x2400591
col 0; len 9; (9):  4f 45 4d 5f 5a 49 47 47 59
col 1; len 6; (6):  02 40 04 c0 00 2d


And this is an extract of the first leaf node that is being pointed to by
the above branch node...


Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.019233a1 seq: 0x01 flg: 0x04 tail: 0x33a10601
frmt: 0x02 chkval: 0x1090 type: 0x06=trans data
Block header dump:  0x0240058d
 Object id on Block? Y
 seg/obj: 0x79a7  csc: 0x00.192339f  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x2400589 ver: 0x01
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc
0x0000.00000000
0x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn
0x0000.0192339f

*** above are the two (minimum) transaction slots

Leaf block dump         *** OK, this is a leaf node ...
===============
header address 49221732=0x2ef1064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1680=0x690
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: -----, lock: 0          *** first index entry header
(currently unlocked)
col 0; len 5; (5):  42 4f 57 49 45         *** first index entry value
(BOWIE)
col 1; len 6; (6):  02 40 20 60 00 29   *** rowid of first index entry
row#1[8006] flag: -----, lock: 0          *** second index entry, etc ....
col 0; len 5; (5):  42 4f 57 49 45
col 1; len 6; (6):  02 40 20 60 00 2a
row#2[7991] flag: -----, lock: 0
col 0; len 5; (5):  42 4f 57 49 45
col 1; len 6; (6):  02 40 20 60 00 2b
row#3[7976] flag: -----, lock: 0
col 0; len 5; (5):  42 4f 57 49 45
col 1; len 6; (6):  02 40 20 60 00 2c

When you delete an index entry, it looks like this :

SQL> delete insert_bowie where owner = 'BOWIE' and rownum = 1;

1 row deleted.


Execution Plan
----------------------------------------------------------
   0      DELETE STATEMENT Optimizer=CHOOSE
   1    0   DELETE OF 'INSERT_BOWIE'
   2    1     COUNT (STOPKEY)
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'INSERT_BOWIE'
   4    3         INDEX (RANGE SCAN) OF 'INSERT_BOWIE_IDX' (NON-UNIQUE
          )

SQL> commit;

Commit complete.

Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.0192ad54 seq: 0x01 flg: 0x00 tail: 0xad540601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x0240058d
 Object id on Block? Y
 seg/obj: 0x79a7  csc: 0x00.192339f  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x2400589 ver: 0x01
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc
0x0000.00000000
0x02   0x000a.01f.0000703f  0x008003ce.022c.2a  ----    1  fsc
0x0011.00000000

*** Note here that the second slot has/had a transaction (which happens to
be the transaction that deleted an index entry)

Leaf block dump
===============
header address 99618916=0x5f01064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1680=0x690
kdxcoavs 818
kdxlespl 0
kdxlende 1
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: ---D-, lock: 2   *** Note here that the first entry is/has
been locked by the transaction in transaction slot number 2. Also note that
the entry has been marked as deleted "D".

col 0; len 5; (5):  42 4f 57 49 45         *** This is hence now a "deleted
row" entry
col 1; len 6; (6):  02 40 20 60 00 29
row#1[8006] flag: -----, lock: 0
col 0; len 5; (5):  42 4f 57 49 45
col 1; len 6; (6):  02 40 20 60 00 2a

Now to see how this deleted entry gets reused, one final dump (I promise )

SQL> insert into insert_bowie (owner) values ('AADVARK');

1 row created.

SQL> commit;

Commit complete.

Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.0192b7b8 seq: 0x01 flg: 0x02 tail: 0xb7b80601
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Block header dump:  0x0240058d
 Object id on Block? Y
 seg/obj: 0x79a7  csc: 0x00.192b7b3  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x2400589 ver: 0x01
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc
0x0000.00000000
0x02   0x0005.022.0000668b  0x00800a3f.01ff.29  --U-    1  fsc
0x0000.0192b7b8

*** Again note, we have/had a transaction going on in slot 2

Leaf block dump
===============
header address 99618916=0x5f01064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1663=0x67f
kdxcoavs 816
kdxlespl 0
kdxlende 0
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1663] flag: -----, lock: 2             *** this entry is/has been
locked by a transaction in slot 2
col 0; len 7; (7):  41 41 44 56 41 52 4b *** we now have an index entry for
the value (AADVARK)
col 1; len 6; (6):  02 40 20 61 00 24      *** a search through the whole
dump will find that all trace of the previously deleted index has been
erased.

row#1[8006] flag: -----, lock: 0
col 0; len 5; (5):  42 4f 57 49 45
col 1; len 6; (6):  02 40 20 60 00 2a

>
> The number of entries within each index data block is a function of
> two values:
>
> 1 - The length of the symbolic key
> 2 - The blocksize for the index tablespace
>

So the number of possible entries within an index block is a function of the
total length of the index column(s) values (although note that this can now
be compressed to be made more effiicient) and the size of the index block.

> Because the blocksize affects the number of keys within each index
> block, it follows that the blocksize will have an effect on the
> structure of the index tree.  All else being equal, large 32k
> blocksizes will have more keys, resulting in a flatter index than the
> same index created in a 2k tablespace.

By having a larger block size, you can indeed fit more values into less
blocks resulting in less leaf nodes. If you have less leaf nodes, you
therefore don't need as many "pointers" to these nodes and so require less
branch nodes. If you can reduce the total number of branch nodes required,
you thereby reduce the size of the index structure sitting above the leaf
nodes which *might* result in a lower index height. However, because you can
fit so many entries into a particular branch node, and because so many
branch nodes can be pointed to by the root or intermediate branch nodes, you
generally have far fewer branch nodes allocated than the maximum capacity of
the index structure and so an actual reduction in index height is actually
quite rare.

>
> According to an article by Christopher Foot
> (www.dbazine.com/foot3.html):
>
> "A bigger block size means more space for key storage in the branch
> nodes of B-tree indexes, which reduces index height and improves the
> performance of indexed queries."

I can't comment on my almost name sake (don't you just hate it when people
can't spell; fancy dropping off the last "e", it's so crucial to well
balanced name ;) but I would have written the comment above as "a bigger
block size means more space for key storage in the branch nodes of B-tree
indexes, which *might in some particular situations* reduce index height
*but nonetheless will improve the performance of indexed queries which
require a "range scan operation" as fewer leaf nodes now need to be
accessed, potentially reducing IO and associated overheads*".  But then
again, I'm not Christopher ...

>
> (Oh dear, Mr. Foot used the "node" word.  I'll bet Howard will be
> really pissed . . . .)
>
> In any case, there appears to be evidence that block size affects the
> tree structure, which supports the argument that data blocks (yes
> "nodes"), affect the structure of the tree.

No argument there. Indexes love larger blocks sizes because they can fit
more data into fewer nodes meaning a more efficient structure. Regardless of
whether or not such savings represents a lower index, it will result in
fewer leaf nodes making index range scans more efficient (which is a pretty
popular way to use an index).

I hope this helps to clear a few things up.

Cheers

Richard


