Re: index rebuilding...

From: Richard Foote <richard.foote_at_bigpond.com>
Date: Sun, 2 Feb 2003 17:25:04 +1000
Message-ID: <2z2%9.38692$jM5.97931@newsfeeds.bigpond.com>

Hi again Don

Comments embedded (warning, a bit long in places ...)

"Don Burleson" <don_at_burleson.cc> wrote in message news:998d28f7.0302011414.1f9b5c40_at_posting.google.com...
> Hi Richard,
>
> At last, a polite response!
>
> Is it possible that the index algorithms have changed since I did that
> article in 1995? Maybe I am obsolete!

I'm not sure as I was only a baby in nappies back them ;) However, I'm pretty sure that at least version 7 of Oracle had a similar algorithm.

>
> My understanding of Oracle indexing was based on "borrowed" 7.3.3
> source code (donated by a disgruntled ex-Oracle employee), and the
> article that the arrogant, insulting, and self-aggrandizing Howard
> calls "rubbish" was published in back in 1995, when 7.3.4 was just
> released.
>
> At that time, (and I have the source code filed away somewhere) Oracle
> index nodes were organized by data blocks, where each data block was
> an index node. That is why you cannot specify PCTUSED for an index,
> because Oracle must govern the freelist re-link during splitting and
> spawning operations.

The reason why you can't specify a PCTUSED is that indexes behave quite differently from (conventional) tables in that if there's space in a node, Oracle will simply use it, if there isn't then Oracle simply splits the block. The concept of having a block *within the index structure* not being available for insert and having to await reaching a PCTUSED mark is therefore not applicable. In summary, if an index block is currently allocated to the index structure, it must by definition be off the freelist.

>
> Please correct me if any of the following is incorrect:
>
> Each data block within the index contains "nodes" in the index tree,
> with the bottom nodes (leaf blocks), containing pairs of symbolic keys
> and ROWID values. As an Oracle tree grows (via inserting rows into
> the table), Oracle fills the block, and when the block is full, it
> splits, creating new index nodes (data blocks) to manage the symbolic
> keys within the index. Hence, an Oracle index block may contain
> pointers to other index nodes or ROWID/Symbolic-key pairs.

This is true expect that a particular "node" is either a "leaf node" which means it contains an range of index entires along with their associated rowids or is a "branch node" which means it contains a set of range pointers that point to other branch nodes or leaf nodes within the index structure. These "pointers" are represented by a "less than" value from which the required path through the index structure can be derived. Note a particular block within the index structure can only be a leaf or a branch node.

Like I mentioned earlier, the best way to see all this is via a few block dumps. Following is an extract from a dump of a "branch" block. I don't want to necessarily describe the whole structure but the odd comments with a *** are mine:

Start dump data blocks tsn: 9 file#: 9 minblk 1420 maxblk 1420 buffer tsn: 9 rdba: 0x0240058c (9/1420)
scn: 0x0000.019233ab seq: 0x01 flg: 0x00 tail: 0x33ab0601 frmt: 0x02 chkval: 0x0000 type: 0x06=trans data Block header dump: 0x0240058c
Object id on Block? Y
seg/obj: 0x79a7 csc: 0x00.192339f itc: 1 flg: E typ: 2 - INDEX *** index object

     brn: 0  bdba: 0x2400589 ver: 0x01
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc

0x01 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.0192339f

Branch block dump *** Block dump highlights this block as a branch node

header address 49221708=0x2ef104c
kdxcolev 1
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y kdxconco 2
kdxcosdc 0
kdxconro 69
kdxcofbo 166=0xa6
kdxcofeo 6907=0x1afb
kdxcoavs 6741
kdxbrlmc 37750157=0x240058d
kdxbrsno 0
kdxbrbksz 8060
row#0[8043] dba: 37750158=0x240058e *** first branch pointer

col 0; len 5; (5):  4d 44 53 59 53                 *** go this way for
values less than this value (MDSYS)
col 1; len 6; (6):  02 40 04 ba 00 38            *** location of "pointed

to" node (in this case a leaf node)
row#1[8028] dba: 37750159=0x240058f *** second branch pointer, etc ... col 0; len 3; (3): 4f 44 4d
col 1; len 6; (6): 02 40 04 bf 00 2a
row#2[8007] dba: 37750160=0x2400590
col 0; len 9; (9): 4f 45 4d 5f 5a 49 47 47 59 col 1; len 6; (6): 02 40 04 ac 00 42
row#3[7986] dba: 37750161=0x2400591
col 0; len 9; (9): 4f 45 4d 5f 5a 49 47 47 59 col 1; len 6; (6): 02 40 04 c0 00 2d

And this is an extract of the first leaf node that is being pointed to by the above branch node...

Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421 buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.019233a1 seq: 0x01 flg: 0x04 tail: 0x33a10601 frmt: 0x02 chkval: 0x1090 type: 0x06=trans data Block header dump: 0x0240058d
Object id on Block? Y

 seg/obj: 0x79a7  csc: 0x00.192339f  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x2400589 ver: 0x01
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc

0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.0192339f

above are the two (minimum) transaction slots

Leaf block dump *** OK, this is a leaf node ...

header address 49221732=0x2ef1064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1680=0x690
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: -----, lock: 0 *** first index entry header (currently unlocked)
col 0; len 5; (5): 42 4f 57 49 45 *** first index entry value (BOWIE)
col 1; len 6; (6): 02 40 20 60 00 29 *** rowid of first index entry row#1[8006] flag: -----, lock: 0 *** second index entry, etc .... col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2a
row#2[7991] flag: -----, lock: 0
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2b
row#3[7976] flag: -----, lock: 0
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2c

When you delete an index entry, it looks like this :

SQL> delete insert_bowie where owner = 'BOWIE' and rownum = 1;

1 row deleted.

Execution Plan

0 DELETE STATEMENT Optimizer=CHOOSE 1 0 DELETE OF 'INSERT_BOWIE'

   2    1     COUNT (STOPKEY)
   3    2       TABLE ACCESS (BY INDEX ROWID) OF 'INSERT_BOWIE'
   4    3         INDEX (RANGE SCAN) OF 'INSERT_BOWIE_IDX' (NON-UNIQUE
          )

SQL> commit;

Commit complete.

Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421 buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.0192ad54 seq: 0x01 flg: 0x00 tail: 0xad540601 frmt: 0x02 chkval: 0x0000 type: 0x06=trans data Block header dump: 0x0240058d
Object id on Block? Y

 seg/obj: 0x79a7  csc: 0x00.192339f  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x2400589 ver: 0x01
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc

0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0x000a.01f.0000703f 0x008003ce.022c.2a ---- 1 fsc 0x0011.00000000

Note here that the second slot has/had a transaction (which happens to be the transaction that deleted an index entry)

Leaf block dump

header address 99618916=0x5f01064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1680=0x690
kdxcoavs 818
kdxlespl 0
kdxlende 1
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: ---D-, lock: 2 *** Note here that the first entry is/has been locked by the transaction in transaction slot number 2. Also note that the entry has been marked as deleted "D".

col 0; len 5; (5): 42 4f 57 49 45 *** This is hence now a "deleted row" entry
col 1; len 6; (6): 02 40 20 60 00 29
row#1[8006] flag: -----, lock: 0
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2a

Now to see how this deleted entry gets reused, one final dump (I promise )

SQL> insert into insert_bowie (owner) values ('AADVARK');

1 row created.

SQL> commit;

Commit complete.

Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421 buffer tsn: 9 rdba: 0x0240058d (9/1421)
scn: 0x0000.0192b7b8 seq: 0x01 flg: 0x02 tail: 0xb7b80601 frmt: 0x02 chkval: 0x0000 type: 0x06=trans data Block header dump: 0x0240058d
Object id on Block? Y

 seg/obj: 0x79a7  csc: 0x00.192b7b3  itc: 2  flg: E  typ: 2 - INDEX
     brn: 0  bdba: 0x2400589 ver: 0x01
     inc: 0  exflg: 0

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc

0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0x0005.022.0000668b 0x00800a3f.01ff.29 --U- 1 fsc 0x0000.0192b7b8

Again note, we have/had a transaction going on in slot 2

Leaf block dump

header address 99618916=0x5f01064
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y kdxconco 2
kdxcosdc 0
kdxconro 413
kdxcofbo 862=0x35e
kdxcofeo 1663=0x67f
kdxcoavs 816
kdxlespl 0
kdxlende 0
kdxlenxt 37750158=0x240058e
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[1663] flag: -----, lock: 2 *** this entry is/has been locked by a transaction in slot 2
col 0; len 7; (7): 41 41 44 56 41 52 4b *** we now have an index entry for the value (AADVARK)
col 1; len 6; (6): 02 40 20 61 00 24 *** a search through the whole dump will find that all trace of the previously deleted index has been erased.

row#1[8006] flag: -----, lock: 0
col 0; len 5; (5): 42 4f 57 49 45
col 1; len 6; (6): 02 40 20 60 00 2a

>
> The number of entries within each index data block is a function of
> two values:
>
> 1 - The length of the symbolic key
> 2 - The blocksize for the index tablespace
>

So the number of possible entries within an index block is a function of the total length of the index column(s) values (although note that this can now be compressed to be made more effiicient) and the size of the index block.

> Because the blocksize affects the number of keys within each index
> block, it follows that the blocksize will have an effect on the
> structure of the index tree. All else being equal, large 32k
> blocksizes will have more keys, resulting in a flatter index than the
> same index created in a 2k tablespace.

By having a larger block size, you can indeed fit more values into less blocks resulting in less leaf nodes. If you have less leaf nodes, you therefore don't need as many "pointers" to these nodes and so require less branch nodes. If you can reduce the total number of branch nodes required, you thereby reduce the size of the index structure sitting above the leaf nodes which *might* result in a lower index height. However, because you can fit so many entries into a particular branch node, and because so many branch nodes can be pointed to by the root or intermediate branch nodes, you generally have far fewer branch nodes allocated than the maximum capacity of the index structure and so an actual reduction in index height is actually quite rare.

>
> According to an article by Christopher Foot
> (www.dbazine.com/foot3.html):
>
> "A bigger block size means more space for key storage in the branch
> nodes of B-tree indexes, which reduces index height and improves the
> performance of indexed queries."

I can't comment on my almost name sake (don't you just hate it when people can't spell; fancy dropping off the last "e", it's so crucial to well balanced name ;) but I would have written the comment above as "a bigger block size means more space for key storage in the branch nodes of B-tree indexes, which *might in some particular situations* reduce index height *but nonetheless will improve the performance of indexed queries which require a "range scan operation" as fewer leaf nodes now need to be accessed, potentially reducing IO and associated overheads*". But then again, I'm not Christopher ...

>
> (Oh dear, Mr. Foot used the "node" word. I'll bet Howard will be
> really pissed . . . .)
>
> In any case, there appears to be evidence that block size affects the
> tree structure, which supports the argument that data blocks (yes
> "nodes"), affect the structure of the tree.

No argument there. Indexes love larger blocks sizes because they can fit more data into fewer nodes meaning a more efficient structure. Regardless of whether or not such savings represents a lower index, it will result in fewer leaf nodes making index range scans more efficient (which is a pretty popular way to use an index).

I hope this helps to clear a few things up.

Cheers

Richard Received on Sun Feb 02 2003 - 01:25:04 CST