Re: index rebuilding...

From: sk <sk_at_nospan-noway-nohow.org>
Date: Sun, 02 Feb 2003 20:18:08 GMT
Message-ID: <3E3D7D60.22A4B5A@nospan-noway-nohow.org>

Wow, isn't it great when to big dogs come out to play. Very educational also.

Thanks to all.

Richard Foote wrote:

> Hi again Don
>
> Comments embedded (warning, a bit long in places ...)
>
> "Don Burleson" <don_at_burleson.cc> wrote in message
> news:998d28f7.0302011414.1f9b5c40_at_posting.google.com...
> > Hi Richard,
> >
> > At last, a polite response!
> >
> > Is it possible that the index algorithms have changed since I did that
> > article in 1995? Maybe I am obsolete!
>
> I'm not sure as I was only a baby in nappies back them ;) However, I'm
> pretty sure that at least version 7 of Oracle had a similar algorithm.
>
> >
> > My understanding of Oracle indexing was based on "borrowed" 7.3.3
> > source code (donated by a disgruntled ex-Oracle employee), and the
> > article that the arrogant, insulting, and self-aggrandizing Howard
> > calls "rubbish" was published in back in 1995, when 7.3.4 was just
> > released.
> >
> > At that time, (and I have the source code filed away somewhere) Oracle
> > index nodes were organized by data blocks, where each data block was
> > an index node. That is why you cannot specify PCTUSED for an index,
> > because Oracle must govern the freelist re-link during splitting and
> > spawning operations.
>
> The reason why you can't specify a PCTUSED is that indexes behave quite
> differently from (conventional) tables in that if there's space in a node,
> Oracle will simply use it, if there isn't then Oracle simply splits the
> block. The concept of having a block *within the index structure* not being
> available for insert and having to await reaching a PCTUSED mark is
> therefore not applicable. In summary, if an index block is currently
> allocated to the index structure, it must by definition be off the freelist.
>
> >
> > Please correct me if any of the following is incorrect:
> >
> > Each data block within the index contains "nodes" in the index tree,
> > with the bottom nodes (leaf blocks), containing pairs of symbolic keys
> > and ROWID values. As an Oracle tree grows (via inserting rows into
> > the table), Oracle fills the block, and when the block is full, it
> > splits, creating new index nodes (data blocks) to manage the symbolic
> > keys within the index. Hence, an Oracle index block may contain
> > pointers to other index nodes or ROWID/Symbolic-key pairs.
>
> This is true expect that a particular "node" is either a "leaf node" which
> means it contains an range of index entires along with their associated
> rowids or is a "branch node" which means it contains a set of range pointers
> that point to other branch nodes or leaf nodes within the index structure.
> These "pointers" are represented by a "less than" value from which the
> required path through the index structure can be derived. Note a particular
> block within the index structure can only be a leaf or a branch node.
>
> Like I mentioned earlier, the best way to see all this is via a few block
> dumps. Following is an extract from a dump of a "branch" block. I don't want
> to necessarily describe the whole structure but the odd comments with a ***
> are mine:
>
> Start dump data blocks tsn: 9 file#: 9 minblk 1420 maxblk 1420
> buffer tsn: 9 rdba: 0x0240058c (9/1420)
> scn: 0x0000.019233ab seq: 0x01 flg: 0x00 tail: 0x33ab0601
> frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
> Block header dump: 0x0240058c
> Object id on Block? Y
> seg/obj: 0x79a7 csc: 0x00.192339f itc: 1 flg: E typ: 2 - INDEX ***
> index object
> brn: 0 bdba: 0x2400589 ver: 0x01
> inc: 0 exflg: 0
>
> Itl Xid Uba Flag Lck Scn/Fsc
> 0x01 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn
> 0x0000.0192339f
>
> Branch block dump *** Block dump highlights this block as a branch node
> =================
> header address 49221708=0x2ef104c
> kdxcolev 1
> KDXCOLEV Flags = - - -
> kdxcolok 0
> kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
> kdxconco 2
> kdxcosdc 0
> kdxconro 69
> kdxcofbo 166=0xa6
> kdxcofeo 6907=0x1afb
> kdxcoavs 6741
> kdxbrlmc 37750157=0x240058d
> kdxbrsno 0
> kdxbrbksz 8060
> row#0[8043] dba: 37750158=0x240058e *** first branch pointer
> col 0; len 5; (5): 4d 44 53 59 53 *** go this way for
> values less than this value (MDSYS)
> col 1; len 6; (6): 02 40 04 ba 00 38 *** location of "pointed
> to" node (in this case a leaf node)
> row#1[8028] dba: 37750159=0x240058f *** second branch pointer, etc ...
> col 0; len 3; (3): 4f 44 4d
> col 1; len 6; (6): 02 40 04 bf 00 2a
> row#2[8007] dba: 37750160=0x2400590
> col 0; len 9; (9): 4f 45 4d 5f 5a 49 47 47 59
> col 1; len 6; (6): 02 40 04 ac 00 42
> row#3[7986] dba: 37750161=0x2400591
> col 0; len 9; (9): 4f 45 4d 5f 5a 49 47 47 59
> col 1; len 6; (6): 02 40 04 c0 00 2d
>
> And this is an extract of the first leaf node that is being pointed to by
> the above branch node...
>
> Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
> buffer tsn: 9 rdba: 0x0240058d (9/1421)
> scn: 0x0000.019233a1 seq: 0x01 flg: 0x04 tail: 0x33a10601
> frmt: 0x02 chkval: 0x1090 type: 0x06=trans data
> Block header dump: 0x0240058d
> Object id on Block? Y
> seg/obj: 0x79a7 csc: 0x00.192339f itc: 2 flg: E typ: 2 - INDEX
> brn: 0 bdba: 0x2400589 ver: 0x01
> inc: 0 exflg: 0
>
> Itl Xid Uba Flag Lck Scn/Fsc
> 0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc
> 0x0000.00000000
> 0x02 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn
> 0x0000.0192339f
>
> *** above are the two (minimum) transaction slots
>
> Leaf block dump *** OK, this is a leaf node ...
> ===============
> header address 49221732=0x2ef1064
> kdxcolev 0
> KDXCOLEV Flags = - - -
> kdxcolok 0
> kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
> kdxconco 2
> kdxcosdc 0
> kdxconro 413
> kdxcofbo 862=0x35e
> kdxcofeo 1680=0x690
> kdxcoavs 818
> kdxlespl 0
> kdxlende 0
> kdxlenxt 37750158=0x240058e
> kdxleprv 0=0x0
> kdxledsz 0
> kdxlebksz 8036
> row#0[8021] flag: -----, lock: 0 *** first index entry header
> (currently unlocked)
> col 0; len 5; (5): 42 4f 57 49 45 *** first index entry value
> (BOWIE)
> col 1; len 6; (6): 02 40 20 60 00 29 *** rowid of first index entry
> row#1[8006] flag: -----, lock: 0 *** second index entry, etc ....
> col 0; len 5; (5): 42 4f 57 49 45
> col 1; len 6; (6): 02 40 20 60 00 2a
> row#2[7991] flag: -----, lock: 0
> col 0; len 5; (5): 42 4f 57 49 45
> col 1; len 6; (6): 02 40 20 60 00 2b
> row#3[7976] flag: -----, lock: 0
> col 0; len 5; (5): 42 4f 57 49 45
> col 1; len 6; (6): 02 40 20 60 00 2c
>
> When you delete an index entry, it looks like this :
>
> SQL> delete insert_bowie where owner = 'BOWIE' and rownum = 1;
>
> 1 row deleted.
>
> Execution Plan
> ----------------------------------------------------------
> 0 DELETE STATEMENT Optimizer=CHOOSE
> 1 0 DELETE OF 'INSERT_BOWIE'
> 2 1 COUNT (STOPKEY)
> 3 2 TABLE ACCESS (BY INDEX ROWID) OF 'INSERT_BOWIE'
> 4 3 INDEX (RANGE SCAN) OF 'INSERT_BOWIE_IDX' (NON-UNIQUE
> )
>
> SQL> commit;
>
> Commit complete.
>
> Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
> buffer tsn: 9 rdba: 0x0240058d (9/1421)
> scn: 0x0000.0192ad54 seq: 0x01 flg: 0x00 tail: 0xad540601
> frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
> Block header dump: 0x0240058d
> Object id on Block? Y
> seg/obj: 0x79a7 csc: 0x00.192339f itc: 2 flg: E typ: 2 - INDEX
> brn: 0 bdba: 0x2400589 ver: 0x01
> inc: 0 exflg: 0
>
> Itl Xid Uba Flag Lck Scn/Fsc
> 0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc
> 0x0000.00000000
> 0x02 0x000a.01f.0000703f 0x008003ce.022c.2a ---- 1 fsc
> 0x0011.00000000
>
> *** Note here that the second slot has/had a transaction (which happens to
> be the transaction that deleted an index entry)
>
> Leaf block dump
> ===============
> header address 99618916=0x5f01064
> kdxcolev 0
> KDXCOLEV Flags = - - -
> kdxcolok 0
> kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
> kdxconco 2
> kdxcosdc 0
> kdxconro 413
> kdxcofbo 862=0x35e
> kdxcofeo 1680=0x690
> kdxcoavs 818
> kdxlespl 0
> kdxlende 1
> kdxlenxt 37750158=0x240058e
> kdxleprv 0=0x0
> kdxledsz 0
> kdxlebksz 8036
> row#0[8021] flag: ---D-, lock: 2 *** Note here that the first entry is/has
> been locked by the transaction in transaction slot number 2. Also note that
> the entry has been marked as deleted "D".
>
> col 0; len 5; (5): 42 4f 57 49 45 *** This is hence now a "deleted
> row" entry
> col 1; len 6; (6): 02 40 20 60 00 29
> row#1[8006] flag: -----, lock: 0
> col 0; len 5; (5): 42 4f 57 49 45
> col 1; len 6; (6): 02 40 20 60 00 2a
>
> Now to see how this deleted entry gets reused, one final dump (I promise )
>
> SQL> insert into insert_bowie (owner) values ('AADVARK');
>
> 1 row created.
>
> SQL> commit;
>
> Commit complete.
>
> Start dump data blocks tsn: 9 file#: 9 minblk 1421 maxblk 1421
> buffer tsn: 9 rdba: 0x0240058d (9/1421)
> scn: 0x0000.0192b7b8 seq: 0x01 flg: 0x02 tail: 0xb7b80601
> frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
> Block header dump: 0x0240058d
> Object id on Block? Y
> seg/obj: 0x79a7 csc: 0x00.192b7b3 itc: 2 flg: E typ: 2 - INDEX
> brn: 0 bdba: 0x2400589 ver: 0x01
> inc: 0 exflg: 0
>
> Itl Xid Uba Flag Lck Scn/Fsc
> 0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc
> 0x0000.00000000
> 0x02 0x0005.022.0000668b 0x00800a3f.01ff.29 --U- 1 fsc
> 0x0000.0192b7b8
>
> *** Again note, we have/had a transaction going on in slot 2
>
> Leaf block dump
> ===============
> header address 99618916=0x5f01064
> kdxcolev 0
> KDXCOLEV Flags = - - -
> kdxcolok 0
> kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
> kdxconco 2
> kdxcosdc 0
> kdxconro 413
> kdxcofbo 862=0x35e
> kdxcofeo 1663=0x67f
> kdxcoavs 816
> kdxlespl 0
> kdxlende 0
> kdxlenxt 37750158=0x240058e
> kdxleprv 0=0x0
> kdxledsz 0
> kdxlebksz 8036
> row#0[1663] flag: -----, lock: 2 *** this entry is/has been
> locked by a transaction in slot 2
> col 0; len 7; (7): 41 41 44 56 41 52 4b *** we now have an index entry for
> the value (AADVARK)
> col 1; len 6; (6): 02 40 20 61 00 24 *** a search through the whole
> dump will find that all trace of the previously deleted index has been
> erased.
>
> row#1[8006] flag: -----, lock: 0
> col 0; len 5; (5): 42 4f 57 49 45
> col 1; len 6; (6): 02 40 20 60 00 2a
>
> >
> > The number of entries within each index data block is a function of
> > two values:
> >
> > 1 - The length of the symbolic key
> > 2 - The blocksize for the index tablespace
> >
>
> So the number of possible entries within an index block is a function of the
> total length of the index column(s) values (although note that this can now
> be compressed to be made more effiicient) and the size of the index block.
>
> > Because the blocksize affects the number of keys within each index
> > block, it follows that the blocksize will have an effect on the
> > structure of the index tree. All else being equal, large 32k
> > blocksizes will have more keys, resulting in a flatter index than the
> > same index created in a 2k tablespace.
>
> By having a larger block size, you can indeed fit more values into less
> blocks resulting in less leaf nodes. If you have less leaf nodes, you
> therefore don't need as many "pointers" to these nodes and so require less
> branch nodes. If you can reduce the total number of branch nodes required,
> you thereby reduce the size of the index structure sitting above the leaf
> nodes which *might* result in a lower index height. However, because you can
> fit so many entries into a particular branch node, and because so many
> branch nodes can be pointed to by the root or intermediate branch nodes, you
> generally have far fewer branch nodes allocated than the maximum capacity of
> the index structure and so an actual reduction in index height is actually
> quite rare.
>
> >
> > According to an article by Christopher Foot
> > (www.dbazine.com/foot3.html):
> >
> > "A bigger block size means more space for key storage in the branch
> > nodes of B-tree indexes, which reduces index height and improves the
> > performance of indexed queries."
>
> I can't comment on my almost name sake (don't you just hate it when people
> can't spell; fancy dropping off the last "e", it's so crucial to well
> balanced name ;) but I would have written the comment above as "a bigger
> block size means more space for key storage in the branch nodes of B-tree
> indexes, which *might in some particular situations* reduce index height
> *but nonetheless will improve the performance of indexed queries which
> require a "range scan operation" as fewer leaf nodes now need to be
> accessed, potentially reducing IO and associated overheads*". But then
> again, I'm not Christopher ...
>
> >
> > (Oh dear, Mr. Foot used the "node" word. I'll bet Howard will be
> > really pissed . . . .)
> >
> > In any case, there appears to be evidence that block size affects the
> > tree structure, which supports the argument that data blocks (yes
> > "nodes"), affect the structure of the tree.
>
> No argument there. Indexes love larger blocks sizes because they can fit
> more data into fewer nodes meaning a more efficient structure. Regardless of
> whether or not such savings represents a lower index, it will result in
> fewer leaf nodes making index range scans more efficient (which is a pretty
> popular way to use an index).
>
> I hope this helps to clear a few things up.
>
> Cheers
>
> Richard
Received on Sun Feb 02 2003 - 14:18:08 CST