Re: reverse key index -- Sorting expected?

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Tue, 7 Jan 2020 11:01:19 +0000
Message-ID: <LNXP265MB15620F01A5570CA754778332A53F0_at_LNXP265MB1562.GBRP265.PROD.OUTLOOK.COM>


>> example of hash function that kicks in during execution:
>> select roll from randomload where roll in (select a.roll from temp a, temp b where a.roll = b.roll) // this case sorting doesn't happen - only hashing to eliminate duplicates...
>> select roll from randomload where roll in (select a.roll from temp) // both sorting and hashing to eliminate the duplicates .

The appearance of ant hashing in one of these examples is probably a pure costing decision based on the cardinality and clustering estimate - here's a plan, pulled from memory, that shows no hashing at all:

select object_name from t1 where object_id in (select object_id from t2)

Plan hash value: 2010303231



| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |


| 0 | SELECT STATEMENT | | | | 65 (100)| |
| 1 | NESTED LOOPS | | 62 | 2976 | 65 (2)| 00:00:01 |
| 2 | NESTED LOOPS | | 62 | 2976 | 65 (2)| 00:00:01 |
| 3 | SORT UNIQUE | | 62 | 248 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T2 | 62 | 248 | 2 (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     1   (0)| 00:00:01 |

| 6 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 44 | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):


   5 - access("OBJECT_ID"="OBJECT_ID")

The optimizer has used a sort unique rather than a hash unique on T2 because it expects to find a small number of rows, and sorting the values for uniqueness may result in more efficient access for the nested loop join.

Bottom line - Oracle uses a sort to ensure uniqueness (and minimal iterations) for an IN-list; but for an in subquery it simply picks the best optimizer strategy for handling the join and subsequent activity. Here's another plan that doesn't even try for uniqueness (using the same two tables, reversing the roles), but switches to a semi-join:

select object_name from t2 where object_id in (select object_id from t1)

Plan hash value: 3107539380



| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |


| 0 | SELECT STATEMENT | | | | 25 (100)| |
|*  1 |  HASH JOIN SEMI       |       |    62 |  2790 |    25  (24)| 00:00:01 |

| 2 | TABLE ACCESS FULL | T2 | 62 | 2480 | 2 (0)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| T1_I1 | 56753 | 277K| 20 (15)| 00:00:01 |

Predicate Information (identified by operation id):


   1 - access("OBJECT_ID"="OBJECT_ID")

Regards
Jonathan Lewis

PS 19.3.0.0
create table t1 as select * from all_objects; create index t1_i1 on t1(object_id);
create table t2 as select * from t1 where mod(object_id,1000) = 0;



From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of Vishnu Potukanuma <vishnupotukanuma_at_gmail.com> Sent: 06 January 2020 08:55
To: Andy Sayer
Cc: Oracle-L Freelists
Subject: Re: reverse key index -- Sorting expected?

Hi Andy,

Now that I look at the sentence I have written - it doesn't make sense.. regarding hash function... Let me rephrase things more clearly and precisely the way i am looking at... I was referring to the functionality used in eliminating the duplicate values in the IN list ( )... as it makes the final transformed query... (i believe it is using hash function due to the following)...

example of hash function that kicks in during execution: select roll from randomload where roll in (select a.roll from temp a, temp b where a.roll = b.roll) // this case sorting doesn't happen - only hashing to eliminate duplicates... select roll from randomload where roll in (select a.roll from temp) // both sorting and hashing to eliminate the duplicates .

db file sequential read event was basically to see the ordering of blocks read from extent to extent and based on the index treedump to see precisely whether it is reading from left most index leaf block to the right most index leaf block... now that looking at it from a different perspective.... even ordering the values in the ascending order doesn't make sense after a certain point as the probability or even potentially benefits of this diminishes as the number of entries in the leaf blocks reduces due to

-  50-50 block splits or
-  nature of data load (eg: orders from a particular store increase considerably during a certain time when they give offers)
- due to the size of the indexed column
- event multi column indexes and data type of the column (char or string)
- in which inherently less number of entries in leaf blocks - (also block size dependent) -

the wired part of this is that it even sorts character data select * from students where name in ('Vishnu','VISHNU','VIshnu'); // doesn't really make sense...

Unless I have missed something completely - i cant see a valid purpose of sorting entries in the IN- List for a wide majority of the use cases (unless the entries lists in the IN clause are extremely close to each other) - which reduces the latch gets by only 1 even in this case - this raises another question - does this even sort if we use bind variables have to test this...

Thanks,
Vishnu

On Mon, Jan 6, 2020 at 1:24 PM Andy Sayer <andysayer_at_gmail.com<mailto:andysayer_at_gmail.com>> wrote: “ Is this a bug? from what I look at applying a hash function to eliminate the duplicates and sorting the results especially for reverse key indexes is basically an unnecessary step”

What hash function are you referring to and how do the db file sequential read events indicate this is happening?

How did you populate this students table? If the searched column was generated by a sequence with no gaps then one would expect the keys you’re looking for to be in different leaf blocks for the normal indexes so when you do the index traversal you might be more likely to require reading from disk more. The reverse key index might place the index keys closer together or further aspart depending on the implementation. Neither can really be proved just by looking at the physical block gets. Given the fetch in between sequential reads it looks like a different amount of data is actually being requested, does that mean that you used a differently populated table for the two tests?

Additionally we cannot see which object your sequential reads are being reported against, it could be the table or the index. The trace data includes the object id so you can look it up for us.

Reverse key indexes are usually related to increasing physical IO for standard operations because the working set you usually look at in a table are the few most recently inserted rows. For a reverse key index using a sequence to generate the value, these last few values are going to be scattered far around the index (which is why they’re usually naively used to help concurrency). They also restrict the use of range predicates because the values are not sorted in the same way that the index will be. What is your use case here? Or are you just experimenting?

Thanks,
Andrew

On Mon, 6 Jan 2020 at 06:39, Vishnu Potukanuma <vishnupotukanuma_at_gmail.com<mailto:vishnupotukanuma_at_gmail.com>> wrote: Hi,

consider the following case:
create table randomload(roll number, name varchar2(20), mark1 number); populate the table with randomdata.
create index roll_desc_idx on randomload(roll desc); or even ascending index...
roll is a unique key monotonically increasing.

so if the maximum value is 1000000 - this value is stored in the left most leaf block in the index btree structure.

Now consider the following situation:
select * from students where student_id in (1,100000, 200000, 300000, 400000); we can use the above values in any order we want, here is precisely what oracle is doing even before the final query transformation is done, it eliminates the duplicate values and sorts the values in the ascending order. the query after final transformation becomes.. select * from students where student_id =1 or student_id = 100000 or student_id=200000 or student_id = 300000 or student_id = 400000;

this works precisely as expected for all normal indexes and descending indexes, since sorting them in ascending order as oracle can minimize the IO if the values are adjacent to each other. but works horrible for reverse key indexes.... as there is no order to which the oracle scans the leaf blocks.

I mean by default Oracle sorts these values in ascending order regardless of what, for the descending index: there is no issue with the descending indexes as well (but not in all the cases) ... but for reverse key indexes?

trace of index leaf block reads for a descending index...

WAIT #140268739421912: nam='db file sequential read' ela= 90 file#=7 block#=30099 blocks=1 obj#=74642 tim=5089544852
WAIT #140268739421912: nam='db file sequential read' ela= 95 file#=7 block#=29904 blocks=1 obj#=74642 tim=5089545009
WAIT #140268739421912: nam='db file sequential read' ela= 83 file#=7 block#=29667 blocks=1 obj#=74642 tim=5089545184
WAIT #140268739421912: nam='db file sequential read' ela= 101 file#=7 block#=29430 blocks=1 obj#=74642 tim=5089545354
WAIT #140268739421912: nam='db file scattered read' ela= 199 file#=7 block#=29424 blocks=6 obj#=74642 tim=5089545650
WAIT #140268739421912: nam='db file sequential read' ela= 86 file#=7 block#=29192 blocks=1 obj#=74642 tim=5089545807



the following is the trace for reverse key index: WAIT #140452225803960: nam='db file sequential read' ela= 215 file#=7 block#=17470 blocks=1 obj#=74643 tim=5897433740 WAIT #140452225803960: nam='db file sequential read' ela= 213 file#=7 block#=16909 blocks=1 obj#=74643 tim=5897434021 FETCH #140452225803960:c=631,e=1033,p=10,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=3725354939,tim=5897434099

WAIT #140452225803960: nam='SQL*Net message from client' ela= 326 driver id=1650815232 #bytes=1 p3=0 obj#=74643 tim=5897434480
WAIT #140452225803960: nam='db file sequential read' ela= 228 file#=7 block#=18162 blocks=1 obj#=74643 tim=5897434785
WAIT #140452225803960: nam='db file sequential read' ela= 221 file#=7 block#=18080 blocks=1 obj#=74643 tim=5897435085
WAIT #140452225803960: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=74643 tim=5897435152
WAIT #140452225803960: nam='db file sequential read' ela= 214 file#=7 block#=19545 blocks=1 obj#=74643 tim=5897435399
WAIT #140452225803960: nam='db file sequential read' ela= 213 file#=7 block#=19383 blocks=1 obj#=74643 tim=5897435681
WAIT #140452225803960: nam='db file sequential read' ela= 209 file#=7 block#=20929 blocks=1 obj#=74643 tim=5897435960
WAIT #140452225803960: nam='db file sequential read' ela= 192 file#=7 block#=20686 blocks=1 obj#=74643 tim=5897436213
WAIT #140452225803960: nam='db file sequential read' ela= 183 file#=7 block#=22313 blocks=1 obj#=74643 tim=5897436471
WAIT #140452225803960: nam='db file sequential read' ela= 202 file#=7 block#=21989 blocks=1 obj#=74643 tim=5897436735
WAIT #140452225803960: nam='db file sequential read' ela= 223 file#=7 block#=23696 blocks=1 obj#=74643 tim=5897437039
WAIT #140452225803960: nam='db file sequential read' ela= 207 file#=7 block#=23292 blocks=1 obj#=74643 tim=5897437320

Is this a bug? from what I look at applying a hash function to eliminate the duplicates and sorting the results especially for reverse key indexes is basically an unnecessary step....

Thanks,
Vishnu

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jan 07 2020 - 12:01:19 CET

Original text of this message