Re: reverse key index -- Sorting expected?

From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
Date: Mon, 6 Jan 2020 14:25:45 +0530
Message-ID: <CAP-Ryww3_osoafPRYhii2u+c8o_aXh=4B6bby4Eq3dAgZdLZ9w_at_mail.gmail.com>



Hi Andy,

Now that I look at the sentence I have written - it doesn't make sense.. regarding hash function... Let me rephrase things more clearly and precisely the way i am looking at...
I was referring to the functionality used in eliminating the duplicate values in the IN list ( )... as it makes the final transformed query... (i believe it is using hash function due to the following)...

example of hash function that kicks in during execution: select roll from randomload where roll in (select a.roll from temp a, temp b where a.roll = b.roll) // this case sorting doesn't happen - only hashing to eliminate duplicates...
select roll from randomload where roll in (select a.roll from temp) // both sorting and hashing to eliminate the duplicates .

db file sequential read event was basically to see the ordering of blocks read from extent to extent and based on the index treedump to see precisely whether it is reading from left most index leaf block to the right most index leaf block... now that looking at it from a different perspective.... even ordering the values in the ascending order doesn't make sense after a certain point as the probability or even potentially benefits of this diminishes as the number of entries in the leaf blocks reduces due to - 50-50 block splits or
- nature of data load (eg: orders from a particular store increase considerably during a certain time when they give offers)

- due to the size of the indexed column
- event multi column indexes and data type of the column (char or string)
- in which inherently less number of entries in leaf blocks - (also block
size dependent) -

the wired part of this is that it even sorts character data select * from students where name in ('Vishnu','VISHNU','VIshnu'); // doesn't really make sense...

Unless I have missed something completely - i cant see a valid purpose of sorting entries in the IN- List for a wide majority of the use cases (unless the entries lists in the IN clause are extremely close to each other) - which reduces the latch gets by only 1 even in this case - this raises another question - does this even sort if we use bind variables have to test this...

Thanks,
Vishnu

On Mon, Jan 6, 2020 at 1:24 PM Andy Sayer <andysayer_at_gmail.com> wrote:

> “ Is this a bug? from what I look at applying a hash function to
> eliminate the duplicates and sorting the results especially for reverse key
> indexes is basically an unnecessary step”
>
> What hash function are you referring to and how do the db file sequential
> read events indicate this is happening?
>
> How did you populate this students table? If the searched column was
> generated by a sequence with no gaps then one would expect the keys you’re
> looking for to be in different leaf blocks for the normal indexes so when
> you do the index traversal you might be more likely to require reading from
> disk more. The reverse key index might place the index keys closer together
> or further aspart depending on the implementation. Neither can really be
> proved just by looking at the physical block gets. Given the fetch in
> between sequential reads it looks like a different amount of data is
> actually being requested, does that mean that you used a differently
> populated table for the two tests?
>
> Additionally we cannot see which object your sequential reads are being
> reported against, it could be the table or the index. The trace data
> includes the object id so you can look it up for us.
>
> Reverse key indexes are usually related to increasing physical IO for
> standard operations because the working set you usually look at in a table
> are the few most recently inserted rows. For a reverse key index using a
> sequence to generate the value, these last few values are going to be
> scattered far around the index (which is why they’re usually naively used
> to help concurrency). They also restrict the use of range predicates
> because the values are not sorted in the same way that the index will be.
> What is your use case here? Or are you just experimenting?
>
> Thanks,
> Andrew
>
> On Mon, 6 Jan 2020 at 06:39, Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
> wrote:
>
>> Hi,
>>
>> consider the following case:
>> create table randomload(roll number, name varchar2(20), mark1 number);
>> populate the table with randomdata.
>> create index roll_desc_idx on randomload(roll desc);
>> or even ascending index...
>> roll is a unique key monotonically increasing.
>>
>> so if the maximum value is 1000000 - this value is stored in the left
>> most leaf block in the index btree structure.
>>
>> Now consider the following situation:
>> select * from students where student_id in (1,100000, 200000, 300000,
>> 400000);
>> we can use the above values in any order we want, here is precisely what
>> oracle is doing even before the final query transformation is done, it
>> eliminates the duplicate values and sorts the values in the ascending
>> order. the query after final transformation becomes..
>> select * from students where student_id =1 or student_id = 100000 or
>> student_id=200000 or student_id = 300000 or student_id = 400000;
>>
>> this works precisely as expected for all normal indexes and descending
>> indexes, since sorting them in ascending order as oracle can minimize the
>> IO if the values are adjacent to each other. but works horrible for reverse
>> key indexes.... as there is no order to which the oracle scans the leaf
>> blocks.
>>
>> I mean by default Oracle sorts these values in ascending order regardless
>> of what, for the descending index: there is no issue with the descending
>> indexes as well (but not in all the cases) ...
>> but for reverse key indexes?
>>
>> trace of index leaf block reads for a descending index...
>> WAIT #140268739421912: nam='db file sequential read' ela= 90 file#=7
>> block#=30099 blocks=1 obj#=74642 tim=5089544852
>> WAIT #140268739421912: nam='db file sequential read' ela= 95 file#=7
>> block#=29904 blocks=1 obj#=74642 tim=5089545009
>> WAIT #140268739421912: nam='db file sequential read' ela= 83 file#=7
>> block#=29667 blocks=1 obj#=74642 tim=5089545184
>> WAIT #140268739421912: nam='db file sequential read' ela= 101 file#=7
>> block#=29430 blocks=1 obj#=74642 tim=5089545354
>> WAIT #140268739421912: nam='db file scattered read' ela= 199 file#=7
>> block#=29424 blocks=6 obj#=74642 tim=5089545650
>> WAIT #140268739421912: nam='db file sequential read' ela= 86 file#=7
>> block#=29192 blocks=1 obj#=74642 tim=5089545807
>>
>>
>>
>> the following is the trace for reverse key index:
>> WAIT #140452225803960: nam='db file sequential read' ela= 215 file#=7
>> block#=17470 blocks=1 obj#=74643 tim=5897433740
>> WAIT #140452225803960: nam='db file sequential read' ela= 213 file#=7
>> block#=16909 blocks=1 obj#=74643 tim=5897434021
>> FETCH
>> #140452225803960:c=631,e=1033,p=10,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=3725354939,tim=5897434099
>> WAIT #140452225803960: nam='SQL*Net message from client' ela= 326 driver
>> id=1650815232 #bytes=1 p3=0 obj#=74643 tim=5897434480
>> WAIT #140452225803960: nam='db file sequential read' ela= 228 file#=7
>> block#=18162 blocks=1 obj#=74643 tim=5897434785
>> WAIT #140452225803960: nam='db file sequential read' ela= 221 file#=7
>> block#=18080 blocks=1 obj#=74643 tim=5897435085
>> WAIT #140452225803960: nam='SQL*Net message to client' ela= 1 driver
>> id=1650815232 #bytes=1 p3=0 obj#=74643 tim=5897435152
>> WAIT #140452225803960: nam='db file sequential read' ela= 214 file#=7
>> block#=19545 blocks=1 obj#=74643 tim=5897435399
>> WAIT #140452225803960: nam='db file sequential read' ela= 213 file#=7
>> block#=19383 blocks=1 obj#=74643 tim=5897435681
>> WAIT #140452225803960: nam='db file sequential read' ela= 209 file#=7
>> block#=20929 blocks=1 obj#=74643 tim=5897435960
>> WAIT #140452225803960: nam='db file sequential read' ela= 192 file#=7
>> block#=20686 blocks=1 obj#=74643 tim=5897436213
>> WAIT #140452225803960: nam='db file sequential read' ela= 183 file#=7
>> block#=22313 blocks=1 obj#=74643 tim=5897436471
>> WAIT #140452225803960: nam='db file sequential read' ela= 202 file#=7
>> block#=21989 blocks=1 obj#=74643 tim=5897436735
>> WAIT #140452225803960: nam='db file sequential read' ela= 223 file#=7
>> block#=23696 blocks=1 obj#=74643 tim=5897437039
>> WAIT #140452225803960: nam='db file sequential read' ela= 207 file#=7
>> block#=23292 blocks=1 obj#=74643 tim=5897437320
>>
>> Is this a bug? from what I look at applying a hash function to eliminate
>> the duplicates and sorting the results especially for reverse key indexes
>> is basically an unnecessary step....
>>
>> Thanks,
>> Vishnu
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jan 06 2020 - 09:55:45 CET

Original text of this message