Re: reverse key index -- Sorting expected?

From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
Date: Wed, 8 Jan 2020 19:14:59 +0530
Message-ID: <CAP-RywzTxt92ACWZ8A4BqekW3=29bCbDZJwz_haXy34+nNwR0Q_at_mail.gmail.com>



I agree... these are really very weird scenarios... please don't take these questions in other way.... I have seen things with regard to a particular aspect of this... I have never talked/mentioned about the elapsed times...

do these impact the elapsed times of the SQL statements ---> Big Fat NO!!!! are these limitations really limitations in terms of performance ---> A big NO...
the average elapsed time is less than .2 ms ---> should we really bother about a query thats running for <0.2 ms ---> No

Not sure why the formatting is bad...

      Mins CPUms Elapms CPU BGets IOWms IO DReads AppWms   Conwms RwsP Sorts Fetchs

ParsUser PlanHash   P UE S:Child Last Active Ago     Execs    PerX  PerX
Pct  PerX PerX  Pct PerX    PerX       PerX  PerX PerX   PerX Module   Sql
Text
-------- ------------ ---- ------- ----------- ----- -------------
------------- ------------- ---- ---------- ------------- ---- -----------
---------- ---------- ---------- ----- ------ -------------- ------------
VISHNU 2782592407 0 V: 0 0108 23:25   2     89993    .112  .212 53    11
.000 0  .00    .000       .006 10.00   0  1 SQL*Plus   select stude
VISHNU 2782592407 0 V: 1 0108 23:26   0    109992    .117  .224 52    11
.000 0  .00    .000       .002 10.00   0  1 SQL*Plus   select stude

Thanks,
Vishnu

On Wed, Jan 8, 2020 at 5:49 PM Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk> wrote:

> Vishnu,
>
> Two things -
> First - don't expect Oracle to do the perfect thing just because you can
> see an obvious improvement.
> Second - if you spot an anomaly, see if you can reproduce it in the
> simplest possible way.
>
> So (1st) In this case the "generic case" that Oracle is using is that an
> IN subquery (or EXISTS) can be unnested and used as an inline view with
> distinct to drive a join. The fact that your example is guaranteed to give
> Oracle a unique set is a little special and the optimizer could, in
> principle, derive this fact by sufficiently close examination of the inline
> view - but that means adding a piece of complexity to the optimizer that
> might be irrelevant in almost all cases (generally if you run an IN
> subquery it's probably because you expect there to be multiple rows for
> each input and you want to stop after 1).
>
>
> (2nd) Isolate the principle of the inline view, try this:
> select distinct a.student_id from students a, temp b where a.student_id in
> (1,2,3,4,5,6,7,8) and b.student_id = a.student_id;
>
> It's "obvious" that this is going to generate a list of unique values even
> if the "distinct" wasn't there, so the optimizer ought to avoid a hash/sort
> unique.
> Here's the execution plan (19.3)
>
>
> -----------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows | Cost (%CPU)|
> A-Rows | A-Time | Buffers |
>
> -----------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | | 2 (100)|
> 8 |00:00:00.01 | 15 |
> | 1 | SORT UNIQUE NOSORT | | 1 | 1 | 2 (50)|
> 8 |00:00:00.01 | 15 |
> | 2 | NESTED LOOPS SEMI | | 1 | 8 | 1 (0)|
> 8 |00:00:00.01 | 15 |
> | 3 | INLIST ITERATOR | | 1 | | |
> 8 |00:00:00.01 | 5 |
> |* 4 | INDEX UNIQUE SCAN| T_PK | 8 | 8 | 1 (0)|
> 8 |00:00:00.01 | 5 |
> |* 5 | INDEX UNIQUE SCAN | ST_PK | 8 | 8 | 0 (0)|
> 8 |00:00:00.01 | 10 |
>
> -----------------------------------------------------------------------------------------------------
>
>
> Predicate Information (identified by operation id):
> 4 - access(("B"."STUDENT_ID"=1 OR "B"."STUDENT_ID"=2 OR
> "B"."STUDENT_ID"=3 OR
> "B"."STUDENT_ID"=4 OR "B"."STUDENT_ID"=5 OR
> "B"."STUDENT_ID"=6 OR "B"."STUDENT_ID"=7 OR
> "B"."STUDENT_ID"=8))
> 5 - access("B"."STUDENT_ID"="A"."STUDENT_ID")
> filter(("A"."STUDENT_ID"=1 OR "A"."STUDENT_ID"=2 OR
> "A"."STUDENT_ID"=3 OR
> "A"."STUDENT_ID"=4 OR "A"."STUDENT_ID"=5 OR
> "A"."STUDENT_ID"=6 OR "A"."STUDENT_ID"=7 OR
> "A"."STUDENT_ID"=8))
>
>
> (Students is "create as select rownum student_id, ao.* from all_objects
> where rownum <= 50000", temp is your definition.)
> Note, by the way, that the optimizer has used transitive closure to copy
> the inlist from students to temp. In general this is reasonable, in this
> specific case (with the PK declared) it's pointless but another little
> enhancement that would require a lot of careful coding and testing to avoid
> errors in all cases.
>
> You've probably spotted that Oracle has used a sort unique here instead of
> a hash unique - it's just a minor detail in the costing arithmetic. You'll
> also notice that the cardinality estimate for the unique data is very
> silly. The latter is a severe limitation of the optimizer in the general
> case (and another indication that it hasn't tried to look for the
> uniqueness conditions that hold in this query)
>
>
> It's a sad fact that it's easy to find examples where the optimizer
> doesn't do the best thing, or gets the numbers wrong - sometimes it's a
> case of the historical momentum; sometimes it's due to using a generic
> approach for a problem that has a few special cases; sometimes it's because
> the statistics that have been gathered don't describe the data well;
> sometimes it's simply because the problem is too hard. And sometimes it's
> a bug - in which case if you can create a very simple reproducible demo
> it's nice to let Oracle Corp. know.
>
> If you ever need it in a production there's a
> "no_use_hash_aggregation(_at_qb_name)" hint that you could use in your
> examples to stop Oracle from using a hash unique when you want it to do a
> sort unique.
>
> Regards
> Jonathan Lewis
>
> P.S. If you do come across other things that look like defects it's best
> to see if you can create a model that uses pure SQL (or PL/SQL) because
> it's hard enough to persuade first-line support to run SQL scripts, let
> alone do something with Java. (And I've never bothered to acquaint myself
> with setting up java on my sandbox, so I wouldn't be able to recreate you
> data without a lot of hassle that I don't need.)
>
>
> ________________________________________
> From: Vishnu Potukanuma <vishnupotukanuma_at_gmail.com>
> Sent: 08 January 2020 10:51
> To: Jonathan Lewis
> Cc: Oracle L
> Subject: Re: reverse key index -- Sorting expected?
>
>
> Hi Jonathan,
>
> I completely agree with you regarding the optimisation that oracle
> performs when it comes to using hashing or sort unique or sometimes no
> hashing or sort unique.. but the following still bugs me and cannot find a
> suitable explanation as to why this is happening ----- if the query in the
> IN - clause uses join - it is using hash unique.... and we cannot prevent
> it even though if it is not required in some cases.....
>
> considering the same example as above:
> I created two additional tables temp and temp2 as follows:
> create table temp as select student_id from students where student_id <
> 100;
> create table temp2 as select student_id from students where student_id <
> 100;
> alter table temp add constraint temp_pk primary key (student_id);
> alter table temp2 add constraint temp2_pk primary key (student_id);
> gathered statistics.... even histograms as well..
>
> for all the below statements....
> given every possible way....
> select student_id from students where student_id in (select
> a.student_id from temp a, temp2 b where a.student_id = b.student_id and
> a.student_id in (1,2,3,4,5,6,7,8,9,10) and b.student_id in
> (1,2,3,4,5,6,7,8,9,10))
>
> Plan hash value: 1363404670
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost
> (%CPU)| E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem
> |
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | | | 2
> (100)| | 10 |00:00:00.01 | 12 | | | |
> | 1 | NESTED LOOPS | | 1 | 1 | 16 | 2 (50)|
> 00:00:01 | 10 |00:00:00.01 | 12 | | | |
> | 2 | VIEW | VW_NSO_1 | 1 | 1 | 13 | 1 (0)|
> 00:00:01 | 10 |00:00:00.01 | 8 | | | |
> | 3 | HASH UNIQUE | | 1 | 1 | 6 | | |
> 10 |00:00:00.01 | 8 | 2294K| 2294K| 1041K (0)|
> | 4 | NESTED LOOPS SEMI | | 1 | 1 | 6 | 1 (0)|
> 00:00:01 | 10 |00:00:00.01 | 8 | | | |
> | 5 | INLIST ITERATOR | | 1 | | | |
> | 10 |00:00:00.01 | 4 | | | |
> |* 6 | INDEX UNIQUE SCAN| TEMP_PK | 10 | 10 | 30 |
> 1 (0)| 00:00:01 | 10 |00:00:00.01 | 4 | | | |
> |* 7 | INDEX UNIQUE SCAN | TEMP2_PK | 10 | 1 | 3 |
> 0 (0)| | 10 |00:00:00.01 | 4 | | | |
> |* 8 | INDEX UNIQUE SCAN | STUDENT_IDX | 10 | 1 | 3 |
> 0 (0)| | 10 |00:00:00.01 | 4 | | | |
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> select student_id from students where student_id in (select
> a.student_id from temp a, temp2 b where a.student_id = b.student_id and
> a.student_id in (1,2,3,4,5,6,7,8,9,10))
>
> Plan hash value: 1363404670
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost
> (%CPU)| E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem
> |
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | | | 2
> (100)| | 10 |00:00:00.01 | 17 | | | |
> | 1 | NESTED LOOPS | | 1 | 1 | 18 | 2 (50)|
> 00:00:01 | 10 |00:00:00.01 | 17 | | | |
> | 2 | VIEW | VW_NSO_1 | 1 | 1 | 13 | 1 (0)|
> 00:00:01 | 10 |00:00:00.01 | 8 | | | |
> | 3 | HASH UNIQUE | | 1 | 1 | 6 | | |
> 10 |00:00:00.01 | 8 | 2294K| 2294K| 1044K (0)|
> | 4 | NESTED LOOPS SEMI | | 1 | 1 | 6 | 1 (0)|
> 00:00:01 | 10 |00:00:00.01 | 8 | | | |
> | 5 | INLIST ITERATOR | | 1 | | | |
> | 10 |00:00:00.01 | 4 | | | |
> |* 6 | INDEX UNIQUE SCAN| TEMP_PK | 10 | 10 | 30 |
> 1 (0)| 00:00:01 | 10 |00:00:00.01 | 4 | | | |
> |* 7 | INDEX UNIQUE SCAN | TEMP2_PK | 10 | 1 | 3 |
> 0 (0)| | 10 |00:00:00.01 | 4 | | | |
> |* 8 | INDEX UNIQUE SCAN | STUDENT_IDX | 10 | 1 | 5 |
> 0 (0)| | 10 |00:00:00.01 | 9 | | | |
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> select student_id from students where student_id in (select
> a.student_id from temp a, temp2 b where a.student_id = b.student_id and
> a.student_id < 10)
>
> Plan hash value: 2866610440
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)|
> E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | | | 2 (100)|
> | 9 |00:00:00.01 | 14 | | | |
> | 1 | NESTED LOOPS | | 1 | 8 | 144 | 2 (50)|
> 00:00:01 | 9 |00:00:00.01 | 14 | | | |
> | 2 | VIEW | VW_NSO_1 | 1 | 8 | 104 | 1 (0)|
> 00:00:01 | 9 |00:00:00.01 | 5 | | | |
> | 3 | HASH UNIQUE | | 1 | 8 | 48 | | | 9
> |00:00:00.01 | 5 | 2294K| 2294K| 1047K (0)|
> | 4 | NESTED LOOPS SEMI | | 1 | 8 | 48 | 1 (0)|
> 00:00:01 | 9 |00:00:00.01 | 5 | | | |
> |* 5 | INDEX RANGE SCAN | TEMP_PK | 1 | 9 | 27 | 1
> (0)| 00:00:01 | 9 |00:00:00.01 | 1 | | | |
> |* 6 | INDEX UNIQUE SCAN| TEMP2_PK | 9 | 9 | 27 | 0
> (0)| | 9 |00:00:00.01 | 4 | | | |
> |* 7 | INDEX UNIQUE SCAN | STUDENT_IDX | 9 | 1 | 5 | 0
> (0)| | 9 |00:00:00.01 | 9 | | | |
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> select student_id from students where student_id in ( select a.student_id
> from temp a, temp2 b where a.student_id = b.student_id and a.student_id
> between 1 and 10 and b.student_id between 1 and 10)
>
> -------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
>
> -------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 9 | 144 | 2 (50)| 00:00:01 |
> | 1 | NESTED LOOPS | | 9 | 144 | 2 (50)| 00:00:01 |
> | 2 | VIEW | VW_NSO_1 | 9 | 117 | 1 (0)| 00:00:01 |
> | 3 | HASH UNIQUE | | 9 | 54 | | |
> | 4 | NESTED LOOPS SEMI | | 9 | 54 | 1 (0)| 00:00:01 |
> |* 5 | INDEX RANGE SCAN | TEMP_PK | 10 | 30 | 1 (0)| 00:00:01
> |
> |* 6 | INDEX UNIQUE SCAN| TEMP2_PK | 10 | 30 | 0 (0)| 00:00:01
> |
> |* 7 | INDEX UNIQUE SCAN | STUDENT_IDX | 1 | 3 | 0 (0)| 00:00:01
> |
>
> -------------------------------------------------------------------------------------
>
> we know that the values are unique.... when it comes to estimated rows....
> NESTED LOOPS SEMI ---> it always thinks/estimates that there is 1 row
> less...
>
> This is one aspect of it...
> another aspect of what i was especially concerned is the use of SORT -
> ORDER BY... just adding a ORDER BY STUDENT_ID to the main statement, we
> always end up sorting the results which can be explained with use of HASH
> UNIQUE... it is using HASH UNIQUE even for very small resultsets
>
> ----------------------------------------------------------
> Plan hash value: 2349408382
>
>
> --------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> |
>
> --------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 9 | 54 | 2 (50)| 00:00:01 |
> | 1 | SORT ORDER BY | | 9 | 54 | 2 (50)| 00:00:01 |
> | 2 | NESTED LOOPS | | 9 | 54 | 2 (50)| 00:00:01 |
> | 3 | VIEW | VW_NSO_1 | 9 | 27 | 1 (0)| 00:00:01 |
> | 4 | HASH UNIQUE | | 9 | 54 | | |
> | 5 | NESTED LOOPS SEMI | | 9 | 54 | 1 (0)| 00:00:01 |
> |* 6 | INDEX RANGE SCAN | TEMP_PK | 10 | 30 | 1 (0)|
> 00:00:01 |
> |* 7 | INDEX UNIQUE SCAN| TEMP2_PK | 10 | 30 | 0 (0)|
> 00:00:01 |
> |* 8 | INDEX UNIQUE SCAN | STUDENT_IDX | 1 | 3 | 0 (0)|
> 00:00:01 |
>
> --------------------------------------------------------------------------------------
>
>
> if it had done SORT UNIQUE here instead of hash unique considering the
> cost.. we could have easily avoided the additional PGA related operation
> while sorting the final results from the students table...
>
> if we notice it uses pga both the cases... sort order by and hash
> unique....
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> | Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost
> (%CPU)| E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem
> |
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 1 | | | 2
> (100)| | 10 |00:00:00.01 | 9 | | | |
> | 1 | SORT ORDER BY | | 1 | 9 | 54 | 2 (50)|
> 00:00:01 | 10 |00:00:00.01 | 9 | 2048 | 2048 | 2048 (0)|
> | 2 | NESTED LOOPS | | 1 | 9 | 54 | 2 (50)|
> 00:00:01 | 10 |00:00:00.01 | 9 | | | |
> | 3 | VIEW | VW_NSO_1 | 1 | 9 | 27 | 1 (0)|
> 00:00:01 | 10 |00:00:00.01 | 5 | | | |
> | 4 | HASH UNIQUE | | 1 | 9 | 54 | |
> | 10 |00:00:00.01 | 5 | 2294K| 2294K| 1064K (0)|
> | 5 | NESTED LOOPS SEMI | | 1 | 9 | 54 | 1 (0)|
> 00:00:01 | 10 |00:00:00.01 | 5 | | | |
> |* 6 | INDEX RANGE SCAN | TEMP_PK | 1 | 10 | 30 |
> 1 (0)| 00:00:01 | 10 |00:00:00.01 | 1 | | | |
> |* 7 | INDEX UNIQUE SCAN| TEMP2_PK | 10 | 10 | 30 |
> 0 (0)| | 10 |00:00:00.01 | 4 | | | |
> |* 8 | INDEX UNIQUE SCAN | STUDENT_IDX | 10 | 1 | 3 |
> 0 (0)| | 10 |00:00:00.01 | 4 | | | |
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Please do let me know if I missed something... regarding this entire
> thing...
>
>
> Thanks,
> Vishnu
>
> On Tue, Jan 7, 2020 at 4:32 PM Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk
> <mailto:jonathan_at_jlcomp.demon.co.uk>> wrote:
>
> >> example of hash function that kicks in during execution:
> >> select roll from randomload where roll in (select a.roll from temp a,
> temp b where a.roll = b.roll) // this case sorting doesn't happen - only
> hashing to eliminate duplicates...
> >> select roll from randomload where roll in (select a.roll from temp) //
> both sorting and hashing to eliminate the duplicates .
>
> The appearance of ant hashing in one of these examples is probably a pure
> costing decision based on the cardinality and clustering estimate - here's
> a plan, pulled from memory, that shows no hashing at all:
>
>
> select object_name from t1 where object_id in (select object_id from t2)
>
> Plan hash value: 2010303231
>
>
> --------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost
> (%CPU)| Time |
>
> --------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | | | 65
> (100)| |
> | 1 | NESTED LOOPS | | 62 | 2976 | 65
> (2)| 00:00:01 |
> | 2 | NESTED LOOPS | | 62 | 2976 | 65
> (2)| 00:00:01 |
> | 3 | SORT UNIQUE | | 62 | 248 | 2
> (0)| 00:00:01 |
> | 4 | TABLE ACCESS FULL | T2 | 62 | 248 | 2
> (0)| 00:00:01 |
> |* 5 | INDEX RANGE SCAN | T1_I1 | 1 | | 1
> (0)| 00:00:01 |
> | 6 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 44 | 2
> (0)| 00:00:01 |
>
> --------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 5 - access("OBJECT_ID"="OBJECT_ID")
>
> The optimizer has used a sort unique rather than a hash unique on T2
> because it expects to find a small number of rows, and sorting the values
> for uniqueness may result in more efficient access for the nested loop join.
>
>
> Bottom line - Oracle uses a sort to ensure uniqueness (and minimal
> iterations) for an IN-list; but for an in subquery it simply picks the best
> optimizer strategy for handling the join and subsequent activity. Here's
> another plan that doesn't even try for uniqueness (using the same two
> tables, reversing the roles), but switches to a semi-join:
>
> select object_name from t2 where object_id in (select object_id from t1)
>
> Plan hash value: 3107539380
>
>
> -------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
> |
>
> -------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | | | 25 (100)|
> |
> |* 1 | HASH JOIN SEMI | | 62 | 2790 | 25 (24)|
> 00:00:01 |
> | 2 | TABLE ACCESS FULL | T2 | 62 | 2480 | 2 (0)|
> 00:00:01 |
> | 3 | INDEX FAST FULL SCAN| T1_I1 | 56753 | 277K| 20 (15)|
> 00:00:01 |
>
> -------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 1 - access("OBJECT_ID"="OBJECT_ID")
>
>
>
> Regards
> Jonathan Lewis
>
>
> PS 19.3.0.0
> create table t1 as select * from all_objects;
> create index t1_i1 on t1(object_id);
> create table t2 as select * from t1 where mod(object_id,1000) = 0;
>
>
>
>
>
> ________________________________________
> From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>
> <oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>> on
> behalf of Vishnu Potukanuma <vishnupotukanuma_at_gmail.com<mailto:
> vishnupotukanuma_at_gmail.com>>
> Sent: 06 January 2020 08:55
> To: Andy Sayer
> Cc: Oracle-L Freelists
> Subject: Re: reverse key index -- Sorting expected?
>
> Hi Andy,
>
> Now that I look at the sentence I have written - it doesn't make sense..
> regarding hash function... Let me rephrase things more clearly and
> precisely the way i am looking at...
> I was referring to the functionality used in eliminating the duplicate
> values in the IN list ( )... as it makes the final transformed query...
> (i believe it is using hash function due to the following)...
>
> example of hash function that kicks in during execution:
> select roll from randomload where roll in (select a.roll from temp a, temp
> b where a.roll = b.roll) // this case sorting doesn't happen - only hashing
> to eliminate duplicates...
> select roll from randomload where roll in (select a.roll from temp) //
> both sorting and hashing to eliminate the duplicates .
>
> db file sequential read event was basically to see the ordering of blocks
> read from extent to extent and based on the index treedump to see precisely
> whether it is reading from left most index leaf block to the right most
> index leaf block... now that looking at it from a different perspective....
> even ordering the values in the ascending order doesn't make sense after a
> certain point as the probability or even potentially benefits of this
> diminishes as the number of entries in the leaf blocks reduces due to
> - 50-50 block splits or
> - nature of data load (eg: orders from a particular store increase
> considerably during a certain time when they give offers)
> - due to the size of the indexed column
> - event multi column indexes and data type of the column (char or string)
> - in which inherently less number of entries in leaf blocks - (also block
> size dependent) -
>
> the wired part of this is that it even sorts character data
> select * from students where name in ('Vishnu','VISHNU','VIshnu'); //
> doesn't really make sense...
>
> Unless I have missed something completely - i cant see a valid purpose of
> sorting entries in the IN- List for a wide majority of the use cases
> (unless the entries lists in the IN clause are extremely close to each
> other) - which reduces the latch gets by only 1 even in this case - this
> raises another question - does this even sort if we use bind variables have
> to test this...
>
> Thanks,
> Vishnu
>
> On Mon, Jan 6, 2020 at 1:24 PM Andy Sayer <andysayer_at_gmail.com<mailto:
> andysayer_at_gmail.com><mailto:andysayer_at_gmail.com<mailto:andysayer_at_gmail.com>>>
> wrote:
> " Is this a bug? from what I look at applying a hash function to eliminate
> the duplicates and sorting the results especially for reverse key indexes
> is basically an unnecessary step"
>
> What hash function are you referring to and how do the db file sequential
> read events indicate this is happening?
>
> How did you populate this students table? If the searched column was
> generated by a sequence with no gaps then one would expect the keys you're
> looking for to be in different leaf blocks for the normal indexes so when
> you do the index traversal you might be more likely to require reading from
> disk more. The reverse key index might place the index keys closer together
> or further aspart depending on the implementation. Neither can really be
> proved just by looking at the physical block gets. Given the fetch in
> between sequential reads it looks like a different amount of data is
> actually being requested, does that mean that you used a differently
> populated table for the two tests?
>
> Additionally we cannot see which object your sequential reads are being
> reported against, it could be the table or the index. The trace data
> includes the object id so you can look it up for us.
>
> Reverse key indexes are usually related to increasing physical IO for
> standard operations because the working set you usually look at in a table
> are the few most recently inserted rows. For a reverse key index using a
> sequence to generate the value, these last few values are going to be
> scattered far around the index (which is why they're usually naively used
> to help concurrency). They also restrict the use of range predicates
> because the values are not sorted in the same way that the index will be.
> What is your use case here? Or are you just experimenting?
>
> Thanks,
> Andrew
>
> On Mon, 6 Jan 2020 at 06:39, Vishnu Potukanuma <vishnupotukanuma_at_gmail.com
> <mailto:vishnupotukanuma_at_gmail.com><mailto:vishnupotukanuma_at_gmail.com
> <mailto:vishnupotukanuma_at_gmail.com>>> wrote:
> Hi,
>
> consider the following case:
> create table randomload(roll number, name varchar2(20), mark1 number);
> populate the table with randomdata.
> create index roll_desc_idx on randomload(roll desc);
> or even ascending index...
> roll is a unique key monotonically increasing.
>
> so if the maximum value is 1000000 - this value is stored in the left most
> leaf block in the index btree structure.
>
> Now consider the following situation:
> select * from students where student_id in (1,100000, 200000, 300000,
> 400000);
> we can use the above values in any order we want, here is precisely what
> oracle is doing even before the final query transformation is done, it
> eliminates the duplicate values and sorts the values in the ascending
> order. the query after final transformation becomes..
> select * from students where student_id =1 or student_id = 100000 or
> student_id=200000 or student_id = 300000 or student_id = 400000;
>
> this works precisely as expected for all normal indexes and descending
> indexes, since sorting them in ascending order as oracle can minimize the
> IO if the values are adjacent to each other. but works horrible for reverse
> key indexes.... as there is no order to which the oracle scans the leaf
> blocks.
>
> I mean by default Oracle sorts these values in ascending order regardless
> of what, for the descending index: there is no issue with the descending
> indexes as well (but not in all the cases) ...
> but for reverse key indexes?
>
> trace of index leaf block reads for a descending index...
> WAIT #140268739421912: nam='db file sequential read' ela= 90 file#=7
> block#=30099 blocks=1 obj#=74642 tim=5089544852
> WAIT #140268739421912: nam='db file sequential read' ela= 95 file#=7
> block#=29904 blocks=1 obj#=74642 tim=5089545009
> WAIT #140268739421912: nam='db file sequential read' ela= 83 file#=7
> block#=29667 blocks=1 obj#=74642 tim=5089545184
> WAIT #140268739421912: nam='db file sequential read' ela= 101 file#=7
> block#=29430 blocks=1 obj#=74642 tim=5089545354
> WAIT #140268739421912: nam='db file scattered read' ela= 199 file#=7
> block#=29424 blocks=6 obj#=74642 tim=5089545650
> WAIT #140268739421912: nam='db file sequential read' ela= 86 file#=7
> block#=29192 blocks=1 obj#=74642 tim=5089545807
>
>
>
> the following is the trace for reverse key index:
> WAIT #140452225803960: nam='db file sequential read' ela= 215 file#=7
> block#=17470 blocks=1 obj#=74643 tim=5897433740
> WAIT #140452225803960: nam='db file sequential read' ela= 213 file#=7
> block#=16909 blocks=1 obj#=74643 tim=5897434021
> FETCH
> #140452225803960:c=631,e=1033,p=10,cr=3,cu=0,mis=0,r=1,dep=0,og=1,plh=3725354939,tim=5897434099
> WAIT #140452225803960: nam='SQL*Net message from client' ela= 326 driver
> id=1650815232 #bytes=1 p3=0 obj#=74643 tim=5897434480
> WAIT #140452225803960: nam='db file sequential read' ela= 228 file#=7
> block#=18162 blocks=1 obj#=74643 tim=5897434785
> WAIT #140452225803960: nam='db file sequential read' ela= 221 file#=7
> block#=18080 blocks=1 obj#=74643 tim=5897435085
> WAIT #140452225803960: nam='SQL*Net message to client' ela= 1 driver
> id=1650815232 #bytes=1 p3=0 obj#=74643 tim=5897435152
> WAIT #140452225803960: nam='db file sequential read' ela= 214 file#=7
> block#=19545 blocks=1 obj#=74643 tim=5897435399
> WAIT #140452225803960: nam='db file sequential read' ela= 213 file#=7
> block#=19383 blocks=1 obj#=74643 tim=5897435681
> WAIT #140452225803960: nam='db file sequential read' ela= 209 file#=7
> block#=20929 blocks=1 obj#=74643 tim=5897435960
> WAIT #140452225803960: nam='db file sequential read' ela= 192 file#=7
> block#=20686 blocks=1 obj#=74643 tim=5897436213
> WAIT #140452225803960: nam='db file sequential read' ela= 183 file#=7
> block#=22313 blocks=1 obj#=74643 tim=5897436471
> WAIT #140452225803960: nam='db file sequential read' ela= 202 file#=7
> block#=21989 blocks=1 obj#=74643 tim=5897436735
> WAIT #140452225803960: nam='db file sequential read' ela= 223 file#=7
> block#=23696 blocks=1 obj#=74643 tim=5897437039
> WAIT #140452225803960: nam='db file sequential read' ela= 207 file#=7
> block#=23292 blocks=1 obj#=74643 tim=5897437320
>
> Is this a bug? from what I look at applying a hash function to eliminate
> the duplicates and sorting the results especially for reverse key indexes
> is basically an unnecessary step....
>
> Thanks,
> Vishnu
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jan 08 2020 - 14:44:59 CET

Original text of this message