Re: question on table access by index rowid batched

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Sun, 10 Oct 2021 19:22:14 +0100
Message-ID: <CAGtsp8kw7RCW6FnyDdS9VXb78-4pyH6=biUz2bxyP7MGEh=MsQ_at_mail.gmail.com>



_at_Lok P <loknath.73_at_gmail.com>

It's a pity you posted just a few minutes too early (or I posted a few minutes too late).

It's NOT the statistics collector that takes time - part of the problem is that the plan switches two joins from nested loops that operate relatively efficiency to hash joins that (if you want the first few rows only) introduce a initial delay.

But the biggest change in workload/timing then comes from the way the order of the rowsource produced at operation 4 has been changed by using two hash joins instead of two nested loop joins - resulting in randomised I/O against a table that appears 60 lines further down the plan.

I've finally re-discovered the note I wrote about this ordering effect: https://jonathanlewis.wordpress.com/2007/03/05/ordering/

Regards
Jonathan Lewis

On Sun, 10 Oct 2021 at 18:59, Lok P <loknath.73_at_gmail.com> wrote:

> I think you are correct. As Jonathan explained, the issue here is just
> because of the extra work added by the statistics collector operation which
> does the blocking/buffering operation to decide future Joins. But yes ~131
> seconds of blocking is too much for any first_rows/oltp/search query to
> wait for. And as you pointed out in one of the cases the same table access
> by index rowid operation took 5GB and other was less . And to which
> Jonathan highlighted, the difference in data pattern made that thing
> happen(which inturn was because of the statistics collector), looks
> interesting. I have never thought of that.
>
> We also have few of our OLTP environments on 19C but there we have
> optimizer_adaptive_plans itself set as false and we were planning to enable
> it based on certain inputs from other team members. Curious to know if
> others experienced such issues with first_rows type of queries with
> adaptive plans enabled and we should stop enabling it?
>
> On Sat, Oct 9, 2021 at 9:50 PM Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Thank you Jonathan.
>>
>> There is a difference of ~350+ seconds between the run time of both the
>> plans and out of this, ~131 seconds is contributed by the statistics
>> collector which does the additional mathematics/counting to opt for the
>> best path at that moment (must be based on the runtime/actual cardinality
>> information). This query was actually a search query and running with
>> first_rows hint and customer expecting to see the result as first ~50 odd
>> rows asap on the screen but with the optimizer_adaptive_reporting_only
>> 'false' it was using another set of mathematics(or as you mentioned
>> multipasses) in this query adding significant overhead for this quick
>> query. So is it recommended to have this parameter set as 'true' mainly in
>> the oltp kind of environment ? Or should totally turn the key driver i.e.
>> optimizer_adpative_plan as false?
>>
>> Note -- Just to note, after the upgrade , I think the infra team just
>> kept it all default i.e. optimizer_adaptive_plan- true,
>> optimizer_adaptive_reporting_only- false, optimizer_adaptive_statistics-
>> false. So wondering if it's advisable/ and safe to change few to non
>> default?
>>
>> As you rightly explained, there seems to be no role of rowid batching in
>> this performance issue. Again, not much experience in reading
>> complex execution paths, but from the path initially I was interpreting, it
>> as both plans are exactly the same , with just the difference in 'rowid
>> batching operation. But from your explanation, I see I was completely
>> wrong. Now, I can see in the good plan case it has read ~28million at step
>> ~46 for index TP_PK but in the bad plan case it is not noted in the sql
>> monitor. but conversely , ~5GB data being read in the bad plan, at
>> step-59 with ~661k read requests and it resulted in ~770k rows there. But
>> in case of a good plan those bytes read and number of read requests are not
>> noted. So does this point to the fact that the statistics collector is
>> really playing a role in the order in which data results out from one
>> operation/step to another?
>>
>>
>> On Sat, Oct 9, 2021 at 4:49 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>>
>>> Remember I was talking POSSIBILITIES in my previous post, not
>>> certainties.
>>>
>>> With the extra information you have now supplied I have a better
>>> hypothesis, based on the fact that you have adaptive_plans_enabled (which I
>>> should have realised in the first place), and that the plans you've
>>> supplied are different in exactly the TYPE of reason I suggested, but not
>>> for the exact reason. My basic hypothesis now is that when you enabled
>>> adaptive plans but set it to reporting only Oracle will always include the
>>> statistics collector operations, but it will not use them (i.e. not
>>> counting etc,) to make a decision about whether to take a hash join or a
>>> nested loop join. It will simply generate a path based on its first pass
>>> calculations.
>>>
>>> (You could check this with a suitable set of examples and testing with
>>> (a) adaptive plans enable (b) adaptive plans enabled but reporting only (c)
>>> adaptive plans disabled. You'd need to set up 4 possibilities: nlj
>>> changing to hj due to adaptive plans, hj changing to nlj due to adaptive
>>> plans, nlj not changing, hj not changing).
>>>
>>> So Looking at your SQL Monitor reports - lines 41 and 42 as a starting
>>> point:
>>>
>>> WIth reporting only = TRUE
>>>
>>> ==============================================================================================================================================================================
>>> | Id | Operation |
>>> Name | Rows | Cost | Time | Start | Execs | Rows
>>> | Read | Read | Mem |
>>> | | |
>>> | (Estim) | | Active(s) | Active | |
>>> (Actual) | Reqs | Bytes | (Max) |
>>>
>>> ===============================================================================================================================================================================
>>> | 41 | INDEX RANGE SCAN | TCX_PK
>>> | 1 | 2 | 279 | +2 | 2M |
>>> 2M | | | . |
>>> | 42 | INDEX RANGE SCAN | TCX_PK
>>> | 1 | 2 | | | |
>>> | | | . |
>>>
>>> With reporting only = FALSE
>>>
>>> ===============================================================================================================================================================================================
>>> | Id | Operation |
>>> Name | Rows | Cost | Time | Start | Execs | Rows
>>> | Read | Read | Write | Write | Mem |
>>> | | |
>>> | (Estim) | | Active(s) | Active | |
>>> (Actual) | Reqs | Bytes | Reqs | Bytes | (Max) |
>>>
>>> ===============================================================================================================================================================================================
>>> | 41 | INDEX RANGE SCAN | TCX_PK
>>> | 1 | 2 | | | |
>>> | | | | | . |
>>> | 42 | INDEX RANGE SCAN | TCX_PK
>>> | 1 | 2 | 1 | +131 | 1 |
>>> 976K | | | | | . |
>>>
>>> These two lines are the options available as a result of a statistics
>>> collector further up the plan, and one plan takes operation 41, the other
>>> takes operation 42
>>>
>>> Putting them into context (adding in some parent and sibling rows)
>>>
>>> Reporting only = true ... plan based on estimates
>>>
>>> ==============================================================================================================================================================================
>>> | Id | Operation |
>>> Name | Rows | Cost | Time | Start | Execs | Rows
>>> | Read | Read | Mem |
>>> | | |
>>> | (Estim) | | Active(s) | Active | |
>>> (Actual) | Reqs | Bytes | (Max) |
>>>
>>> ===============================================================================================================================================================================
>>> | 7 | NESTED LOOPS OUTER |
>>> | 1 | 3M | 279 | +2 | 1 |
>>> 2M | | | . | | |
>>> | 8 | HASH JOIN OUTER |
>>> | 1 | 3M | 279 | +2 | 1 |
>>> 2M | | | . |
>>> | 9 | NESTED LOOPS OUTER |
>>> | 1 | 3M | 279 | +2 | 1 |
>>> 2M | | | . |
>>> | 10 | STATISTICS COLLECTOR |
>>> | | | 279 | +2 | 1 |
>>> 2M | | | . |
>>> | 11 | NESTED LOOPS OUTER |
>>> | 1 | 3M | 279 | +2 | 1 |
>>> 2M | | | . |
>>>
>>> ...
>>> | 41 | INDEX RANGE SCAN | TCX_PK
>>> | 1 | 2 | 279 | +2 | 2M |
>>> 2M | | | . |
>>> | 42 | INDEX RANGE SCAN | TCX_PK
>>> | 1 | 2 | | | |
>>> | | | . |
>>> | 43 | TABLE ACCESS BY INDEX ROWID | TC
>>> | 1 | 2 | 279 | +2 | 2M |
>>> 2M | 16037 | 125MB | . | | |
>>>
>>>
>>> Reporting only = false ... plan adapts to counting
>>> (OPTERATION 10 dictates a HASH JOIN where the estimated stats suggested
>>> NLJ)
>>>
>>> ===============================================================================================================================================================================================
>>> | Id | Operation |
>>> Name | Rows | Cost | Time | Start | Execs | Rows
>>> | Read | Read | Write | Write | Mem |
>>> | | |
>>> | (Estim) | | Active(s) | Active | |
>>> (Actual) | Reqs | Bytes | Reqs | Bytes | (Max) |
>>>
>>> ===============================================================================================================================================================================================
>>> | 7 | NESTED LOOPS OUTER |
>>> | 1 | 3M | 27 | +131 | 1 |
>>> 2M | | | | | . |
>>> | 8 | HASH JOIN OUTER |
>>> | 1 | 3M | 155 | +3 | 1 |
>>> 2M | 3035 | 1GB | 3035 | 1GB | 309MB |
>>> | 9 | NESTED LOOPS OUTER |
>>> | 1 | 3M | 129 | +3 | 1 |
>>> 2M | | | | | . |
>>> | 10 | STATISTICS COLLECTOR |
>>> | | | 129 | +3 | 1 |
>>> 2M | | | | | . |
>>> | 11 | NESTED LOOPS OUTER |
>>> | 1 | 3M | 129 | +3 | 1 |
>>> 2M | | | | | . |
>>> ...
>>> | 41 | INDEX RANGE SCAN | TCX_PK
>>> | 1 | 2 | | | |
>>> | | | | | . |
>>> | 42 | INDEX RANGE SCAN | TCX_PK
>>> | 1 | 2 | 1 | +131 | 1 |
>>> 976K | | | | | . |
>>> | 43 | TABLE ACCESS BY INDEX ROWID | TC
>>> | 1 | 2 | 27 | +131 | 2M |
>>> 2M | 21549 | 168MB | | | . |
>>>
>>>
>>> As you can see, the ability to switch plans makes a big difference to
>>> the amount of time Oracle spends working before it starts executing step 7
>>> (start time = +2 seconds vs. +131 seconds).
>>>
>>> This one extract from the plans doesn't answer the question about why
>>> "table access by index rowid batched" has appeared; nor does it explain the
>>> total difference between the two executions because you also need to look
>>> at the statistics collector at operation 6 and the impact that has on which
>>> of operation 45/46 and 47/48 get chosen. The other point that then comes up
>>> is that the order in which data appears from a hash join is (almost
>>> certainly) different from the order it appears from a nested loop join -
>>> which means that when you use the result of the join to probe other tables
>>> by index the order of the driving data from one join may be roughly in line
>>> with the indexes and data from the next table(s) in the join and benefit
>>> from a lot of "self-induced" caching, while the other join might produce
>>> data in what is effectively a randomised order resulting in "self-flushing"
>>> and more random reads - and that might explain what happens at operations
>>> 60/61
>>>
>>> What my observations suggest is a good argument for why you had such a
>>> difference in performance, and that it had nothing to do with the batching
>>> or not of the table access by rowid.
>>>
>>> I may write this up in more detail as a blog some day, but I don't have
>>> time right now.
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 8 Oct 2021 at 18:34, Pap <oracle.developer35_at_gmail.com> wrote:
>>>
>>>> It's fluctuating. I am now not able to reproduce the scenario for
>>>> that same small query for which I had posted here just before. And you are
>>>> correct , I was luckily having sql monitors saved. I am attaching those
>>>> here. If you see that, the main query was showing those rowid batched
>>>> operations when we have the optimizer_adaptive_reporting_only set as
>>>> default/false and was taking a long time to finish and also the first few
>>>> rows were also taking longer to get produced out of the query. But the
>>>> sample small query which i had posted a cursor plan for was just showing
>>>> the opposite behaviour. That is going for a rowid batched path when
>>>> optimizer_adaptive_reporting_only sets as true.
>>>>
>>>> However, now I am seeing that same small query in both the cases
>>>> (irrespective of value of optimizer_adaptive_reporting_only) going for
>>>> 'rowid batched' execution path. Not sure if it's just stats or
>>>> anything else influencing and I am seeing different things behaviour. Just
>>>> to note we have 'optimizer_adaptive_plans' set to true ,
>>>> 'optimizer_adaptive_statistics' set to false. The only change we made was
>>>> moving ' optimizer_adaptive_reporting_only' from false to true.
>>>>
>>>> And Jonathan when you said the optimizer_adaptive_reporting_only = true
>>>> will introduce 'statistics collector' operation, but if you see the
>>>> attached sql monitor for the main query, i am seeing 'statistics collector'
>>>> even when optimizer_adaptive_reporting_only is = false. Is that expected
>>>> behaviour?
>>>>
>>>>
>>>>
>>>
>>>
>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Oct 10 2021 - 20:22:14 CEST

Original text of this message