Re: high temp space usage for same analytic function on latest version

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Mon, 14 Nov 2022 23:01:35 +0000
Message-ID: <CAGtsp8nOtyap54ApogLLGXDSxGMyLaW8E_M1GLyE62pjc63ngQ_at_mail.gmail.com>



You need to apply the hint to the view or query block, NOT the table. See, for example:
https://jonathanlewis.wordpress.com/2019/07/01/opt_estimate-4/

In your case I think the following should work:

                      INNER JOIN (    SELECT /*+ opt_estimate(query_block rows=18) */
                                             ROWNUM - 1 AS RN
                                        FROM DUAL
                                  CONNECT BY LEVEL <= 18) X

"scale_rows = 18" would also work, because the estimate is 1 for dual. optionally you could give the inline view a query block name and use that in the hint, e.g:

                      INNER JOIN (    SELECT /*+ qb_name(dual_qb) opt_estimate(@dual_qb query_block rows=18) */
                                             ROWNUM - 1 AS RN
                                        FROM DUAL
                                  CONNECT BY LEVEL <= 18) X
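To check that the hint has been picked up you can look at the estimated rows in the plan after running the query. A sketch (the dbms_xplan call and format options are just one convenient way of doing it):

```sql
-- Sketch only: run the hinted rowsource on its own, then report the
-- plan from the cursor cache. In SQL*Plus make sure serveroutput is
-- off, otherwise display_cursor picks up the wrong statement.
select /*+ qb_name(dual_qb) opt_estimate(@dual_qb query_block rows=18) */
       rownum - 1 as rn
  from dual
connect by level <= 18;

select * from table(dbms_xplan.display_cursor(format => 'basic +rows'));

-- The Rows (E-Rows) column for the CONNECT BY / VIEW lines should then
-- show 18 rather than the default estimate of 1 for dual.
```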


The opt_estimate() hint doesn't seem to be treated as an optimizer hint (I think I've mentioned that in one of my other opt_estimate blog notes), so it doesn't get mentioned (or even counted) in the hint report at all.

Regards
Jonathan Lewis
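
P.S. A minimal sketch of the virtual-column idea from my earlier mail, in case it helps. The table and column names (t, c_nbr) are taken from the predicates you posted, so adjust to suit; invisible columns need 12c or later:

```sql
-- Sketch only: add an invisible virtual column matching the predicate
-- expression, then gather a histogram on it so the optimizer has a
-- chance of better cardinality estimates for substr(c_nbr, 1, 4).
alter table t add (
    c_nbr_4 invisible generated always as (substr(c_nbr, 1, 4)) virtual
);

begin
    dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'T',
        method_opt => 'for columns c_nbr_4 size 254'
    );
end;
/
```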

On Mon, 14 Nov 2022 at 17:18, yudhi s <learnerdatabase99_at_gmail.com> wrote:

> Thank You Sayan and Jonathan.
>
> As you suggested, as a quick fix I tried the hint below in the main
> block of the select query to tell the optimizer to scale up the
> resulting rows from the CTE "C" by ~18 times, but it is not reflected
> in the plan, which means the optimizer is not considering it. In that
> case it should at least appear in the hint report section of the plan
> as an error or as unused, but it is not published there either, so I
> am not sure why this is happening. Is there anything wrong in the hint
> below which I am using?
>
> I am using the hint below in the main select query, which performs an
> outer join with the CTE "C":
>
> /*+OPT_ESTIMATE( TABLE, C, scale_rows=18)*/
>
> Also, I tried the expressions below, but am seeing no change to the
> temp space consumption.
>
> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>           CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>           '9') as varchar2(22)) AS HIGH_VALUE
>
> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>           CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>           '0') as varchar2(22)) AS LOW_VALUE,
>
>
>
>
> On Mon, 14 Nov, 2022, 4:22 pm Jonathan Lewis, <jlewisoracle_at_gmail.com>
> wrote:
>
>> Still avoiding spending a lot of time on the github text.
>>
>> a) You materialize a CTE that involves a connect by query generating
>> 18 rows - use the opt_estimate() hint to tell Oracle that the
>> materialized view holds 18 times the rows it expects. (If you examine
>> the plan, the optimizer allows for ONE row coming from dual, not 18.)
>>
>> b) You have several places where you use expressions like
>> substr(c_nbr,1,4) in predicates. Create (optionally invisible) virtual
>> columns for these expressions and gather stats on them; this will give
>> the optimizer a better chance of getting the arithmetic right,
>> especially for cases where the expression needs a frequency (or
>> top-frequency) histogram. Invisible is safest for existing production
>> code, but you may need to test carefully to see whether the optimizer
>> manages to use the stats from the virtual columns when it sees the
>> equivalent expressions.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>> On Sun, 13 Nov 2022 at 14:28, yudhi s <learnerdatabase99_at_gmail.com>
>> wrote:
>>
>>> Thank you Lok. Actually the join is not exactly on the columns but on
>>> some functions of them, which might be why it looks like a Cartesian
>>> join.
>>>
>>> However, the point is that this query has not been changed. I agree
>>> that the data may have increased a little in one of the tables 'C' or
>>> 'T', so the join output has grown from 7 billion to 10 billion rows,
>>> but how does that justify the temp space requirement going from 20gb
>>> to 2.5 terabytes? That is not twice or three times but many times
>>> over.
>>>
>>> As this query is running at parallel degree 16 because of an
>>> underlying table's degree, and considering no other option at hand, I
>>> was wondering whether increasing or decreasing the parallelism with
>>> hints would help the query complete with less temp space (we
>>> currently have 2.5TB of temp space). Or is there any other possible
>>> option to make this query succeed?
>>>
>>> Just to note, we initially had 200gb of temp space, but in the hope
>>> that this query might finish we have already increased it to 2.5TB.
>>> Still no luck; the query is failing, and it started happening after
>>> the 19.15 patch.
>>>
>>>
>>> On Sun, 13 Nov, 2022, 6:06 pm Lok P, <loknath.73_at_gmail.com> wrote:
>>>
>>>> The section you posted below shows that the inputs to the 'Hash join
>>>> outer' were ~1 million and ~37 million rows respectively, but the
>>>> result coming from the outer join is ~7 billion rows. So is there
>>>> any oddity in the data in tables 'T' and 'C' which is playing a role
>>>> in increasing the rows input to your analytic function recently?
>>>>
>>>>
>>>> SQL Plan Monitoring Details (Plan Hash Value=2056239125)
>>>>
>>>> ========================================================================================================================================================================
>>>> | Id | Operation                 | Name                        |    Rows | Cost |      Time |  Start | Execs |     Rows | Read | Read  |  Mem  | Activity | Activity Detail                   |
>>>> |    |                           |                             | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |      (%) | (# samples)                       |
>>>> ========================================================================================================================================================================
>>>> |    |                           |                             |         |      |           |        |       |          |      |       |       |          | direct path read temp (553)       |
>>>> |    |                           |                             |         |      |           |        |       |          |      |       |       |          | direct path write temp (47)       |
>>>> | 31 | HASH JOIN RIGHT OUTER     |                             |      9M |   2M |      1676 |   +218 |    16 |       7G |      |       |  217M |    23.08 | Cpu (4041)                        |
>>>> | 32 | BUFFER SORT               |                             |         |      |         4 |   +216 |    16 |       1M |      |       |   97M |          |                                   |
>>>> | 33 | PX RECEIVE                |                             |   10647 |   24 |         4 |   +216 |    16 |       1M |      |       |       |          |                                   |
>>>> | 34 | PX SEND HASH              | :TQ10005                    |   10647 |   24 |      1633 |   +216 |     1 |       1M |      |       |       |     0.01 | Cpu (2)                           |
>>>> | 35 | VIEW                      |                             |   10647 |   24 |      1632 |   +217 |     1 |       1M |      |       |       |          |                                   |
>>>> | 36 | TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA1E71E_D71F1BDE |   10647 |   24 |      1632 |   +217 |     1 |       1M |  106 | 104MB |       |     0.01 | cell multiblock physical read (1) |
>>>> | 37 | PX RECEIVE                |                             |      9M |   2M |      1676 |   +218 |    16 |      37M |      |       |       |     0.05 | Cpu (9)                           |
>>>> | 38 | PX SEND HASH              | :TQ10009                    |      9M |   2M |      1677 |   +217 |    16 |      37M |      |       |       |     0.13 | Cpu (23)                          |
>>>> | 39 | VIEW                      |                             |      9M |   2M |      1677 |   +217 |    16 |      37M |      |       |       |          |                                   |
>>>> ========================================================================================================================================================================
>>>>
>>>> On Sun, Nov 13, 2022 at 2:34 AM yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you Jonathan.
>>>>> Actually the original query was a long one, so I was trying to
>>>>> reduce the complexity by posting only the particular section of the
>>>>> query which was causing the temp space spike. But I agree that a
>>>>> partial query doesn't make much sense. Also my apologies that the
>>>>> plan format was distorted.
>>>>>
>>>>> I have again posted the exact query below, with the actual object
>>>>> names replaced by dummy names. I have posted the plan at the github
>>>>> link below, so the sql execution plan format will be intact. I
>>>>> don't have the "display cursor" plan for the current runs/failures
>>>>> of this INSERT query, so I have posted the sql monitors for them; I
>>>>> did have the display cursor plan from when it was running fine on
>>>>> 11.2.0.4, so I posted that just for reference. I have also copied
>>>>> and pasted the outline section from display_awr below each of the
>>>>> sql monitor plans.
>>>>>
>>>>>
>>>>> https://gist.github.com/databasetech0073/714263bce477f624763f757e457cb861
>>>>>
>>>>> As mentioned earlier, the section of the plan which does the outer
>>>>> join is below, and it results in ~10 billion rows, which are then
>>>>> passed to the analytic function. I was trying to understand: is
>>>>> there any possible way to minimize the temp space consumption in
>>>>> this scenario? We kept increasing the temp space assuming the query
>>>>> would succeed somehow, but it just kept consuming 2.5TB+ of space
>>>>> and then failed with "ORA-01652 unable to extend temp segment ..."
>>>>> once it had saturated all the allocated temp space.
>>>>> As far as I know, encryption/TDE was already in place in this
>>>>> database before we moved from 19.11 to 19.15. I will double-check
>>>>> with the infra team on this.
>>>>>
>>>>> "LEFT OUTER JOIN C
>>>>> ON SUBSTR (T.C_NBR, 1, 4) = C.BNK_ID_NB_4
>>>>> AND LENGTH (T.C_NBR) = C.PANL_CNT
>>>>> AND T.C_NBR BETWEEN C.LOW_VALUE AND C.HIGH_VALUE) S"
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Nov 12, 2022 at 12:50 PM Jonathan Lewis <
>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>
>>>>>> >> We reran the query and it took a different plan hash value ,
>>>>>> however the path is almost similar wrt the line which does "Outer join" and
>>>>>> the "analytical function" evaluation. I have posted the specific section of
>>>>>> sql monitor below with predicate and column projection for that run.
>>>>>>
>>>>>> The supplied outputs are virtually unreadable, for different
>>>>>> execution plans, for different versions (and you've changed from
>>>>>> 19.11 for the "slow" query to 11.2.0.4!), and partial. I'm not
>>>>>> going to look at them.
>>>>>>
>>>>>> >> However, the sql monitor shows status as 'error' out after
>>>>>> sometime(~30minutes)) but the underlying sessions(~33 parallel sessions)
>>>>>> kept on running for 2hrs+
>>>>>>
>>>>>> That looks like the behaviour that Mohamed Houri has written about in
>>>>>> the past: https://hourim.wordpress.com/2020/03/07/sql-monitor-flaw/
>>>>>>
>>>>>> I'd forgotten it would be harder to collect the correct statistics
>>>>>> from all the processes when running a parallel query - finding sessions by
>>>>>> SQL_ID could work, but using views like v$px_sesstat etc. is safer.
>>>>>> Nothing stands out from the information you supplied about wait events, and
>>>>>> the only "unusual" thing in the session activity is that you are writing
>>>>>> encrypted blocks to the temporary tablespace - were you doing that in the
>>>>>> earlier version? It shouldn't explain a massive increase in space
>>>>>> requirements, but it might be relevant - it would help to explain an
>>>>>> increase in CPU.
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>> Jonathan Lewis
>>>>>>
>>>>>>
>>>>>> On Fri, 11 Nov 2022 at 17:07, yudhi s <learnerdatabase99_at_gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank You Sayan, Jonathan.
>>>>>>>
>>>>>>> We reran the query and it took a different plan hash value;
>>>>>>> however, the path is almost the same with respect to the lines
>>>>>>> that do the "Outer join" and the "analytical function"
>>>>>>> evaluation. I have posted the specific section of the sql monitor
>>>>>>> below, with the predicate and column projection for that run.
>>>>>>> However, the sql monitor shows the status as 'error' after some
>>>>>>> time (~30 minutes), but the underlying sessions (~33 parallel
>>>>>>> sessions) kept running for 2hrs+, by which time I had seen ~2TB
>>>>>>> of temp space already consumed; later it failed with an
>>>>>>> insufficient temp space error.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 15 2022 - 00:01:35 CET
