Re: high temp space usage for same analytic function on latest version

From: yudhi s <learnerdatabase99_at_gmail.com>
Date: Sun, 13 Nov 2022 19:57:57 +0530
Message-ID: <CAEzWdqc9yvj05y=y9y8mBSQF3Cp=BRD6e42hJm2dZrt+YY+HHA_at_mail.gmail.com>



Thank you Lok. Actually the join is not exactly on the columns but with some functions so that might be the case why it seems like Cartesian.

However the point is , this query has not been changed and I agree that it may be because the data has been increased a lil in one of the table 'C' or 'T' , so the join output has been increased from 7billion to 10billion but how come that justifies the temp space requirement from 20gb to 2.5terabyte it not twice or thrice but multiple times.

As this query is running in parallel-16 degree because of a underlying table degree , and considering no other option at hand, so I was thinking if by increasing or decreasing the parallelism with help of hints will help in completing the query at least with lesser temp space(we have 2.5TB of temp space with us right now currently) ? Or any other possible option to make this query succeed?

Just to note , we had initially 200gb of temp space but just with the hope that this query may finish, we already increased it to 2.5TB now, but still no luck and the query is failing and it's happening after this 19.15 patch.

On Sun, 13 Nov, 2022, 6:06 pm Lok P, <loknath.73_at_gmail.com> wrote:

> The below section you posted shows the input to 'Hash join outer' were
> ~1Million and 37milion respectively, but the result came from the outer
> join is ~7billion. So is there any oddity in the data in table 'T' and 'C'
> which is playing a role which is increasing the rows input to your
> analytical function recently?
>
>
> SQL Plan Monitoring Details (Plan Hash Value=2056239125)
>
> =================================================================================================================================================================================================================================================================================
> | Id | Operation
> | Name | Rows | Cost | Time | Start |
> Execs | Rows | Read | Read | Write | Write | Cell | Mem | Temp
> | Activity | Activity Detail |
> | |
> | | (Estim) | | Active(s) | Active |
> | (Actual) | Reqs | Bytes | Reqs | Bytes | Offload | (Max) | (Max) |
> (%) | (# samples) |
>
> =================================================================================================================================================================================================================================================================================
> |
> | | | | | |
> | | | | | | | | |
> | direct path read temp (553) |
> | |
> | | | | | |
> | | | | | | | | |
> | direct path write temp (47) |
> | 31 | HASH JOIN RIGHT OUTER
> | | 9M | 2M | 1676 | +218 |
> 16 | 7G | | | | | | 217M | |
> 23.08 | Cpu (4041) |
> | 32 | BUFFER SORT
> | | | | 4 | +216 |
> 16 | 1M | | | | | | 97M | |
> | |
> | 33 | PX RECEIVE
> | | 10647 | 24 | 4 | +216 |
> 16 | 1M | | | | | | | |
> | |
> | 34 | PX SEND HASH
> | :TQ10005 | 10647 | 24 | 1633 | +216 |
> 1 | 1M | | | | | | | |
> 0.01 | Cpu (2) |
> | 35 | VIEW
> | | 10647 | 24 | 1632 | +217 |
> 1 | 1M | | | | | | | |
> | |
> | 36 | TABLE ACCESS STORAGE FULL
> | SYS_TEMP_0FDA1E71E_D71F1BDE | 10647 | 24 | 1632 | +217 |
> 1 | 1M | 106 | 104MB | | | | | |
> 0.01 | cell multiblock physical read (1) |
> | 37 | PX RECEIVE
> | | 9M | 2M | 1676 | +218 |
> 16 | 37M | | | | | | | |
> 0.05 | Cpu (9) |
> | 38 | PX SEND HASH
> | :TQ10009 | 9M | 2M | 1677 | +217 |
> 16 | 37M | | | | | | | |
> 0.13 | Cpu (23) |
> | 39 | VIEW
> | | 9M | 2M | 1677 | +217 |
> 16 | 37M | | | | | | | |
> | |
>
> On Sun, Nov 13, 2022 at 2:34 AM yudhi s <learnerdatabase99_at_gmail.com>
> wrote:
>
>> Thank you Jonathan.
>> Actually the original query was a long one so i was trying to reduce the
>> complexity by just posting the particular section of the query which was
>> causing the temp space spike. But I agree that just a partial query doesn't
>> make much sense. Also my apology as the plan format was distorted.
>>
>> I have again posted the exact query below with the actual object names
>> being replaced with dummy names. I have posted the plan in the github in
>> the link below, so the sql execution plan format will be intact. And I am
>> not having the "display cursor" plan for the current runs/failures for this
>> INSERT query, so i have posted the sql monitors for them and i had the
>> display cursor plan when it was running fine in 11.2.0.4 so i posted that
>> just for reference. Also I have copied and pasted the outline section from
>> the display_awr below each of the sqlmonitor plan.
>>
>> https://gist.github.com/databasetech0073/714263bce477f624763f757e457cb861
>>
>> As mentioned earlier, the section of the plan which is doing the outer
>> join is below and it results in ~10billion rows. Which then is passed to
>> the analytical function. Was trying to understand, Is there any possible
>> way to minimize the temp space consumption in this scenario? because we
>> kept on increasing the temp space assuming it would succeed somehow, but
>> it just kept consuming 2.5TB+ space and then also it failed with "Ora-01652
>> unable to extend temp segment .." as it saturates all allocated temp
>> space?
>> As far as I know the encryption/TDE was already there in this database
>> before we moved from 19.11 to 19.15. I will double check with the infra
>> team on this.
>>
>> "LEFT OUTER JOIN C
>> ON SUBSTR (T.C_NBR, 1, 4) = C.BNK_ID_NB_4
>> AND LENGTH (T.C_NBR) = C.PANL_CNT
>> AND T.C_NBR BETWEEN C.LOW_VALUE AND C.HIGH_VALUE) S"
>>
>>
>>
>>
>>
>>
>> On Sat, Nov 12, 2022 at 12:50 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>> >> We reran the query and it took a different plan hash value , however
>>> the path is almost similar wrt the line which does "Outer join" and the
>>> "analytical function" evaluation. I have posted the specific section of sql
>>> monitor below with predicate and column projection for that run.
>>>
>>> The supplied outputs are virtually unreadable, for different execution
>>> plans, for different versions (and youve changed from 19.11 for the "slow"
>>> query to 11.2.0.4!) , and partial. I'm not going to look at them.
>>>
>>> >> However, the sql monitor shows status as 'error' out after
>>> sometime(~30minutes)) but the underlying sessions(~33 parallel sessions)
>>> kept on running for 2hrs+
>>>
>>> That looks like the behaviour that Mohamed Houri has written about in
>>> the past: https://hourim.wordpress.com/2020/03/07/sql-monitor-flaw/
>>>
>>> I'd forgotten it would be harder to collect the correct statistics from
>>> all the processes when running a parallel query - finding sessions by
>>> SQL_ID could work, but using views like v$px_sesstat etc. is safer.
>>> Nothing stands out from the information you supplied about wait events, and
>>> the only "unusual" thing in the session activity is that you are writing
>>> encrypted blocks to the temporary tablespace - were you doing that in the
>>> earlier version? It shouldn't explain a massive increase in space
>>> requirements, but it might be relevant - it would help to explain an
>>> increase in CPU.
>>>
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>> On Fri, 11 Nov 2022 at 17:07, yudhi s <learnerdatabase99_at_gmail.com>
>>> wrote:
>>>
>>>> Thank You Sayan, Jonathan.
>>>>
>>>> We reran the query and it took a different plan hash value , however
>>>> the path is almost similar wrt the line which does "Outer join" and the
>>>> "analytical function" evaluation. I have posted the specific section of sql
>>>> monitor below with predicate and column projection for that run. However,
>>>> the sql monitor shows status as 'error' out after sometime(~30minutes))
>>>> but the underlying sessions(~33 parallel sessions) kept on running for
>>>> 2hrs+, until then I saw ~2TB of temp space already consumed and later it
>>>> failed with insufficient temp space error.
>>>>
>>>>
>>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Nov 13 2022 - 15:27:57 CET

Original text of this message