Re: high temp space usage for same analytic function on latest version
Date: Mon, 14 Nov 2022 10:52:01 +0000
Message-ID: <CAGtsp8mL0izAyXdocmP_5PMNfdqZXWbitpsjOczyOmRsAAzu6A_at_mail.gmail.com>
Still avoiding spending a lot of time on the github text.
Regards
On Sun, 13 Nov 2022 at 14:28, yudhi s <learnerdatabase99_at_gmail.com> wrote:
> Thank you Lok. Actually the join is not exactly on the columns but with
Jonathan Lewis
> some functions so that might be the case why it seems like Cartesian.
>
> However the point is , this query has not been changed and I agree that it
> may be because the data has been increased a lil in one of the table 'C' or
> 'T' , so the join output has been increased from 7billion to 10billion but
> how come that justifies the temp space requirement from 20gb to 2.5terabyte
> it not twice or thrice but multiple times.
>
> As this query is running in parallel-16 degree because of a underlying
> table degree , and considering no other option at hand, so I was thinking
> if by increasing or decreasing the parallelism with help of hints will help
> in completing the query at least with lesser temp space(we have 2.5TB of
> temp space with us right now currently) ? Or any other possible option to
> make this query succeed?
>
> Just to note , we had initially 200gb of temp space but just with the hope
> that this query may finish, we already increased it to 2.5TB now, but still
> no luck and the query is failing and it's happening after this 19.15 patch.
>
>
> On Sun, 13 Nov, 2022, 6:06 pm Lok P, <loknath.73_at_gmail.com> wrote:
>
>> The below section you posted shows the input to 'Hash join outer' were
>> ~1Million and 37milion respectively, but the result came from the outer
>> join is ~7billion. So is there any oddity in the data in table 'T' and 'C'
>> which is playing a role which is increasing the rows input to your
>> analytical function recently?
>>
>>
>> SQL Plan Monitoring Details (Plan Hash Value=2056239125)
>>
>> =================================================================================================================================================================================================================================================================================
>> | Id | Operation
>> | Name | Rows | Cost | Time | Start |
>> Execs | Rows | Read | Read | Write | Write | Cell | Mem | Temp
>> | Activity | Activity Detail |
>> | |
>> | | (Estim) | | Active(s) | Active |
>> | (Actual) | Reqs | Bytes | Reqs | Bytes | Offload | (Max) | (Max) |
>> (%) | (# samples) |
>>
>> =================================================================================================================================================================================================================================================================================
>> |
>> | | | | | |
>> | | | | | | | | |
>> | direct path read temp (553) |
>> | |
>> | | | | | |
>> | | | | | | | | |
>> | direct path write temp (47) |
>> | 31 | HASH JOIN RIGHT OUTER
>> | | 9M | 2M | 1676 | +218 |
>> 16 | 7G | | | | | | 217M | |
>> 23.08 | Cpu (4041) |
>> | 32 | BUFFER SORT
>> | | | | 4 | +216 |
>> 16 | 1M | | | | | | 97M | |
>> | |
>> | 33 | PX RECEIVE
>> | | 10647 | 24 | 4 | +216 |
>> 16 | 1M | | | | | | | |
>> | |
>> | 34 | PX SEND HASH
>> | :TQ10005 | 10647 | 24 | 1633 | +216 |
>> 1 | 1M | | | | | | | |
>> 0.01 | Cpu (2) |
>> | 35 | VIEW
>> | | 10647 | 24 | 1632 | +217 |
>> 1 | 1M | | | | | | | |
>> | |
>> | 36 | TABLE ACCESS STORAGE FULL
>> | SYS_TEMP_0FDA1E71E_D71F1BDE | 10647 | 24 | 1632 | +217 |
>> 1 | 1M | 106 | 104MB | | | | | |
>> 0.01 | cell multiblock physical read (1) |
>> | 37 | PX RECEIVE
>> | | 9M | 2M | 1676 | +218 |
>> 16 | 37M | | | | | | | |
>> 0.05 | Cpu (9) |
>> | 38 | PX SEND HASH
>> | :TQ10009 | 9M | 2M | 1677 | +217 |
>> 16 | 37M | | | | | | | |
>> 0.13 | Cpu (23) |
>> | 39 | VIEW
>> | | 9M | 2M | 1677 | +217 |
>> 16 | 37M | | | | | | | |
>> | |
>>
>> On Sun, Nov 13, 2022 at 2:34 AM yudhi s <learnerdatabase99_at_gmail.com>
>> wrote:
>>
>>> Thank you Jonathan.
>>> Actually the original query was a long one so i was trying to reduce the
>>> complexity by just posting the particular section of the query which was
>>> causing the temp space spike. But I agree that just a partial query doesn't
>>> make much sense. Also my apology as the plan format was distorted.
>>>
>>> I have again posted the exact query below with the actual object names
>>> being replaced with dummy names. I have posted the plan in the github in
>>> the link below, so the sql execution plan format will be intact. And I am
>>> not having the "display cursor" plan for the current runs/failures for this
>>> INSERT query, so i have posted the sql monitors for them and i had the
>>> display cursor plan when it was running fine in 11.2.0.4 so i posted that
>>> just for reference. Also I have copied and pasted the outline section from
>>> the display_awr below each of the sqlmonitor plan.
>>>
>>> https://gist.github.com/databasetech0073/714263bce477f624763f757e457cb861
>>>
>>> As mentioned earlier, the section of the plan which is doing the outer
>>> join is below and it results in ~10billion rows. Which then is passed to
>>> the analytical function. Was trying to understand, Is there any possible
>>> way to minimize the temp space consumption in this scenario? because we
>>> kept on increasing the temp space assuming it would succeed somehow, but
>>> it just kept consuming 2.5TB+ space and then also it failed with "Ora-01652
>>> unable to extend temp segment .." as it saturates all allocated temp
>>> space?
>>> As far as I know the encryption/TDE was already there in this database
>>> before we moved from 19.11 to 19.15. I will double check with the infra
>>> team on this.
>>>
>>> "LEFT OUTER JOIN C
>>> ON SUBSTR (T.C_NBR, 1, 4) = C.BNK_ID_NB_4
>>> AND LENGTH (T.C_NBR) = C.PANL_CNT
>>> AND T.C_NBR BETWEEN C.LOW_VALUE AND C.HIGH_VALUE) S"
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Nov 12, 2022 at 12:50 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>> >> We reran the query and it took a different plan hash value ,
>>>> however the path is almost similar wrt the line which does "Outer join" and
>>>> the "analytical function" evaluation. I have posted the specific section of
>>>> sql monitor below with predicate and column projection for that run.
>>>>
>>>> The supplied outputs are virtually unreadable, for different execution
>>>> plans, for different versions (and youve changed from 19.11 for the "slow"
>>>> query to 11.2.0.4!) , and partial. I'm not going to look at them.
>>>>
>>>> >> However, the sql monitor shows status as 'error' out after
>>>> sometime(~30minutes)) but the underlying sessions(~33 parallel sessions)
>>>> kept on running for 2hrs+
>>>>
>>>> That looks like the behaviour that Mohamed Houri has written about in
>>>> the past: https://hourim.wordpress.com/2020/03/07/sql-monitor-flaw/
>>>>
>>>> I'd forgotten it would be harder to collect the correct statistics from
>>>> all the processes when running a parallel query - finding sessions by
>>>> SQL_ID could work, but using views like v$px_sesstat etc. is safer.
>>>> Nothing stands out from the information you supplied about wait events, and
>>>> the only "unusual" thing in the session activity is that you are writing
>>>> encrypted blocks to the temporary tablespace - were you doing that in the
>>>> earlier version? It shouldn't explain a massive increase in space
>>>> requirements, but it might be relevant - it would help to explain an
>>>> increase in CPU.
>>>>
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>> On Fri, 11 Nov 2022 at 17:07, yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Thank You Sayan, Jonathan.
>>>>>
>>>>> We reran the query and it took a different plan hash value , however
>>>>> the path is almost similar wrt the line which does "Outer join" and the
>>>>> "analytical function" evaluation. I have posted the specific section of sql
>>>>> monitor below with predicate and column projection for that run. However,
>>>>> the sql monitor shows status as 'error' out after sometime(~30minutes))
>>>>> but the underlying sessions(~33 parallel sessions) kept on running for
>>>>> 2hrs+, until then I saw ~2TB of temp space already consumed and later it
>>>>> failed with insufficient temp space error.
>>>>>
>>>>>
>>>>
>>>
-- http://www.freelists.org/webpage/oracle-lReceived on Mon Nov 14 2022 - 11:52:01 CET