Re: high temp space usage for same analytic function on latest version
Date: Fri, 18 Nov 2022 09:38:56 +0000
Message-ID: <CAGtsp8n2on0cQgVnR-kHZaEJyG9TMdHNNSzmGUp9X4V6OwmENQ_at_mail.gmail.com>
I couldn't leave it alone, so I've just come back to my simple 11g model
because I realised I had to do two more little tests.
1) run it on a newer version of Oracle
2) create the table with the data sorted to suit the analytic function.
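
For reference, test 2 amounts to something like this (hypothetical names; the
ORDER BY matches the analytic function's partition and ordering keys):

create table t1_sorted as
select * from t1_source
order by ds_no, c_nbr, cpcl_nbr desc;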
Results of 1: The 21c execution was exactly the same, but ran 4 times as fast - directly contradicting your 19c result.
Results of 2: 11g ran much faster for the pre-sorted data and spilled just 331 blocks to disc rather than 6,826.
Results of 2a: 21c completed the task in memory, using 5MB of memory for an initial 55MB of data to be sorted.
Conclusion - Oracle does do the "early throwaway" on the analytic sort, but the incoming data for your 19c test is sufficiently different that the optimisation can't be applied very much. This might be bad luck because the volume is nearly 50% larger, or it might be bad luck because of a different order in which the data arrives at the operation from the hash join (which might be down to a change in the algorithm for the parallel hash join or the hash/hash distribution that Oracle has used).
Bottom line - there is no bug, there is only bad luck.
On Fri, 18 Nov 2022 at 08:08, Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:
>
> I can't model the problem at present - partly because my 19c VM keeps
> crashing, but mostly because I don't have the time.
>
> I've done a very simple model on 11.2.0.4 and my hypothetical optimisation
> (early throwaway) didn't seem to be happening. It occurred to me, however,
> that the amount of benefit of the mechanism would depend on the pattern of
> the data and the number of final distinct values (i.e. the output size).
> It's also occurred to me that the analytic sort uses a V1 sort, and there is
> (or used to be) a hard limit of 100MB for the "insertion sort memory" set by a
> hidden parameter (see
> https://jonathanlewis.wordpress.com/2009/10/01/_smm_isort_cap/) - it might
> not take a very large increase in the inputs to over-use that space. Bear
> in mind the insertion sort builds a binary tree and the overhead (in
> memory) for the TREE is 24 bytes per row sorted.
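> (For a sense of scale: at 24 bytes per row, the tree overhead alone would
> reach a 100MB cap at roughly 4.4M rows, before counting the sorted data itself.)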
>
> Three things to try:
> a) set events 10032 and 10033 to see the Sort operation information, in
> particular how big the runs to disc are.
> b) see if setting the hidden parameter to (say) 200M makes the problem go
> away.
> c) check for hints and (hidden) parameters that look as if they might
> allow you to block "push down" and see if they have any effect.
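>
> For reference, (a) and (b) might look like this in a test session - sketches
> only; "_smm_isort_cap" is a hidden parameter (value in KB, per the blog post
> above), so check with Oracle Support before touching it:
>
> alter session set events '10032 trace name context forever, level 10';
> alter session set events '10033 trace name context forever, level 10';
> alter session set "_smm_isort_cap" = 204800;  -- ~200M; default 102400 (100M)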
>
> Regards
> Jonathan Lewis
>
>
>
>
> On Fri, 18 Nov 2022 at 05:10, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>
>> Thank you so much, Jonathan, for the detailed insight.
>>
>> I just did a quick check. When you mentioned the predicate section used
>> at 'run time', I believe it's the same as the predicate section shown in
>> the display_cursor plan itself, which I already captured. Those predicates,
>> and the section of the query that generates them, are below. I am not seeing
>> any change in them between the two versions. Also, since the 'row_number'
>> analytic function seems to be playing a role, I was wondering whether we
>> should try some alternative way to modify the query and get the same
>> results?
>>
>> And yes, the problematic part is the analytic function evaluation
>> ("WINDOW CHILD PUSHED RANK") for the billions of input rows that come
>> from the outer hash join. As you rightly said, it may be that 11.2 was
>> evaluating the window function on the go - the "PUSHED RANK" was helping,
>> aggregating/squeezing the input data as it arrived, so it used far less
>> temp - whereas 19.15 tries to capture and sort all the rows first and
>> then applies the window function on top of all the rows at once.
>>
>> We will double-check on the patch for the bug you highlighted, because
>> this issue popped up after the 19.15 patch; it was working fine up to
>> 19.11. Also, since the behaviour is not influenced by the OFE 11.2.0.4
>> hint (we tried it manually, with no success), it must be something
>> internal to 19.15, I assume.
>>
>> *11.2 predicate from display_cursor:-*
>> 21 - filter(ROW_NUMBER() OVER ( PARTITION BY "T"."DS_NO","T"."C_NBR"
>> ORDER BY INTERNAL_FUNCTION("C"."CPCL_NBR") DESC )<=1)
>> 24 - filter(ROW_NUMBER() OVER ( PARTITION BY "T"."DS_NO","T"."C_NBR"
>> ORDER BY INTERNAL_FUNCTION("C"."CPCL_NBR") DESC )<=1)
>>
>> *19.15 predicate from display_cursor:-*
>> 24 - filter(ROW_NUMBER() OVER ( PARTITION BY "T"."DS_NO","T"."C_NBR"
>> ORDER BY INTERNAL_FUNCTION("C"."CPCL_NBR") DESC )<=1)
>> 27 - filter(ROW_NUMBER() OVER ( PARTITION BY "T"."DS_NO","T"."C_NBR"
>> ORDER BY INTERNAL_FUNCTION("C"."CPCL_NBR") DESC )<=1)
>>
>>
>> *Analytical function section from query:-*
>>
>> ROW_NUMBER () OVER (PARTITION BY T.DS_NO, T.C_NBR ORDER BY C.CPCL_NBR DESC) AS CP_RANK,
>> COUNT (*) OVER (PARTITION BY T.DS_NO, T.C_NBR, C.CPCL_NBR) AS CL_MATCHES,
>> COUNT (*) OVER (PARTITION BY T.DS_NO, T.C_NBR) AS NO_MATCHES
>>
>>
>> On Fri, Nov 18, 2022 at 3:52 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>>
>>> I thought I'd take a look on MOS for bugs that report performance
>>> problems related to "pushed rank" and found one:
>>> Bug 30786641 : QUERY PERFORMANCE REGRESSION AFTER UPGRADING FROM
>>> 11.2.0.4 TO 19.5
>>> It might be related, but maybe not; and you may have the patch built
>>> into your version anyway - the patch is in 19.15.2.0.
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>>
>>> On Thu, 17 Nov 2022 at 20:30, Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>>
>>>> I've now examined the three different Monitor reports with a little care.
>>>> There's more information I would like to collect if I were on site and
>>>> able to re-run the queries, but I think I may have identified the critical
>>>> issue.
>>>>
>>>> I thought you kept saying the problem was in the outer hash join -
>>>> maybe you didn't and I just picked that up from the bits and pieces.
>>>>
>>>> The massive I/O overhead (temp space usage) is in the WINDOW CHILD
>>>> PUSHED RANK operation (the one above the outer join) - it's a problem
>>>> whether or not you use the profile from 11g. Critical observation: the
>>>> input to the window sort is 7 billion rows in 11g, the sort space required
>>>> is 10GB, with 19GB of reads and writes (so a multipass sort operation).
>>>> How do you sort 7B rows with only 11GB of space? Answer: you can't.
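>>>> (Rough arithmetic: 10GB spread across 7 billion rows is under 2 bytes
>>>> per row, so the full input cannot have reached the sort intact.)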
>>>>
>>>> In 19c the number of rows is 7B in one case and 10B in the other, and
>>>> the space requirement (before failing) is a much more realistic 1TB. The
>>>> inference we draw from this is that the PUSHED RANK is working in 11g but
>>>> not working correctly in 19c, so the entire data set has to be captured
>>>> before the SORT in 19c, while 11g can keep saying "I've got one of those
>>>> already, I only need to keep one" as each row arrives. The output that
>>>> succeeds is roughly 1 row in 100, so it's not too surprising that the temp
>>>> space used when pushed rank doesn't work is roughly 100 times the size.
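>>>>
>>>> As a minimal sketch of the pattern (hypothetical table name, column names
>>>> taken from your predicates), this is the query shape where the pushed rank
>>>> lets the sort throw rows away as they arrive:
>>>>
>>>> select *
>>>> from  (select t.*,
>>>>               row_number() over (partition by ds_no, c_nbr
>>>>                                  order by cpcl_nbr desc) as cp_rank
>>>>        from   some_joined_data t)
>>>> where cp_rank <= 1;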
>>>>
>>>> The next really important detail to check is the actual runtime
>>>> predicate used in the pushed rank, just in case it's not what is claimed.
>>>> Another really important check is that 11g has been getting the right
>>>> results - maybe the mechanism it has been using was disabled in 19c because
>>>> it could produce the wrong results.
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, 16 Nov 2022 at 18:44, yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, I tried that, and it does increase the estimates for the CTE from
>>>>> 38k to 860k, but that problematic section of the plan remains the same,
>>>>> consuming terabytes of temp space.
>>>>>
>>>>> On Thu, 17 Nov, 2022, 12:10 am Lok P, <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>>> Did you try the opt_estimate hint to influence the CTE estimate,
>>>>>> scaling it up 18 times as Jonathan suggested? Does that change
>>>>>> anything in the execution path?
>>>>>>
>>>>>> On Wed, 16 Nov, 2022, 11:24 pm yudhi s, <learnerdatabase99_at_gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Extremely sorry - it looks like I messed it up. Trying to
>>>>>>> correct it.
>>>>>>>
>>>>>>> Basically, we had a SQL profile attached in the past, while we were on
>>>>>>> version 11.2.0.4, and it was working fine. When we moved to 19.11 that
>>>>>>> same profile was still there, and the query finished without any issue. But
>>>>>>> that same plan is now making the query consume terabytes of temp on
>>>>>>> version 19.15. I don't have a SQL monitor from the 19.11 plan, which was
>>>>>>> succeeding fine, but it has the same plan hash value as the 19.15 plan I
>>>>>>> posted below in the 2nd SQL monitor.
>>>>>>>
>>>>>>> However, the point I want to make here is that all of these plans are
>>>>>>> almost the same, with minor differences. The specific section of the
>>>>>>> plan which does the "HASH JOIN OUTER" and passes the resulting output
>>>>>>> to the analytic function is exactly the same in all these cases, but still
>>>>>>> the 19.15 version goes for ~1TB+ of temp. So it's definitely odd.
>>>>>>>
>>>>>>> Below is a three-way comparison of the exact section of the plan
>>>>>>> which is playing the key role in consuming all the temp space. In the first
>>>>>>> case, even though ~7 billion rows are getting sorted, it uses just ~10GB of
>>>>>>> temp space, whereas in the second and third cases it uses terabytes. Though
>>>>>>> the data volume is higher, that excessive use of temp space is not justified.
>>>>>>> It definitely seems buggy.
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/55c5722da8105dfcc647ab3bd1b24a48
>>>>>>>
>>>>>>> Also, regarding the bloom filter bug you highlighted: I tried forcing
>>>>>>> the no_px_join_filter hint, and even set "_bloom_filter_enabled" and
>>>>>>> "_bloom_pruning_enabled" at session level to stop the bloom filtering
>>>>>>> and pruning, and re-ran the SELECT part of the query, but it is
>>>>>>> still using the same excessive amount of temp space.
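>>>>>>>
>>>>>>> For reference, the session-level settings were along these lines
>>>>>>> (hidden parameters, so set only for this test):
>>>>>>>
>>>>>>> alter session set "_bloom_filter_enabled" = false;
>>>>>>> alter session set "_bloom_pruning_enabled" = false;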
>>>>>>>
>>>>>>>
>>>>>>> --1st sqlmonitor
>>>>>>>
>>>>>>> Below is the sql monitor and outline from version 11.2.0.4, which was
>>>>>>> finishing while consuming <20GB of temp space.
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/630dad9fc43d6ab6c8863251e29f1cf2
>>>>>>>
>>>>>>> --2nd sql monitor
>>>>>>>
>>>>>>> Below is the sql monitor and outline in version 19.15 (by forcing
>>>>>>> exactly the same plan through a sql profile as it used to take in 11.2).
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/37ec33e73a23bd0055148ba43a0a49ca
>>>>>>>
>>>>>>> --3rd sql monitor
>>>>>>>
>>>>>>> Below is the sql monitor and outline from the current version (i.e. 19.15
>>>>>>> without any profile), which is also consuming high temp space.
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/4b895f5790a993ed326d389b54295f9b
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 15 Nov, 2022, 6:39 pm Jonathan Lewis, <
>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Since I've noticed Bloom filters as an anomaly, it might be worth
>>>>>>>> checking if you're patched for:
>>>>>>>> "Bug 30618406 - Huge PGA memory consumption by bloom filters
>>>>>>>> (qesblfilter_seg) in px slaves in full partition-wise join"
>>>>>>>>
>>>>>>>> I know this plan isn't using full PW joins - but (read the details)
>>>>>>>> - your parallel operation uses a lot of memory, and then a lot of temp
>>>>>>>> space. Sometimes the bug goes a lot further than the initial description.
>>>>>>>>
>>>>>>>> Marked as fixed in 23.1, but I can see patches for 19.16 in the RU.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Jonathan Lewis
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 15 Nov 2022 at 07:15, Jonathan Lewis <
>>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I've taken a few minutes to look through the github material. The
>>>>>>>>> headings don't agree with the contents. Here's a list showing each heading
>>>>>>>>> plus two lines from the Outline Data under the heading:
>>>>>>>>>
>>>>>>>>> Display cursor plan from version 11.2.0.4
>>>>>>>>> OPTIMIZER_FEATURES_ENABLE('19.1.0')
>>>>>>>>> DB_VERSION('19.1.0')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Execution which is failing in 19.15 version with high tempspace
>>>>>>>>> consumption
>>>>>>>>> OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
>>>>>>>>> DB_VERSION('19.1.0')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Execution which was running fine in 19.11 version
>>>>>>>>> OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
>>>>>>>>> DB_VERSION('11.2.0.4')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There doesn't seem to be any point in spending time looking at
>>>>>>>>> detail when I've got no confidence that the bits of information supplied
>>>>>>>>> belong together.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> A couple of high visibility points, though:
>>>>>>>>> a) the version labelled "Execution which was running fine in 19.11
>>>>>>>>> version" reported OPT_PARAM('optimizer_dynamic_sampling' 3) in its outline
>>>>>>>>> data, and that could have helped Oracle get better estimates on some of the
>>>>>>>>> predicates that included functions of columns.
>>>>>>>>> b) the version labelled "Display cursor plan from version
>>>>>>>>> 11.2.0.4" included Bloom Filtering sent to storage on the massive tablescan
>>>>>>>>> where I pointed out the massive CPU anomaly in an earlier post.
>>>>>>>>> 76 - storage((COALESCE("FPT"."DEL_FLG",'N')='N' AND
>>>>>>>>> TRUNC(INTERNAL_FUNCTION("FPT"."H_DT_TIM"))=:B1 AND
>>>>>>>>> SYS_OP_BLOOM_FILTER(:BF0000,"FPT"."A_FK")))
>>>>>>>>> filter((COALESCE("FPT"."DEL_FLG",'N')='N' AND
>>>>>>>>> TRUNC(INTERNAL_FUNCTION("FPT"."H_DT_TIM"))=:B1 AND
>>>>>>>>> SYS_OP_BLOOM_FILTER(:BF0000,"FPT"."A_FK")))
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Jonathan Lewis
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 14 Nov 2022 at 17:18, yudhi s <learnerdatabase99_at_gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank You Sayan and Jonathan.
>>>>>>>>>>
>>>>>>>>>> As you suggested, as a quick test I tried the hint below in the
>>>>>>>>>> main block of the select query, to get the optimizer to increase the
>>>>>>>>>> estimated rows from the CTE "C" ~18 times, but it's not reflected in
>>>>>>>>>> the plan, which means the optimizer is not considering it. In that case
>>>>>>>>>> it should appear in the hint report section of the plan as an error or
>>>>>>>>>> unused etc., but it is not being published there either, so I'm not sure
>>>>>>>>>> why this is happening. Is there anything wrong with the hint below?
>>>>>>>>>>
>>>>>>>>>> I am using the hint below in the main select query, which performs
>>>>>>>>>> the outer join with the CTE table C:
>>>>>>>>>>
>>>>>>>>>> /*+OPT_ESTIMATE( TABLE, C, scale_rows=18)*/
>>>>>>>>>>
>>>>>>>>>> Also, I tried the rewrite below, but saw no change in the temp space
>>>>>>>>>> consumption.
>>>>>>>>>>
>>>>>>>>>> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>>>>>>>>>>      CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>>>>>>>>>>      '9') as varchar2(22)) AS HIGH_VALUE
>>>>>>>>>>
>>>>>>>>>> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>>>>>>>>>>      CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>>>>>>>>>>      '0') as varchar2(22)) AS LOW_VALUE,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 14 Nov, 2022, 4:22 pm Jonathan Lewis, <
>>>>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Still avoiding spending a lot of time on the github text.
>>>>>>>>>>>
>>>>>>>>>>> a) You materialize a CTE that involves a connect by query that
>>>>>>>>>>> generates 18 rows - use the opt_estimate() hint to tell Oracle that the
>>>>>>>>>>> materialized view holds 18 times the rows it expects. (If you examine the
>>>>>>>>>>> plan, the optimizer allows for ONE row coming from dual, not 18.)
>>>>>>>>>>>
>>>>>>>>>>> b) You have several places where you use expressions like
>>>>>>>>>>> substr(c_nbr,1,4) in predicates. Create (optionally invisible) virtual
>>>>>>>>>>> columns for these expressions and gather stats on them; this will give the
>>>>>>>>>>> optimizer a better chance of getting the right arithmetic, especially for
>>>>>>>>>>> cases where the expression needs a frequency (or top-frequency) histogram.
>>>>>>>>>>> Invisible is safest for existing production code, but you may need to test
>>>>>>>>>>> carefully to see if the optimizer manages to use the stats from the virtual
>>>>>>>>>>> columns when it sees the equivalent expressions; see the sketch below.
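>>>>>>>>>>>
>>>>>>>>>>> A minimal sketch of (b), assuming a table T and the substr(c_nbr,1,4)
>>>>>>>>>>> expression from the query:
>>>>>>>>>>>
>>>>>>>>>>> alter table t add (
>>>>>>>>>>>   c_nbr_4 invisible generated always as (substr(c_nbr, 1, 4)) virtual
>>>>>>>>>>> );
>>>>>>>>>>>
>>>>>>>>>>> exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for columns c_nbr_4 size 254')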
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Jonathan Lewis
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, 13 Nov 2022 at 14:28, yudhi s <
>>>>>>>>>>> learnerdatabase99_at_gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you, Lok. Actually the join is not exactly on the columns
>>>>>>>>>>>> but on some functions of them, so that might be why it looks like a
>>>>>>>>>>>> Cartesian join.
>>>>>>>>>>>>
>>>>>>>>>>>> However, the point is that this query has not been changed, and I
>>>>>>>>>>>> agree it may be that the data has increased a little in one of the
>>>>>>>>>>>> tables 'C' or 'T', so the join output has increased from 7 billion to
>>>>>>>>>>>> 10 billion rows - but how does that justify the temp space requirement
>>>>>>>>>>>> going from 20GB to 2.5TB? That's not twice or three times, but well
>>>>>>>>>>>> over a hundred times.
>>>>>>>>>>>>
>>>>>>>>>>>> As this query runs at parallel degree 16 because of an
>>>>>>>>>>>> underlying table's degree, and considering there is no other option at
>>>>>>>>>>>> hand, I was wondering whether increasing or decreasing the parallelism
>>>>>>>>>>>> with hints would help the query complete with less temp space (we have
>>>>>>>>>>>> 2.5TB of temp space right now)? Or is there any other possible option
>>>>>>>>>>>> to make this query succeed?
>>>>>>>>>>>>
>>>>>>>>>>>> Just to note, we initially had 200GB of temp space, but in the hope
>>>>>>>>>>>> that the query might finish we have already increased it to 2.5TB. Still
>>>>>>>>>>>> no luck - the query is failing, and it started happening after this
>>>>>>>>>>>> 19.15 patch.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, 13 Nov, 2022, 6:06 pm Lok P, <loknath.73_at_gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The section you posted below shows the inputs to the 'HASH JOIN
>>>>>>>>>>>>> OUTER' were ~1 million and ~37 million rows respectively, but the result
>>>>>>>>>>>>> coming from the outer join is ~7 billion. So is there any oddity in the
>>>>>>>>>>>>> data in tables 'T' and 'C' which is playing a role in increasing the rows
>>>>>>>>>>>>> input to your analytic function recently?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> SQL Plan Monitoring Details (Plan Hash Value=2056239125)
>>>>>>>>>>>>> ======================================================================================================================================================================================
>>>>>>>>>>>>> | Id | Operation                      | Name                        | Rows    | Cost | Time      | Start  | Execs | Rows     | Read | Read  | Write | Write | Mem   | Temp  | Activity | Activity Detail                   |
>>>>>>>>>>>>> |    |                                |                             | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | (Max) | (Max) | (%)      | (# samples)                       |
>>>>>>>>>>>>> ======================================================================================================================================================================================
>>>>>>>>>>>>> |    |                                |                             |         |      |           |        |       |          |      |       |       |       |       |       |          | direct path read temp (553)       |
>>>>>>>>>>>>> |    |                                |                             |         |      |           |        |       |          |      |       |       |       |       |       |          | direct path write temp (47)       |
>>>>>>>>>>>>> | 31 | HASH JOIN RIGHT OUTER          |                             | 9M      | 2M   | 1676      | +218   | 16    | 7G       |      |       |       |       | 217M  |       | 23.08    | Cpu (4041)                        |
>>>>>>>>>>>>> | 32 |  BUFFER SORT                   |                             |         |      | 4         | +216   | 16    | 1M       |      |       |       |       | 97M   |       |          |                                   |
>>>>>>>>>>>>> | 33 |   PX RECEIVE                   |                             | 10647   | 24   | 4         | +216   | 16    | 1M       |      |       |       |       |       |       |          |                                   |
>>>>>>>>>>>>> | 34 |    PX SEND HASH                | :TQ10005                    | 10647   | 24   | 1633      | +216   | 1     | 1M       |      |       |       |       |       |       | 0.01     | Cpu (2)                           |
>>>>>>>>>>>>> | 35 |     VIEW                       |                             | 10647   | 24   | 1632      | +217   | 1     | 1M       |      |       |       |       |       |       |          |                                   |
>>>>>>>>>>>>> | 36 |      TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA1E71E_D71F1BDE | 10647   | 24   | 1632      | +217   | 1     | 1M       | 106  | 104MB |       |       |       |       | 0.01     | cell multiblock physical read (1) |
>>>>>>>>>>>>> | 37 |  PX RECEIVE                    |                             | 9M      | 2M   | 1676      | +218   | 16    | 37M      |      |       |       |       |       |       | 0.05     | Cpu (9)                           |
>>>>>>>>>>>>> | 38 |   PX SEND HASH                 | :TQ10009                    | 9M      | 2M   | 1677      | +217   | 16    | 37M      |      |       |       |       |       |       | 0.13     | Cpu (23)                          |
>>>>>>>>>>>>> | 39 |    VIEW                        |                             | 9M      | 2M   | 1677      | +217   | 16    | 37M      |      |       |       |       |       |       |          |                                   |
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Nov 13, 2022 at 2:34 AM yudhi s <
>>>>>>>>>>>>> learnerdatabase99_at_gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you, Jonathan.
>>>>>>>>>>>>>> Actually the original query is a long one, so I was trying to
>>>>>>>>>>>>>> reduce the complexity by posting just the particular section of the query
>>>>>>>>>>>>>> which was causing the temp space spike. But I agree that a partial
>>>>>>>>>>>>>> query doesn't make much sense. Also, my apologies that the plan format
>>>>>>>>>>>>>> was distorted.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have posted the exact query again below, with the actual object
>>>>>>>>>>>>>> names replaced by dummy names. I have posted the plan on github at the
>>>>>>>>>>>>>> link below, so the sql execution plan format will stay intact. I don't
>>>>>>>>>>>>>> have the "display cursor" plan for the current runs/failures of this
>>>>>>>>>>>>>> INSERT query, so I have posted the sql monitors for them; I had the
>>>>>>>>>>>>>> display cursor plan from when it was running fine on 11.2.0.4, so I posted
>>>>>>>>>>>>>> that just for reference. I have also copied the outline section from
>>>>>>>>>>>>>> display_awr below each of the sql monitor plans.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://gist.github.com/databasetech0073/714263bce477f624763f757e457cb861
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As mentioned earlier, the section of the plan which does the outer
>>>>>>>>>>>>>> join is below, and it results in ~10 billion rows, which are then passed
>>>>>>>>>>>>>> to the analytic function. I was trying to understand: is there any
>>>>>>>>>>>>>> possible way to minimize the temp space consumption in this scenario?
>>>>>>>>>>>>>> We kept increasing the temp space assuming the query would succeed
>>>>>>>>>>>>>> somehow, but it just kept consuming 2.5TB+ of space and then failed
>>>>>>>>>>>>>> with "ORA-01652 unable to extend temp segment ..." once it had saturated
>>>>>>>>>>>>>> all the allocated temp space.
>>>>>>>>>>>>>> As far as I know, encryption/TDE was already in place in this
>>>>>>>>>>>>>> database before we moved from 19.11 to 19.15. I will double-check with
>>>>>>>>>>>>>> the infra team on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "LEFT OUTER JOIN C
>>>>>>>>>>>>>> ON SUBSTR (T.C_NBR, 1, 4) = C.BNK_ID_NB_4
>>>>>>>>>>>>>> AND LENGTH (T.C_NBR) = C.PANL_CNT
>>>>>>>>>>>>>> AND T.C_NBR BETWEEN C.LOW_VALUE AND
>>>>>>>>>>>>>> C.HIGH_VALUE) S"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Nov 12, 2022 at 12:50 PM Jonathan Lewis <
>>>>>>>>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >> We reran the query and it took a different plan hash
>>>>>>>>>>>>>>> value; however, the path is almost the same with respect to the line which
>>>>>>>>>>>>>>> does the "Outer join" and the "analytical function" evaluation. I have posted
>>>>>>>>>>>>>>> the specific section of the sql monitor below with predicate and column
>>>>>>>>>>>>>>> projection for that run.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The supplied outputs are virtually unreadable, for different
>>>>>>>>>>>>>>> execution plans, for different versions (and you've changed from 19.11 for
>>>>>>>>>>>>>>> the "slow" query to 11.2.0.4!), and partial. I'm not going to look at them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >> However, the sql monitor shows status as 'error' after
>>>>>>>>>>>>>>> some time (~30 minutes), but the underlying sessions (~33 parallel
>>>>>>>>>>>>>>> sessions) kept on running for 2hrs+
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That looks like the behaviour that Mohamed Houri has written
>>>>>>>>>>>>>>> about in the past:
>>>>>>>>>>>>>>> https://hourim.wordpress.com/2020/03/07/sql-monitor-flaw/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd forgotten it would be harder to collect the correct
>>>>>>>>>>>>>>> statistics from all the processes when running a parallel query - finding
>>>>>>>>>>>>>>> sessions by SQL_ID could work, but using views like v$px_sesstat etc. is
>>>>>>>>>>>>>>> safer. Nothing stands out from the information you supplied about wait
>>>>>>>>>>>>>>> events, and the only "unusual" thing in the session activity is that you
>>>>>>>>>>>>>>> are writing encrypted blocks to the temporary tablespace - were you doing
>>>>>>>>>>>>>>> that in the earlier version? It shouldn't explain a massive increase in
>>>>>>>>>>>>>>> space requirements, but it might be relevant - it would help to explain an
>>>>>>>>>>>>>>> increase in CPU.
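>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A sketch of the sort of check I mean, given the query
>>>>>>>>>>>>>>> coordinator's SID as a bind:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> select sn.name, sum(ps.value)
>>>>>>>>>>>>>>> from   v$px_sesstat ps, v$statname sn
>>>>>>>>>>>>>>> where  sn.statistic# = ps.statistic#
>>>>>>>>>>>>>>> and    ps.qcsid = :qc_sid
>>>>>>>>>>>>>>> group by sn.name
>>>>>>>>>>>>>>> order by sn.name;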
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Jonathan Lewis
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, 11 Nov 2022 at 17:07, yudhi s <
>>>>>>>>>>>>>>> learnerdatabase99_at_gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank You Sayan, Jonathan.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We reran the query and it took a different plan hash value;
>>>>>>>>>>>>>>>> however, the path is almost the same with respect to the line which does
>>>>>>>>>>>>>>>> the "Outer join" and the "analytical function" evaluation. I have posted
>>>>>>>>>>>>>>>> the specific section of the sql monitor below with predicate and column
>>>>>>>>>>>>>>>> projection for that run. However, the sql monitor shows status as 'error'
>>>>>>>>>>>>>>>> after some time (~30 minutes), but the underlying sessions (~33 parallel
>>>>>>>>>>>>>>>> sessions) kept on running for 2hrs+; by that point I saw ~2TB of temp space
>>>>>>>>>>>>>>>> already consumed, and later it failed with an insufficient temp space error.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
-- http://www.freelists.org/webpage/oracle-l

Received on Fri Nov 18 2022 - 10:38:56 CET