Re: high temp space usage for same analytic function on latest version
Date: Fri, 18 Nov 2022 09:38:56 +0000
Message-ID: <CAGtsp8n2on0cQgVnR-kHZaEJyG9TMdHNNSzmGUp9X4V6OwmENQ_at_mail.gmail.com>
I couldn't leave it alone, so I've just come back to my simple 11g model
because I realised I had to do two more little tests.
1) run it on a newer version of Oracle
2) create the table with the data sorted to suit the analytic function.
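
For reference, test 2 amounts to something like this (hypothetical names; the
ORDER BY matches the analytic function's partition and ordering keys):

create table t1_sorted as
select * from t1_source
order by ds_no, c_nbr, cpcl_nbr desc;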
Results of 1: The 21c execution was exactly the same, but ran 4 times as fast - directly contradicting your 19c result.
Results of 2: 11g ran much faster for the pre-sorted data and spilled just 331 blocks to disc rather than 6,826.
Results of 2a: 21c completed the task in memory, using 5MB of memory for an initial 55MB of data to be sorted.
Conclusion - Oracle does do the "early throwaway" on the analytic sort, but the incoming data for your 19c test is sufficiently different that the optimisation can't be applied very much. This might be bad luck because the volume is nearly 50% larger, or it might be bad luck because of a different order in which the data arrives at the operation from the hash join (which might be down to a change in the algorithm for the parallel hash join or the hash/hash distribution that Oracle has used).
Bottom line - there is no bug, there is only bad luck.
On Fri, 18 Nov 2022 at 08:08, Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:
>
> I can't model the problem at present - partly because my 19c VM keeps
> crashing, but mostly because I don't have the time.
>
> I've done a very simple model on 11.2.0.4 and my hypothetical optimisation
> (early throwaway) didn't seem to be happening. It occurred to me, however,
> that the amount of benefit of the mechanism would depend on the pattern of
> the data and the number of final distinct values (i.e. the output size).
> It's also occurred to me that the analytic sort uses a V1 sort, and there is
> (or used to be) a hard limit of 100MB for the "insertion sort memory" set by a
> hidden parameter (see
> https://jonathanlewis.wordpress.com/2009/10/01/_smm_isort_cap/) - it might
> not take a very large increase in the inputs to over-use that space. Bear
> in mind the insertion sort builds a binary tree and the overhead (in
> memory) for the TREE is 24 bytes per row sorted.
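> (For a sense of scale: at 24 bytes per row, the tree overhead alone would
> reach a 100MB cap at roughly 4.4M rows, before counting the sorted data itself.)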
>
> Three things to try:
> a) set events 10032 and 10033 to see the Sort operation information, in
> particular how big the runs to disc are.
> b) see if setting the hidden parameter to (say) 200M makes the problem go
> away.
> c) check for hints and (hidden) parameters that look as if they might
> allow you to block "push down" and see if they have any effect.
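>
> For reference, (a) and (b) might look like this in a test session - sketches
> only; "_smm_isort_cap" is a hidden parameter (value in KB, per the blog post
> above), so check with Oracle Support before touching it:
>
> alter session set events '10032 trace name context forever, level 10';
> alter session set events '10033 trace name context forever, level 10';
> alter session set "_smm_isort_cap" = 204800;  -- ~200M; default 102400 (100M)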
>
> Regards
> Jonathan Lewis
>
>
>
>
> On Fri, 18 Nov 2022 at 05:10, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>
>> Thank you so much, Jonathan, for the detailed insight.
>>
>> I just did a quick check. When you mentioned the predicate section used
>> at 'run time', I believe it's the same as the predicate section shown in
>> the display_cursor plan itself, which I already captured. Those predicates,
>> and the section of the query that generates them, are below. I am not seeing
>> any change in them between the two versions. Also, since the 'row_number'
>> analytic function seems to be playing a role, I was wondering whether we
>> should try some alternative way to modify the query and get the same
>> results?
>>
>> And yes, the problematic part is the analytic function evaluation
>> ("WINDOW CHILD PUSHED RANK") for the billions of input rows that come
>> from the outer hash join. As you rightly said, it may be that 11.2 was
>> evaluating the window function on the go - the "PUSHED RANK" was helping,
>> aggregating/squeezing the input data as it arrived, so it used far less
>> temp - whereas 19.15 tries to capture and sort all the rows first and
>> then applies the window function on top of all the rows at once.
>>
>> We will double-check on the patch for the bug you highlighted, because
>> this issue popped up after the 19.15 patch; it was working fine up to
>> 19.11. Also, since the behaviour is not influenced by the OFE 11.2.0.4
>> hint (we tried it manually, with no success), it must be something
>> internal to 19.15, I assume.
>>
>> *11.2 predicate from display_cursor:-*
>> 21 - filter(ROW_NUMBER() OVER ( PARTITION BY "T"."DS_NO","T"."C_NBR"
>> ORDER BY INTERNAL_FUNCTION("C"."CPCL_NBR") DESC )<=1)
>> 24 - filter(ROW_NUMBER() OVER ( PARTITION BY "T"."DS_NO","T"."C_NBR"
>> ORDER BY INTERNAL_FUNCTION("C"."CPCL_NBR") DESC )<=1)
>>
>> *19.15 predicate from display_cursor:-*
>> 24 - filter(ROW_NUMBER() OVER ( PARTITION BY "T"."DS_NO","T"."C_NBR"
>> ORDER BY INTERNAL_FUNCTION("C"."CPCL_NBR") DESC )<=1)
>> 27 - filter(ROW_NUMBER() OVER ( PARTITION BY "T"."DS_NO","T"."C_NBR"
>> ORDER BY INTERNAL_FUNCTION("C"."CPCL_NBR") DESC )<=1)
>>
>>
>> *Analytical function section from query:-*
>>
>> ROW_NUMBER () OVER (PARTITION BY T.DS_NO, T.C_NBR ORDER BY C.CPCL_NBR DESC) AS CP_RANK,
>> COUNT (*) OVER (PARTITION BY T.DS_NO, T.C_NBR, C.CPCL_NBR) AS CL_MATCHES,
>> COUNT (*) OVER (PARTITION BY T.DS_NO, T.C_NBR) AS NO_MATCHES
>>
>>
>> On Fri, Nov 18, 2022 at 3:52 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
>> wrote:
>>
>>>
>>> I thought I'd take a look on MOS for bugs that report performance
>>> problems related to "pushed rank" and found one:
>>> Bug 30786641 : QUERY PERFORMANCE REGRESSION AFTER UPGRADING FROM
>>> 11.2.0.4 TO 19.5
>>> It might be related, but maybe not; and you may have the patch built
>>> into your version anyway - the patch is in 19.15.2.0.
>>>
>>> Regards
>>> Jonathan Lewis
>>>
>>>
>>>
>>> On Thu, 17 Nov 2022 at 20:30, Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>>
>>>> I've now examined the three different Monitor reports with a little care.
>>>> There's more information I would like to collect if I were on site and
>>>> able to re-run the queries, but I think I may have identified the critical
>>>> issue.
>>>>
>>>> I thought you kept saying the problem was in the outer hash join -
>>>> maybe you didn't and I just picked that up from the bits and pieces.
>>>>
>>>> The massive I/O overhead (temp space usage) is in the WINDOW CHILD
>>>> PUSHED RANK operation (the one above the outer join) - it's a problem
>>>> whether or not you use the profile from 11g. Critical observation: the
>>>> input to the window sort is 7 billion rows in 11g, the sort space required
>>>> is 10GB, with 19GB of reads and writes (so a multipass sort operation).
>>>> How do you sort 7B rows with only 11GB of space? Answer: you can't.
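>>>> (Rough arithmetic: 10GB spread across 7 billion rows is under 2 bytes
>>>> per row, so the full input cannot have reached the sort intact.)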
>>>>
>>>> In 19c the number of rows is 7B in one case and 10B in the other, and
>>>> the space requirement (before failing) is a much more realistic 1TB. The
>>>> inference we draw from this is that the PUSHED RANK is working in 11g but
>>>> not working correctly in 19c, so the entire data set has to be captured
>>>> before the SORT in 19c, while 11g can keep saying "I've got one of those
>>>> already, I only need to keep one" as each row arrives. The output that
>>>> succeeds is roughly 1 row in 100, so it's not too surprising that the temp
>>>> space used when pushed rank doesn't work is roughly 100 times the size.
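>>>>
>>>> As a minimal sketch of the pattern (hypothetical table name, column names
>>>> taken from your predicates), this is the query shape where the pushed rank
>>>> lets the sort throw rows away as they arrive:
>>>>
>>>> select *
>>>> from  (select t.*,
>>>>               row_number() over (partition by ds_no, c_nbr
>>>>                                  order by cpcl_nbr desc) as cp_rank
>>>>        from   some_joined_data t)
>>>> where cp_rank <= 1;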
>>>>
>>>> The next really important detail to check is the actual runtime
>>>> predicate used in the pushed rank, just in case it's not what is claimed.
>>>> Another really important check is that 11g has been getting the right
>>>> results - maybe the mechanism it has been using was disabled in 19c because
>>>> it could produce the wrong results.
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, 16 Nov 2022 at 18:44, yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Yes, I tried that, and it does increase the estimates for the CTE from
>>>>> 38k to 860k, but that problematic section of the plan remains the same,
>>>>> consuming terabytes of temp space.
>>>>>
>>>>> On Thu, 17 Nov, 2022, 12:10 am Lok P, <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>>> Did you try the opt_estimate hint to influence the CTE estimate,
>>>>>> scaling it up 18 times as Jonathan suggested? Does that change
>>>>>> anything in the execution path?
>>>>>>
>>>>>> On Wed, 16 Nov, 2022, 11:24 pm yudhi s, <learnerdatabase99_at_gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Extremely sorry - it looks like I messed it up. Trying to
>>>>>>> correct it.
>>>>>>>
>>>>>>> Basically, we had a SQL profile attached in the past, while we were on
>>>>>>> version 11.2.0.4, and it was working fine. When we moved to 19.11 that
>>>>>>> same profile was still there, and the query finished without any issue. But
>>>>>>> that same plan is now making the query consume terabytes of temp on
>>>>>>> version 19.15. I don't have a SQL monitor from the 19.11 plan, which was
>>>>>>> succeeding fine, but it has the same plan hash value as the 19.15 plan I
>>>>>>> posted below in the 2nd SQL monitor.
>>>>>>>
>>>>>>> However, the point I want to make here is that all of these plans are
>>>>>>> almost the same, with minor differences. The specific section of the
>>>>>>> plan which does the "HASH JOIN OUTER" and passes the resulting output
>>>>>>> to the analytic function is exactly the same in all these cases, but still
>>>>>>> the 19.15 version goes for ~1TB+ of temp. So it's definitely odd.
>>>>>>>
>>>>>>> Below is a three-way comparison of the exact section of the plan
>>>>>>> which is playing the key role in consuming all the temp space. In the first
>>>>>>> case, even though ~7 billion rows are getting sorted, it uses just ~10GB of
>>>>>>> temp space, whereas in the second and third cases it uses terabytes. Though
>>>>>>> the data volume is higher, that excessive use of temp space is not justified.
>>>>>>> It definitely seems buggy.
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/55c5722da8105dfcc647ab3bd1b24a48
>>>>>>>
>>>>>>> Also, regarding the bloom filter bug you highlighted: I tried forcing
>>>>>>> the no_px_join_filter hint, and even set "_bloom_filter_enabled" and
>>>>>>> "_bloom_pruning_enabled" at session level to stop the bloom filtering
>>>>>>> and pruning, and re-ran the SELECT part of the query, but it is
>>>>>>> still using the same excessive amount of temp space.
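>>>>>>>
>>>>>>> For reference, the session-level settings were along these lines
>>>>>>> (hidden parameters, so set only for this test):
>>>>>>>
>>>>>>> alter session set "_bloom_filter_enabled" = false;
>>>>>>> alter session set "_bloom_pruning_enabled" = false;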
>>>>>>>
>>>>>>>
>>>>>>> --1st sqlmonitor
>>>>>>>
>>>>>>> Below is the sql monitor and outline from version 11.2.0.4, which was
>>>>>>> finishing while consuming <20GB of temp space.
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/630dad9fc43d6ab6c8863251e29f1cf2
>>>>>>>
>>>>>>> --2nd sql monitor
>>>>>>>
>>>>>>> Below is the sql monitor and outline in version 19.15 (by forcing
>>>>>>> exactly the same plan through a sql profile as it used to take in 11.2).
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/37ec33e73a23bd0055148ba43a0a49ca
>>>>>>>
>>>>>>> --3rd sql monitor
>>>>>>>
>>>>>>> Below is the sql monitor and outline from the current version (i.e. 19.15
>>>>>>> without any profile), which is also consuming high temp space.
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/databasetech0073/4b895f5790a993ed326d389b54295f9b
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 15 Nov, 2022, 6:39 pm Jonathan Lewis, <
>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Since I've noticed Bloom filters as an anomaly, it might be worth
>>>>>>>> checking if you're patched for:
>>>>>>>> "Bug 30618406 - Huge PGA memory consumption by bloom filters
>>>>>>>> (qesblfilter_seg) in px slaves in full partition-wise join"
>>>>>>>>
>>>>>>>> I know this plan isn't using full PW joins - but (read the details)
>>>>>>>> - your parallel operation uses a lot of memory, and then a lot of temp
>>>>>>>> space. Sometimes the bug goes a lot further than the initial description.
>>>>>>>>
>>>>>>>> Marked as fixed in 23.1, but I can see patches for 19.16 in the RU.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Jonathan Lewis
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 15 Nov 2022 at 07:15, Jonathan Lewis <
>>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I've taken a few minutes to look through the github material. The
>>>>>>>>> headings don't agree with the contents. Here's a list showing each heading
>>>>>>>>> plus two lines from the Outline Data under the heading:
>>>>>>>>>
>>>>>>>>> Display cursor plan from version 11.2.0.4
>>>>>>>>> OPTIMIZER_FEATURES_ENABLE('19.1.0')
>>>>>>>>> DB_VERSION('19.1.0')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Execution which is failing in 19.15 version with high tempspace
>>>>>>>>> consumption
>>>>>>>>> OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
>>>>>>>>> DB_VERSION('19.1.0')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Execution which was running fine in 19.11 version
>>>>>>>>> OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
>>>>>>>>> DB_VERSION('11.2.0.4')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There doesn't seem to be any point in spending time looking at
>>>>>>>>> detail when I've got no confidence that the bits of information supplied
>>>>>>>>> belong together.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> A couple of high visibility points, though:
>>>>>>>>> a) the version labelled "Execution which was running fine in 19.11
>>>>>>>>> version" reported OPT_PARAM('optimizer_dynamic_sampling' 3) in its outline
>>>>>>>>> data, and that could have helped Oracle get better estimates on some of the
>>>>>>>>> predicates that included functions of columns.
>>>>>>>>> b) the version labelled "Display cursor plan from version
>>>>>>>>> 11.2.0.4" included Bloom Filtering sent to storage on the massive tablescan
>>>>>>>>> where I pointed out the massive CPU anomaly in an earlier post.
>>>>>>>>> 76 - storage((COALESCE("FPT"."DEL_FLG",'N')='N' AND
>>>>>>>>> TRUNC(INTERNAL_FUNCTION("FPT"."H_DT_TIM"))=:B1 AND
>>>>>>>>> SYS_OP_BLOOM_FILTER(:BF0000,"FPT"."A_FK")))
>>>>>>>>> filter((COALESCE("FPT"."DEL_FLG",'N')='N' AND
>>>>>>>>> TRUNC(INTERNAL_FUNCTION("FPT"."H_DT_TIM"))=:B1 AND
>>>>>>>>> SYS_OP_BLOOM_FILTER(:BF0000,"FPT"."A_FK")))
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Jonathan Lewis
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 14 Nov 2022 at 17:18, yudhi s <learnerdatabase99_at_gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank You Sayan and Jonathan.
>>>>>>>>>>
>>>>>>>>>> As you suggested, as a quick test I tried the hint below in the
>>>>>>>>>> main block of the select query, to get the optimizer to increase the
>>>>>>>>>> estimated rows from the CTE "C" ~18 times, but it's not reflected in
>>>>>>>>>> the plan, which means the optimizer is not considering it. In that case
>>>>>>>>>> it should appear in the hint report section of the plan as an error or
>>>>>>>>>> unused etc., but it is not being published there either, so I'm not sure
>>>>>>>>>> why this is happening. Is there anything wrong with the hint below?
>>>>>>>>>>
>>>>>>>>>> I am using the hint below in the main select query, which performs
>>>>>>>>>> the outer join with the CTE table C:
>>>>>>>>>>
>>>>>>>>>> /*+OPT_ESTIMATE( TABLE, C, scale_rows=18)*/
>>>>>>>>>>
>>>>>>>>>> Also, I tried the rewrite below, but saw no change in the temp space
>>>>>>>>>> consumption.
>>>>>>>>>>
>>>>>>>>>> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>>>>>>>>>>      CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>>>>>>>>>>      '9') as varchar2(22)) AS HIGH_VALUE
>>>>>>>>>>
>>>>>>>>>> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>>>>>>>>>>      CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>>>>>>>>>>      '0') as varchar2(22)) AS LOW_VALUE,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 14 Nov, 2022, 4:22 pm Jonathan Lewis, <
>>>>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Still avoiding spending a lot of time on the github text.
>>>>>>>>>>>
>>>>>>>>>>> a) You materialize a CTE that involves a connect by query that
>>>>>>>>>>> generates 18 rows - use the opt_estimate() hint to tell Oracle that the
>>>>>>>>>>> materialized view holds 18 times the rows it expects. (If you examine the
>>>>>>>>>>> plan, the optimizer allows for ONE row coming from dual, not 18.)
>>>>>>>>>>>
>>>>>>>>>>> b) You have several places where you use expressions like
>>>>>>>>>>> substr(c_nbr,1,4) in predicates. Create (optionally invisible) virtual
>>>>>>>>>>> columns for these expressions and gather stats on them; this will give the
>>>>>>>>>>> optimizer a better chance of getting the right arithmetic, especially for
>>>>>>>>>>> cases where the expression needs a frequency (or top-frequency) histogram.
>>>>>>>>>>> Invisible is safest for existing production code, but you may need to test
>>>>>>>>>>> carefully to see if the optimizer manages to use the stats from the virtual
>>>>>>>>>>> columns when it sees the equivalent expressions; see the sketch below.
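>>>>>>>>>>>
>>>>>>>>>>> A minimal sketch of (b), assuming a table T and the substr(c_nbr,1,4)
>>>>>>>>>>> expression from the query:
>>>>>>>>>>>
>>>>>>>>>>> alter table t add (
>>>>>>>>>>>   c_nbr_4 invisible generated always as (substr(c_nbr, 1, 4)) virtual
>>>>>>>>>>> );
>>>>>>>>>>>
>>>>>>>>>>> exec dbms_stats.gather_table_stats(user, 'T', method_opt => 'for columns c_nbr_4 size 254')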
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Jonathan Lewis
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, 13 Nov 2022 at 14:28, yudhi s <
>>>>>>>>>>> learnerdatabase99_at_gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thank you, Lok. Actually the join is not exactly on the columns
>>>>>>>>>>>> but on some functions of them, so that might be why it looks like a
>>>>>>>>>>>> Cartesian join.
>>>>>>>>>>>>
>>>>>>>>>>>> However, the point is that this query has not been changed, and I
>>>>>>>>>>>> agree it may be that the data has increased a little in one of the
>>>>>>>>>>>> tables 'C' or 'T', so the join output has increased from 7 billion to
>>>>>>>>>>>> 10 billion rows - but how does that justify the temp space requirement
>>>>>>>>>>>> going from 20GB to 2.5TB? That's not twice or three times, but well
>>>>>>>>>>>> over a hundred times.
>>>>>>>>>>>>
>>>>>>>>>>>> As this query runs at parallel degree 16 because of an
>>>>>>>>>>>> underlying table's degree, and considering there is no other option at
>>>>>>>>>>>> hand, I was wondering whether increasing or decreasing the parallelism
>>>>>>>>>>>> with hints would help the query complete with less temp space (we have
>>>>>>>>>>>> 2.5TB of temp space right now)? Or is there any other possible option
>>>>>>>>>>>> to make this query succeed?
>>>>>>>>>>>>
>>>>>>>>>>>> Just to note, we initially had 200GB of temp space, but in the hope
>>>>>>>>>>>> that the query might finish we have already increased it to 2.5TB. Still
>>>>>>>>>>>> no luck - the query is failing, and it started happening after this
>>>>>>>>>>>> 19.15 patch.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, 13 Nov, 2022, 6:06 pm Lok P, <loknath.73_at_gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The section you posted below shows the inputs to the 'HASH JOIN
>>>>>>>>>>>>> OUTER' were ~1 million and ~37 million rows respectively, but the result
>>>>>>>>>>>>> coming from the outer join is ~7 billion. So is there any oddity in the
>>>>>>>>>>>>> data in tables 'T' and 'C' which is playing a role in increasing the rows
>>>>>>>>>>>>> input to your analytic function recently?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> SQL Plan Monitoring Details (Plan Hash Value=2056239125)
>>>>>>>>>>>>> ======================================================================================================================================================================================
>>>>>>>>>>>>> | Id | Operation                      | Name                        | Rows    | Cost | Time      | Start  | Execs | Rows     | Read | Read  | Write | Write | Mem   | Temp  | Activity | Activity Detail                   |
>>>>>>>>>>>>> |    |                                |                             | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | (Max) | (Max) | (%)      | (# samples)                       |
>>>>>>>>>>>>> ======================================================================================================================================================================================
>>>>>>>>>>>>> |    |                                |                             |         |      |           |        |       |          |      |       |       |       |       |       |          | direct path read temp (553)       |
>>>>>>>>>>>>> |    |                                |                             |         |      |           |        |       |          |      |       |       |       |       |       |          | direct path write temp (47)       |
>>>>>>>>>>>>> | 31 | HASH JOIN RIGHT OUTER          |                             | 9M      | 2M   | 1676      | +218   | 16    | 7G       |      |       |       |       | 217M  |       | 23.08    | Cpu (4041)                        |
>>>>>>>>>>>>> | 32 |  BUFFER SORT                   |                             |         |      | 4         | +216   | 16    | 1M       |      |       |       |       | 97M   |       |          |                                   |
>>>>>>>>>>>>> | 33 |   PX RECEIVE                   |                             | 10647   | 24   | 4         | +216   | 16    | 1M       |      |       |       |       |       |       |          |                                   |
>>>>>>>>>>>>> | 34 |    PX SEND HASH                | :TQ10005                    | 10647   | 24   | 1633      | +216   | 1     | 1M       |      |       |       |       |       |       | 0.01     | Cpu (2)                           |
>>>>>>>>>>>>> | 35 |     VIEW                       |                             | 10647   | 24   | 1632      | +217   | 1     | 1M       |      |       |       |       |       |       |          |                                   |
>>>>>>>>>>>>> | 36 |      TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA1E71E_D71F1BDE | 10647   | 24   | 1632      | +217   | 1     | 1M       | 106  | 104MB |       |       |       |       | 0.01     | cell multiblock physical read (1) |
>>>>>>>>>>>>> | 37 |  PX RECEIVE                    |                             | 9M      | 2M   | 1676      | +218   | 16    | 37M      |      |       |       |       |       |       | 0.05     | Cpu (9)                           |
>>>>>>>>>>>>> | 38 |   PX SEND HASH                 | :TQ10009                    | 9M      | 2M   | 1677      | +217   | 16    | 37M      |      |       |       |       |       |       | 0.13     | Cpu (23)                          |
>>>>>>>>>>>>> | 39 |    VIEW                        |                             | 9M      | 2M   | 1677      | +217   | 16    | 37M      |      |       |       |       |       |       |          |                                   |
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Nov 13, 2022 at 2:34 AM yudhi s <
>>>>>>>>>>>>> learnerdatabase99_at_gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you, Jonathan.
>>>>>>>>>>>>>> Actually the original query is a long one, so I was trying to
>>>>>>>>>>>>>> reduce the complexity by posting just the particular section of the query
>>>>>>>>>>>>>> which was causing the temp space spike. But I agree that a partial
>>>>>>>>>>>>>> query doesn't make much sense. Also, my apologies that the plan format
>>>>>>>>>>>>>> was distorted.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have posted the exact query again below, with the actual object
>>>>>>>>>>>>>> names replaced by dummy names. I have posted the plan on github at the
>>>>>>>>>>>>>> link below, so the sql execution plan format will stay intact. I don't
>>>>>>>>>>>>>> have the "display cursor" plan for the current runs/failures of this
>>>>>>>>>>>>>> INSERT query, so I have posted the sql monitors for them; I had the
>>>>>>>>>>>>>> display cursor plan from when it was running fine on 11.2.0.4, so I posted
>>>>>>>>>>>>>> that just for reference. I have also copied the outline section from
>>>>>>>>>>>>>> display_awr below each of the sql monitor plans.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://gist.github.com/databasetech0073/714263bce477f624763f757e457cb861
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As mentioned earlier, the section of the plan which does the outer
>>>>>>>>>>>>>> join is below, and it results in ~10 billion rows, which are then passed
>>>>>>>>>>>>>> to the analytic function. I was trying to understand: is there any
>>>>>>>>>>>>>> possible way to minimize the temp space consumption in this scenario?
>>>>>>>>>>>>>> We kept increasing the temp space assuming the query would succeed
>>>>>>>>>>>>>> somehow, but it just kept consuming 2.5TB+ of space and then failed
>>>>>>>>>>>>>> with "ORA-01652 unable to extend temp segment ..." once it had saturated
>>>>>>>>>>>>>> all the allocated temp space.
>>>>>>>>>>>>>> As far as I know, encryption/TDE was already in place in this
>>>>>>>>>>>>>> database before we moved from 19.11 to 19.15. I will double-check with
>>>>>>>>>>>>>> the infra team on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "LEFT OUTER JOIN C
>>>>>>>>>>>>>> ON SUBSTR (T.C_NBR, 1, 4) = C.BNK_ID_NB_4
>>>>>>>>>>>>>> AND LENGTH (T.C_NBR) = C.PANL_CNT
>>>>>>>>>>>>>> AND T.C_NBR BETWEEN C.LOW_VALUE AND
>>>>>>>>>>>>>> C.HIGH_VALUE) S"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Nov 12, 2022 at 12:50 PM Jonathan Lewis <
>>>>>>>>>>>>>> jlewisoracle_at_gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >> We reran the query and it took a different plan hash
>>>>>>>>>>>>>>> value; however, the path is almost the same with respect to the line which
>>>>>>>>>>>>>>> does the "Outer join" and the "analytical function" evaluation. I have posted
>>>>>>>>>>>>>>> the specific section of the sql monitor below with predicate and column
>>>>>>>>>>>>>>> projection for that run.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The supplied outputs are virtually unreadable, for different
>>>>>>>>>>>>>>> execution plans, for different versions (and you've changed from 19.11 for
>>>>>>>>>>>>>>> the "slow" query to 11.2.0.4!), and partial. I'm not going to look at them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> >> However, the sql monitor shows status as 'error' after
>>>>>>>>>>>>>>> some time (~30 minutes), but the underlying sessions (~33 parallel
>>>>>>>>>>>>>>> sessions) kept on running for 2hrs+
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> That looks like the behaviour that Mohamed Houri has written
>>>>>>>>>>>>>>> about in the past:
>>>>>>>>>>>>>>> https://hourim.wordpress.com/2020/03/07/sql-monitor-flaw/
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd forgotten it would be harder to collect the correct
>>>>>>>>>>>>>>> statistics from all the processes when running a parallel query - finding
>>>>>>>>>>>>>>> sessions by SQL_ID could work, but using views like v$px_sesstat etc. is
>>>>>>>>>>>>>>> safer. Nothing stands out from the information you supplied about wait
>>>>>>>>>>>>>>> events, and the only "unusual" thing in the session activity is that you
>>>>>>>>>>>>>>> are writing encrypted blocks to the temporary tablespace - were you doing
>>>>>>>>>>>>>>> that in the earlier version? It shouldn't explain a massive increase in
>>>>>>>>>>>>>>> space requirements, but it might be relevant - it would help to explain an
>>>>>>>>>>>>>>> increase in CPU.
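>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A sketch of the sort of check I mean, given the query
>>>>>>>>>>>>>>> coordinator's SID as a bind:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> select sn.name, sum(ps.value)
>>>>>>>>>>>>>>> from   v$px_sesstat ps, v$statname sn
>>>>>>>>>>>>>>> where  sn.statistic# = ps.statistic#
>>>>>>>>>>>>>>> and    ps.qcsid = :qc_sid
>>>>>>>>>>>>>>> group by sn.name
>>>>>>>>>>>>>>> order by sn.name;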
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Jonathan Lewis
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, 11 Nov 2022 at 17:07, yudhi s <
>>>>>>>>>>>>>>> learnerdatabase99_at_gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank You Sayan, Jonathan.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We reran the query and it took a different plan hash value;
>>>>>>>>>>>>>>>> however, the path is almost the same with respect to the line which does
>>>>>>>>>>>>>>>> the "Outer join" and the "analytical function" evaluation. I have posted
>>>>>>>>>>>>>>>> the specific section of the sql monitor below with predicate and column
>>>>>>>>>>>>>>>> projection for that run. However, the sql monitor shows status as 'error'
>>>>>>>>>>>>>>>> after some time (~30 minutes), but the underlying sessions (~33 parallel
>>>>>>>>>>>>>>>> sessions) kept on running for 2hrs+; by that point I saw ~2TB of temp space
>>>>>>>>>>>>>>>> already consumed, and later it failed with an insufficient temp space error.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
-- http://www.freelists.org/webpage/oracle-l

Received on Fri Nov 18 2022 - 10:38:56 CET