Re: high temp space usage for same analytic function on latest version

From: Chris Taylor <christopherdtaylor1994_at_gmail.com>
Date: Thu, 17 Nov 2022 14:53:45 -0500
Message-ID: <CAP79kiQLWSTJk3wrSNP5=sJQyy+RdmzVFM5XaLp--7Q1wW1q9Q_at_mail.gmail.com>



I wonder, though: how often does this part really happen?

  • If your data is well clustered (rows per block) with respect to predicates, this can be an amazing reduction in the blocks the compute engine ever has to handle.* I can't imagine that's very often, relative to the number of predicates that can be asked of any set of data - right? (Or am I just cynical because of healthcare data and whatnot?)

Chris

On Thu, Nov 17, 2022 at 2:49 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:

> Storage indexes are like Bloom filters on steroids. If unneeded data is
> tossed into anything to be filtered by predicates, then depending on when
> it is filtered it can dramatically increase the “n” in a lot of n log n
> operations; and if a correctly working storage index prunes the returned
> data blocks dramatically, then a lot of data doesn’t need any temp space.
> If your data is well clustered (rows per block) with respect to
> predicates, this can be an amazing reduction in the blocks the compute
> engine ever has to handle.
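>
> (A quick way to see whether storage indexes are actually pruning anything
> for your session is to watch the relevant cell statistic while the big
> scan runs; a sketch, on Exadata:)
>
>   SELECT sn.name, st.value
>   FROM   v$mystat   st
>   JOIN   v$statname sn ON sn.statistic# = st.statistic#
>   WHERE  sn.name = 'cell physical IO bytes saved by storage index';
>
> (If that value stays at zero while the big tablescan runs, the storage
> indexes aren't helping at all.)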
>
>
>
> It probably has nothing to do with already filtered data being operated on
> by an analytical function.
>
>
>
> BTW, which analytical function?
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *yudhi s
> *Sent:* Thursday, November 17, 2022 12:45 PM
> *To:* Mark W. Farnham
> *Cc:* Lok P; Jonathan Lewis; Oracle L
> *Subject:* Re: high temp space usage for same analytic function on latest
> version
>
>
>
> Thank you, Mark. The issue we are having is that, right after migrating
> from Oracle 19.11 to 19.15, the temp space consumed while evaluating an
> analytic function has increased (it now goes beyond ~3 TB) for a similar
> volume of input data, and that makes the query fail with an insufficient
> temp space error. We are trying to see if there is any possible way the
> temp spill can be minimized. Also, I can't think of how a missing storage
> index would play a role here.
>
>
>
> On Thu, 17 Nov, 2022, 5:55 pm Mark W. Farnham, <mwf_at_rsiz.com> wrote:
>
> Without looking at much of the thread, is it possible that you have
> storage indexes that have become invalid?
>
>
>
> Other than the side effect of delivering huge amounts of data to the
> compute engine to be filtered there (when, had they been valid, the
> storage indexes might have slimmed down data retrieval significantly), my
> recollection is that no complaint will show up in the traces. This could
> be completely off target and may not be related to your case at all, but
> it should not take much effort to check.
>
>
>
> *From:* oracle-l-bounce_at_freelists.org [mailto:
> oracle-l-bounce_at_freelists.org] *On Behalf Of *yudhi s
> *Sent:* Wednesday, November 16, 2022 1:44 PM
> *To:* Lok P
> *Cc:* Jonathan Lewis; Oracle L
> *Subject:* Re: high temp space usage for same analytic function on latest
> version
>
>
>
> Yes, I tried that, and it does increase the estimate for the CTE from 38K
> to 860K, but the problematic section of the plan remains the same,
> consuming terabytes of temp space.
>
>
>
> On Thu, 17 Nov, 2022, 12:10 am Lok P, <loknath.73_at_gmail.com> wrote:
>
> Did you try forcing the opt_estimate hint to scale the CTE estimate up 18
> times, as Jonathan suggested? Does that change anything in the execution
> path?
>
>
>
> On Wed, 16 Nov, 2022, 11:24 pm yudhi s, <learnerdatabase99_at_gmail.com>
> wrote:
>
> Extremely sorry; it looks like I messed it up. Let me try to correct
> it.
>
>
>
> Basically, we had a SQL profile attached in the past, while we were on
> version 11.2.0.4, and it worked fine. When we moved to 19.11 that same
> profile was still in place and the query finished without any issue. But
> that same plan is now making the query consume terabytes of temp on
> version 19.15. I don't have a SQL monitor report from the 19.11 runs that
> succeeded, but it is the same plan hash value as the 19.15 plan that I
> posted below in the 2nd SQL monitor.
>
>
>
> However, the point I want to make here is that all of these plans are
> almost the same, with minor differences. The specific section of the plan
> that does the "HASH JOIN OUTER" and passes the resulting output to the
> analytic function is exactly the same in all these cases, yet the 19.15
> version still goes for ~1 TB+ of temp. So it's definitely odd.
>
>
>
> Below is a three-way comparison of the exact section of the plan that
> plays the key role in consuming all the temp space. In the first case the
> ~7 billion rows get sorted using just ~10 GB of temp space, whereas in
> the second and third cases it uses terabytes. Even allowing for the
> higher data volume, this excessive use of temp space is not justified. It
> definitely seems buggy.
>
>
>
> https://gist.github.com/databasetech0073/55c5722da8105dfcc647ab3bd1b24a48
>
>
>
> Also, regarding the bug related to Bloom filters that you highlighted: I
> tried forcing the no_px_join_filter hint, and even set
> "_bloom_filter_enabled" and "_bloom_pruning_enabled" at session level to
> stop the Bloom filtering and pruning, then ran the SELECT part of the
> query, but it still uses the same excessive amount of temp space.
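>
> (For reference, this is roughly what I set; a sketch, and I understand
> underscore parameters should normally only be changed under Oracle
> Support's guidance. The alias in the hint is whatever table drives the
> join filter:)
>
>   ALTER SESSION SET "_bloom_filter_enabled"  = FALSE;
>   ALTER SESSION SET "_bloom_pruning_enabled" = FALSE;
>
>   SELECT /*+ NO_PX_JOIN_FILTER(t) */ ...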
>
>
>
>
>
> --1st sqlmonitor
>
>
>
> Below are the SQL monitor report and outline from version 11.2.0.4,
> where the query finished consuming <20 GB of temp space.
>
>
>
> https://gist.github.com/databasetech0073/630dad9fc43d6ab6c8863251e29f1cf2
>
>
>
> --2nd sql monitor
>
>
>
> Below are the SQL monitor report and outline from version 19.15
> (forcing, through a SQL profile, exactly the same plan it used to take
> in 11.2).
>
>
>
> https://gist.github.com/databasetech0073/37ec33e73a23bd0055148ba43a0a49ca
>
>
>
> --3rd sql monitor
>
>
>
> Below are the SQL monitor report and outline from the current version
> (i.e. 19.15 without any profile), which also consumes a lot of temp
> space.
>
>
>
> https://gist.github.com/databasetech0073/4b895f5790a993ed326d389b54295f9b
>
>
>
>
>
>
>
> On Tue, 15 Nov, 2022, 6:39 pm Jonathan Lewis, <jlewisoracle_at_gmail.com>
> wrote:
>
>
>
> Since I've noticed Bloom filters as an anomaly, it might be worth checking
> if you're patched for:
>
> "Bug 30618406 - Huge PGA memory consumption by bloom filters
> (qesblfilter_seg) in px slaves in full partition-wise join"
>
>
>
> I know this plan isn't using full PW joins - but (read the details) - your
> parallel operation uses a lot of memory, and then a lot of temp space.
> Sometimes the bug goes a lot further than the initial description.
>
>
>
> Marked as fixed in 23.1, but I can see patches for 19.16 in the RU.
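>
> (To check whether a one-off fix for that bug is already installed, a
> sketch, assuming access to the Oracle home:)
>
>   $ $ORACLE_HOME/OPatch/opatch lspatches | grep 30618406
>
> or, for patches registered in the database:
>
>   SELECT patch_id, status, description
>   FROM   dba_registry_sqlpatch
>   WHERE  description LIKE '%30618406%';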
>
>
>
> Regards
>
> Jonathan Lewis
>
>
>
>
>
>
>
>
>
>
>
>
>
> On Tue, 15 Nov 2022 at 07:15, Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>
>
> I've taken a few minutes to look through the github material. The headings
> don't agree with the contents. Here's a list showing each heading plus two
> lines from the Outline Data under the heading:
>
>
>
> Display cursor plan from version 11.2.0.4
>
> OPTIMIZER_FEATURES_ENABLE('19.1.0')
> DB_VERSION('19.1.0')
>
>
>
>
>
> Execution which is failing in 19.15 version with high tempspace consumption
>
> OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
> DB_VERSION('19.1.0')
>
>
>
>
>
> Execution which was running fine in 19.11 version
>
> OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
> DB_VERSION('11.2.0.4')
>
>
>
>
>
> There doesn't seem to be any point in spending time looking at detail when
> I've got no confidence that the bits of information supplied belong
> together.
>
>
>
>
>
> A couple of high visibility points, though:
>
> a) the version labelled "Execution which was running fine in 19.11
> version" reported OPT_PARAM('optimizer_dynamic_sampling' 3) in its outline
> data, and that could have helped Oracle get better estimates on some of the
> predicates that included functions of columns.
>
> b) the version labelled "Display cursor plan from version 11.2.0.4"
> included Bloom Filtering sent to storage on the massive tablescan where I
> pointed out the massive CPU anomaly in an earlier post.
>
> 76 - storage((COALESCE("FPT"."DEL_FLG",'N')='N' AND
> TRUNC(INTERNAL_FUNCTION("FPT"."H_DT_TIM"))=:B1 AND
> SYS_OP_BLOOM_FILTER(:BF0000,"FPT"."A_FK")))
> filter((COALESCE("FPT"."DEL_FLG",'N')='N' AND
> TRUNC(INTERNAL_FUNCTION("FPT"."H_DT_TIM"))=:B1 AND
> SYS_OP_BLOOM_FILTER(:BF0000,"FPT"."A_FK")))
>
>
>
>
>
> Regards
>
> Jonathan Lewis
>
>
>
>
>
>
>
> On Mon, 14 Nov 2022 at 17:18, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>
> Thank you, Sayan and Jonathan.
>
> As you suggested, as a quick test I put the hint below in the main query
> block to get the optimizer to increase the estimated rows coming from the
> CTE "C" by ~18 times, but it is not reflected in the plan, which means
> the optimizer is not considering it. In that case it should show up in
> the hint report section of the plan as in error or unused, but it is not
> published there either, so I'm not sure why this is happening. Is there
> anything wrong with the hint below?
>
> I am using the hint below in the main SELECT query, which performs the
> outer join with the CTE table C:
>
> /*+ OPT_ESTIMATE(TABLE, C, SCALE_ROWS=18) */
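>
> (I may also try qualifying the alias with the query block name from the
> plan's alias/outline section, in case the alias cannot be resolved from
> the main query block; a sketch, where "SEL$1" is only a placeholder:)
>
>   /*+ OPT_ESTIMATE(@"SEL$1", TABLE, "C"@"SEL$1", SCALE_ROWS=18) */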
>
> Also, I tried the following, but saw no change in the temp space
> consumption:
>
> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>           CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>           '9') as varchar2(22)) AS HIGH_VALUE
>
> cast(RPAD(SUBSTR(CP.CP_NBR, 1, CP.CPCL_NBR - X.RN),
>           CASE CP.PANL_CNT WHEN 0 THEN 16 ELSE CP.PANL_CNT END,
>           '0') as varchar2(22)) AS LOW_VALUE,
>
>
>
>
>
> On Mon, 14 Nov, 2022, 4:22 pm Jonathan Lewis, <jlewisoracle_at_gmail.com>
> wrote:
>
> Still avoiding spending a lot of time on the github text.
>
>
>
> a) You materialize a CTE that involves a connect by query generating 18
> rows - use the opt_estimate() hint to tell Oracle that the materialized
> view holds 18 times the rows it expects. (If you examine the plan, the
> optimizer allows for ONE row coming from dual, not 18.)
>
>
>
> b) You have several places where you use expressions like
> substr(c_nbr,1,4) in predicates. Create (optionally invisible) virtual
> columns for these expressions and gather stats on them; this will give
> the optimizer a better chance of getting the arithmetic right, especially
> where an expression needs a frequency (or top-frequency) histogram.
> Invisible is safest for existing production code, but you may need to
> test carefully to check that the optimizer manages to use the stats from
> the virtual columns when it sees the equivalent expressions.
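>
> A minimal sketch of (b), assuming a table T with a column C_NBR as in the
> posted predicates (the names are illustrative):
>
>   ALTER TABLE t ADD (
>     c_nbr_prefix INVISIBLE
>       GENERATED ALWAYS AS (SUBSTR(c_nbr, 1, 4)) VIRTUAL
>   );
>
>   BEGIN
>     DBMS_STATS.gather_table_stats(
>       ownname    => user,
>       tabname    => 'T',
>       method_opt => 'FOR COLUMNS c_nbr_prefix SIZE 254' -- allow a histogram
>     );
>   END;
>   /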
>
>
>
> Regards
>
> Jonathan Lewis
>
>
>
>
>
>
>
>
>
> On Sun, 13 Nov 2022 at 14:28, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>
> Thank you, Lok. The join is not exactly on the columns but on some
> functions of them, so that may be why it looks like a Cartesian join.
>
>
>
> However, the point is that this query has not been changed. I agree the
> data may have grown a little in one of the tables 'C' or 'T', so the join
> output has increased from 7 billion to 10 billion rows, but how does that
> justify the temp space requirement going from 20 GB to 2.5 TB? That is
> not twice or three times as much, but many times more.
>
>
>
> This query runs at parallel degree 16 because of an underlying table's
> degree. With no other option at hand, I was wondering whether increasing
> or decreasing the parallelism with hints might at least let the query
> complete with less temp space (we currently have 2.5 TB of temp space).
> Or is there any other possible option to make this query succeed?
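>
> (For the experiment, I assume the table-driven degree can be overridden
> at statement level; a sketch only, the right degree being something to
> test:)
>
>   SELECT /*+ PARALLEL(8) */ ...
>   -- or run it serially:
>   SELECT /*+ NO_PARALLEL */ ...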
>
>
>
> Just to note: we initially had 200 GB of temp space, but in the hope that
> this query might finish we have already increased it to 2.5 TB. Still no
> luck - the query keeps failing, and it started happening after this 19.15
> patch.
>
>
>
> On Sun, 13 Nov, 2022, 6:06 pm Lok P, <loknath.73_at_gmail.com> wrote:
>
> The section you posted below shows that the inputs to the 'HASH JOIN
> OUTER' were ~1 million and ~37 million rows respectively, but the result
> coming out of the outer join is ~7 billion rows. So is there some oddity
> in the data in tables 'T' and 'C' that has recently been increasing the
> number of rows fed into your analytic function?
>
>
>
>
>
> SQL Plan Monitoring Details (Plan Hash Value=2056239125)
>
> ===================================================================================================================================================================================================
> | Id | Operation                       | Name                        | Rows    | Cost | Time      | Start  | Execs | Rows     | Read | Read  | Write | Write | Cell    | Mem   | Temp  | Activity | Activity Detail                   |
> |    |                                 |                             | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Reqs  | Bytes | Offload | (Max) | (Max) | (%)      | (# samples)                       |
> ===================================================================================================================================================================================================
> |    |                                 |                             |         |      |           |        |       |          |      |       |       |       |         |       |       |          | direct path read temp (553)       |
> |    |                                 |                             |         |      |           |        |       |          |      |       |       |       |         |       |       |          | direct path write temp (47)       |
> | 31 | HASH JOIN RIGHT OUTER           |                             | 9M      | 2M   | 1676      | +218   | 16    | 7G       |      |       |       |       |         | 217M  |       | 23.08    | Cpu (4041)                        |
> | 32 |   BUFFER SORT                   |                             |         |      | 4         | +216   | 16    | 1M       |      |       |       |       |         | 97M   |       |          |                                   |
> | 33 |    PX RECEIVE                   |                             | 10647   | 24   | 4         | +216   | 16    | 1M       |      |       |       |       |         |       |       |          |                                   |
> | 34 |     PX SEND HASH                | :TQ10005                    | 10647   | 24   | 1633      | +216   | 1     | 1M       |      |       |       |       |         |       |       | 0.01     | Cpu (2)                           |
> | 35 |      VIEW                       |                             | 10647   | 24   | 1632      | +217   | 1     | 1M       |      |       |       |       |         |       |       |          |                                   |
> | 36 |       TABLE ACCESS STORAGE FULL | SYS_TEMP_0FDA1E71E_D71F1BDE | 10647   | 24   | 1632      | +217   | 1     | 1M       | 106  | 104MB |       |       |         |       |       | 0.01     | cell multiblock physical read (1) |
> | 37 |   PX RECEIVE                    |                             | 9M      | 2M   | 1676      | +218   | 16    | 37M      |      |       |       |       |         |       |       | 0.05     | Cpu (9)                           |
> | 38 |    PX SEND HASH                 | :TQ10009                    | 9M      | 2M   | 1677      | +217   | 16    | 37M      |      |       |       |       |         |       |       | 0.13     | Cpu (23)                          |
> | 39 |     VIEW                        |                             | 9M      | 2M   | 1677      | +217   | 16    | 37M      |      |       |       |       |         |       |       |          |                                   |
>
>
>
> On Sun, Nov 13, 2022 at 2:34 AM yudhi s <learnerdatabase99_at_gmail.com>
> wrote:
>
> Thank you, Jonathan. The original query was a long one, so I was trying
> to reduce the complexity by posting just the particular section of the
> query that was causing the temp space spike. But I agree that a partial
> query doesn't make much sense. My apologies also that the plan format was
> distorted.
>
>
>
> I have posted the exact query again below, with the actual object names
> replaced by dummy names. I have put the plans in the GitHub link below,
> so the execution plan format stays intact. I don't have the
> display_cursor plan for the current runs/failures of this INSERT query,
> so I have posted their SQL monitor reports instead; I did have the
> display_cursor plan from when it ran fine on 11.2.0.4, so I posted that
> for reference. I have also copied and pasted the outline section from
> display_awr below each of the SQL monitor plans.
>
>
>
> https://gist.github.com/databasetech0073/714263bce477f624763f757e457cb861
>
> As mentioned earlier, the section of the plan that does the outer join is
> below; it produces ~10 billion rows, which are then passed to the
> analytic function. I am trying to understand whether there is any
> possible way to minimize the temp space consumption in this scenario,
> because we kept increasing the temp space assuming the query would
> eventually succeed, but it just kept consuming 2.5 TB+ and then failed
> with "ORA-01652: unable to extend temp segment ..." once it had saturated
> all the allocated temp space.
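>
> (While it runs, we watch the temp consumption per session with something
> like this sketch, joining v$tempseg_usage to v$session on the session
> address:)
>
>   SELECT se.sid, se.sql_id,
>          ROUND(SUM(tu.blocks) * ts.block_size / 1024 / 1024 / 1024, 1) AS temp_gb
>   FROM   v$tempseg_usage tu
>   JOIN   v$session       se ON se.saddr = tu.session_addr
>   JOIN   dba_tablespaces ts ON ts.tablespace_name = tu.tablespace
>   GROUP  BY se.sid, se.sql_id, ts.block_size
>   ORDER  BY temp_gb DESC;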
>
> As far as I know the encryption/TDE was already there in this database
> before we moved from 19.11 to 19.15. I will double check with the infra
> team on this.
>
> "LEFT OUTER JOIN C
> ON SUBSTR (T.C_NBR, 1, 4) = C.BNK_ID_NB_4
> AND LENGTH (T.C_NBR) = C.PANL_CNT
> AND T.C_NBR BETWEEN C.LOW_VALUE AND C.HIGH_VALUE) S"
>
>
>
>
>
> On Sat, Nov 12, 2022 at 12:50 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
> >> We reran the query and it took a different plan hash value; however,
> the path is almost the same with respect to the line that does the "Outer
> join" and the "analytical function" evaluation. I have posted the
> specific section of the SQL monitor below, with predicate and column
> projection, for that run.
>
>
>
> The supplied outputs are virtually unreadable, for different execution
> plans, for different versions (and you've changed from 19.11 for the
> "slow" query to 11.2.0.4!), and partial. I'm not going to look at them.
>
>
>
> >> However, the SQL monitor shows the status as 'error' after some time
> (~30 minutes), but the underlying sessions (~33 parallel sessions) kept
> running for 2 hrs+
>
>
>
> That looks like the behaviour that Mohamed Houri has written about in the
> past: https://hourim.wordpress.com/2020/03/07/sql-monitor-flaw/
>
>
>
> I'd forgotten it would be harder to collect the correct statistics from
> all the processes when running a parallel query - finding sessions by
> SQL_ID could work, but using views like v$px_sesstat etc. is safer.
> Nothing stands out from the information you supplied about wait events, and
> the only "unusual" thing in the session activity is that you are writing
> encrypted blocks to the temporary tablespace - were you doing that in the
> earlier version? It shouldn't explain a massive increase in space
> requirements, but it might be relevant - it would help to explain an
> increase in CPU.
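>
> Something like this sketch would aggregate a statistic across the query
> coordinator and all its PX slaves (the statistic names are just examples,
> and :qc_sid is the coordinator's SID):
>
>   SELECT sn.name, SUM(st.value) AS total_value
>   FROM   v$px_sesstat st
>   JOIN   v$statname   sn ON sn.statistic# = st.statistic#
>   WHERE  st.qcsid = :qc_sid
>   AND    sn.name IN ('physical writes direct temporary tablespace',
>                      'sorts (disk)')
>   GROUP  BY sn.name;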
>
>
>
>
>
> Regards
>
> Jonathan Lewis
>
>
>
>
>
> On Fri, 11 Nov 2022 at 17:07, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>
> Thank you, Sayan and Jonathan.
>
>
>
> We reran the query and it took a different plan hash value; however, the
> path is almost the same with respect to the line that does the "Outer
> join" and the "analytical function" evaluation. I have posted the
> specific section of the SQL monitor below, with predicate and column
> projection, for that run. However, the SQL monitor shows the status as
> 'error' after some time (~30 minutes), but the underlying sessions (~33
> parallel sessions) kept running for 2 hrs+; by then I saw ~2 TB of temp
> space already consumed, and later it failed with an insufficient temp
> space error.
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Nov 17 2022 - 20:53:45 CET
