Re: high temp space usage for same analytic function on latest version

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Fri, 11 Nov 2022 10:07:18 +0000
Message-ID: <CAGtsp8mbx+UFprvhbLVHYi3_mNLPCEXd7vkQbGJjrK=H66KFpg_at_mail.gmail.com>



Like Sayan, I'd want to look at a report of the actual execution plan that included the projection information (in case more columns had been projected in the second case) and in the predicate information (in case some filtering had been applied later in the slow plan that the fast one).

If the problem is easily reproducible I'd also repeat the slow query but capture the session events summary (v$session_event) and the non-zero session activity stats (v$sesstat for the session - with stat names) to see more of where the time went and what type of activity was going on.

The second run shows more data coming through various rowsources, so it's not surprising that various steps of the plan take more time; and if there's any competition for resources (or resource limits being reached) then even a slightly larger volume of data could cause a significant change in performance (e.g. a hash join or sort spilling to disc). Having said that, I think you need to look at operation 79 and ask:

"Why does a "smart table scan" record 2,156 seconds for I/O and 17 seconds of CPU in the fast case, but only 138 seconds of I/O with 9,743 seconds of CPU in the other?"

I think the answer MIGHT be that the cell servers were overloaded and Oracle decided to pass all the data to the database servers rather than applying predicates at storage. (Is there any compression going on, by the way?) This is where the event and activity stats (the latter more likely than the former) could give us the answers.

If it's overload, why would that happen ? Maybe because the fast query was run at 6:00 am and the slow one around midday when lots of other activity was going on.

Looking around operation 30 where the query crashed - the hash join at operation 31 seems to have completed in memory and the row (actual) and max mem are not unreasonable when you compare the two inputs. The oddity is that the hash join reports 4,041 CPU seconds for the FAST query and only 9 CPU seconds for the SLOW query! I think a key question is why the difference in CPU. It may be an oddity of the join predicate, it may be a difference in placement of a function call, it may be an accounting error. That's why the projection and predicate information could be most informative.

I can't see any obvious reason why operation 30 should then fail, since the available memory was large and the sort runs (blocks per write) seem (on average to be larger for the slow run than the fast one). Did you get any error messages from the query, what about trace files?

Regards
Jonathan Lewis

On Thu, 10 Nov 2022 at 20:06, yudhi s <learnerdatabase99_at_gmail.com> wrote:

> Hello Listers, Its oracle database version 19C. Something odd behaviour we
> are seeing , one of the queries which used to run fine suddenly started to
> fail with Ora-01652, (consuming ~1TB+ temp space) after our infra team
> applied a 19.15 patch. Basically it was version 19.11 earlier and was
> working fine.
>
> The sql monitor i have published below is from the actual query when it
> was running successfully VS when it failed. In the sql monitor the line
> number -30 is the one which is making the query fail now. The plan is
> mostly the same in both the cases in that section. In one case it sorted
> ~7billion rows with ~20GB temp space but in the other it consumed ~1TB+
> temp space and failed while it was sorting ~10 billion rows result.
>
> Below is the access predicate which gets evaluated for the line no- 31
> i.e."HASH JOIN RIGHT OUTER". As we have functions used in the predicates
> during the out join evaluation, So wondering , if there are any bugs
> related to how the Outer join gets evaluated in presence of functions in
> the predicate or how the analytical functions works in 19.11 vs 19.15?
>
> Below is the access Predicate for "HASH JOIN RIGHT OUTER" i.e. Plan line
> id - 31:-
> access("C"."PANL_CNT"(+)=LENGTH("T"."C_NBR") AND
> "C"."BI_NBR"(+)=SUBSTR("T"."C_NBR",1,4))
> filter("T"."C_NBR"<="C"."HIGH_VALUE"(+) AND
> "T"."C_NBR">="C"."LOW_VALUE"(+))
>
> ********************* Execution which is failing in 19.15 version with
> high temp space consumption**********************
>
> https://gist.github.com/databasetech0073/05a8ccfb7022eb9dadb508d40286edb3
>
>
>
> ===================================================================================================================================================================================================================================================================================
> | Id | Operation
> | Name | Rows | Cost | Time | Start |
> Execs | Rows | Read | Read | Write | Write | Mem | Temp | Activity
> | Activity Detail |
> | |
> | | (Estim) | | Active(s) | Active |
> | (Actual) | Reqs | Bytes | Reqs | Bytes | (Max) | (Max) | (%) |
> (# samples) |
>
> ===================================================================================================================================================================================================================================================================================
> | 25 | PX SEND HASH
> | :TQ10011 | 9M | 4M | | |
> | | | | | | . | . | |
> |
> | 26 | VIEW
> | | 9M | 4M | | |
> | | | | | | . | . | |
> |
> | 27 | WINDOW SORT PUSHED RANK
> | | 9M | 4M | | |
> | | | | | | . | . | |
> |
> | 28 | PX RECEIVE
> | | 9M | 4M | | |
> | | | | | | . | . | |
> |
> | 29 | PX SEND HASH
> | :TQ10010 | 9M | 4M | | |
> 16 | | | | | | . | . | |
> |
> | 30 | WINDOW CHILD PUSHED RANK
> | | 9M | 4M | 3112 | +915 |
> 16 | 0 | | | 3M | 1TB | 9GB | 1TB | 0.06 |
> Cpu (7) |
> | 31 | HASH JOIN RIGHT OUTER
> | | 9M | 3M | 3112 | +915 |
> 16 | 10G | | | | | 648MB | . | 0.08 |
> Cpu (9) |
> | 32 | BUFFER SORT
> | | | | 8 | +908 |
> 16 | 4M | | | | | 350MB | . | 0.01 |
> Cpu (1) |
>
>
>
> ********************* Execution which was running fine in 19.11 version
> **********************
>
> https://gist.github.com/databasetech0073/d123af1aa4c0c4a0716725103c3c57c9
>
> SQL Plan Monitoring Details (Plan Hash Value=2056239125)
>
> =================================================================================================================================================================================================================================================================================
> | Id | Operation
> | Name | Rows | Cost | Time | Start |
> Execs | Rows | Read | Read | Write | Write | Cell | Mem | Temp
> | Activity | Activity Detail |
> | |
> | | (Estim) | | Active(s) | Active |
> | (Actual) | Reqs | Bytes | Reqs | Bytes | Offload | (Max) | (Max) |
> (%) | (# samples) |
>
> =================================================================================================================================================================================================================================================================================
> | 28 | PX RECEIVE
> | | 9M | 3M | 86 | +1893 |
> 16 | 70M | | | | | | | |
> 0.11 | Cpu (19) |
> | 29 | PX SEND HASH
> | :TQ10010 | 9M | 3M | 77 | +1893 |
> 16 | 70M | | | | | | | |
> 0.25 | Cpu (43) |
> | 30 | WINDOW CHILD PUSHED RANK
> | | 9M | 3M | 1752 | +218 |
> 16 | 70M | 146K | 19GB | 101K | 19GB | | 4G | 10G |
> 41.97 | Cpu (6748) |
> | |
> | | | | | |
> | | | | | | | | |
> | direct path read temp (553) |
> | |
> | | | | | |
> | | | | | | | | |
> | direct path write temp (47) |
> | 31 | HASH JOIN RIGHT OUTER
> | | 9M | 2M | 1676 | +218 |
> 16 | 7G | | | | | | 217M | |
> 23.08 | Cpu (4041) |
> | 32 | BUFFER SORT
> | | | | 4 | +216 |
> 16 | 1M | | | | | | 97M | |
> | |
>
>
>
> The actual query is a big one. But I am just putting here the key section
> of the query which is causing those window functions to be evaluated and
> thus exploding the temp space.
>
> WITH C
> AS (SELECT CP.CPCL_NBR - X.RN AS CPCL_NBR, SUBSTR (CP.CP_NBR, 1, 4)
> AS BI_NBR,
> CASE CP.PANL_CNT
> WHEN 0 THEN 16
> ELSE CP.PANL_CNT
> END AS PANL_CNT,
> RPAD ( SUBSTR (CP.CP_NBR,1,CP.CPCL_NBR - X.RN),
> CASE CP.PANL_CNT
> WHEN 0 THEN 16
> ELSE CP.PANL_CNT
> END, '0') AS LOW_VALUE,
> RPAD ( SUBSTR (CP.CP_NBR, 1,CP.CPCL_NBR - X.RN),
> CASE CP.PANL_CNT
> WHEN 0 THEN 16
> ELSE CP.PANL_CNT
> END, '9') AS HIGH_VALUE
> FROM USER1.CP CP
> INNER JOIN ( SELECT ROWNUM - 1 AS RN
> FROM DUAL
> CONNECT BY LEVEL <= 18) X
> ON CP.CPCL_NBR - X.RN >= 4
> WHERE CP.PN_CD IN ('XX', 'YY'))
> SELECT
> ROW_NUMBER () OVER (PARTITION BY T.DS_NO, T.C_NBR ORDER BY C.CPCL_NBR
> DESC) AS CP_RANK,
> COUNT (*) OVER ( PARTITION BY T.DS_NO, T.C_NBR,C.CPCL_NBR) AS
> CL_MATCHES,
> COUNT (*) OVER (PARTITION BY T.DS_NO, T.C_NBR) AS NO_MATCHES
> FROM (select T.DS_NO, T.PM_FK, T.AC_FK, T.RT_FK, T.CT_FK.....
> from USER1.T T
> WHERE T.HS_DATE_TIME = to_date(:B1,'MM/DD/YYYY HH24:MI:SS')
> ) T
> LEFT OUTER JOIN C
> ON SUBSTR (T.C_NBR, 1, 4) = C.BI_NBR
> AND LENGTH (T.C_NBR) = C.PANL_CNT
> AND T.C_NBR BETWEEN C.LOW_VALUE AND C.HIGH_VALUE;
>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Nov 11 2022 - 11:07:18 CET

Original text of this message