Re: Same query with no plan change or volume but runs for hours vs minutes

From: Lok P <loknath.73_at_gmail.com>
Date: Tue, 28 Nov 2023 18:57:53 +0530
Message-ID: <CAKna9VbuPiTFdVa01K=2eotW=TGG8aexsMuE_Oe84+mVSGAAag_at_mail.gmail.com>



Yes its 18 Billion buffer gets for the bad run (Thats too when its half way i.e. ~18million rows processed) vs ~296million for the good run (with all the ~31million rows processed). And that is the difference resulting into purely the additional CPU time.

If we look into the ~5minutes delta stats from v$sesstat and mainly the ASSM related stats :-
for the bad run "ASSM cbk:blocks examined" and "ASSM cbk:blocks marked full" is ~184million and ~4600 respectively. VS good run is having "ASSM cbk:blocks examined" and "ASSM cbk:blocks marked full" as ~12million and ~214K. So does this mean that during the bad run Oracle was doing lot fo work or scanning lot of blocks and finding lot of partially filled blocks and that was causing such a big difference in response time? What can cause such thing?

On Tue, 28 Nov, 2023, 6:23 pm Lok P, <loknath.73_at_gmail.com> wrote:

> Thank you for the response.
>
> My Apology. I forgot to mention its version 19C (19.19.0.0.0) of oracle on
> exadata.
>
> There exists only one composite (TXN_ID, WOD_DT) primary key index on this
> table-PBDRA_DTL. This table is daily range partitioned on a date column.
> And also the sequence -PBDRAD_SEQ does fed into the leading column(TXN_ID)
> of the PK index.
>
> The AVG_ROW_LEN i see in dba_tab_partition for last few of the partitions
> are showing ~198 and similar for others too.
>
> PARTITION_NAME NUM_ROWS BLOCKS AVG_ROW_LEN COMPRESSION
> PBDRA_DTL_11272023_P 33734616 1487844 197 DISABLED
> PBDRA_DTL_11262023_P 33469912 980070 198 DISABLED
> PBDRA_DTL_11252023_P 33527112 994892 198 DISABLED
> PBDRA_DTL_11242023_P 33948522 991827 198 DISABLED
> PBDRA_DTL_11232023_P 32667974 993492 198 DISABLED
> PBDRA_DTL_11222023_P 34653004 1529017 198 DISABLED
> PBDRA_DTL_11212023_P 33233838 968435 198 DISABLED
> PBDRA_DTL_11202023_P 32870086 1472631 198 DISABLED
> PBDRA_DTL_11192023_P 32485084 947193 198 DISABLED
>
> Select list doesnt have any plsql function.
>
> Today I have captured the sql monitor and related stats from v$sesstat for
> a ~5minutes delta period for a quick/good run in which the complete query
> finished in ~15minutes. I have posted the sql monitor and stats in same
> github link, The session stats i have posted as comments to the same github
> linc.
>
> https://gist.github.com/oraclelearner/722a77ef5ebce23396d82e2835958303
>
> When you mentioned "*some catastrophic activity on the base table might
> cause the ASSM related stats*", are you pointing to delete or anything?
> Because we normally dont performa DELETE but just use partition DROP for
> data purge on these tables. Below is the stats specific to ASSM i do
> observe , and those i have not noticed in past frequently. And also as we
> encounter this slow run intermittently, so not sure if we are encountering
> any bugs related to ASSM here?
>
> ***** BAD RUN******
>
> ASSM cbk:blocks examined 184273393
> ASSM cbk:blocks marked full 4610
> ASSM gsp:bump HWM 2
> ASSM gsp:get free block 46150
> ASSM gsp:good hint 36942
> ASSM gsp:L1 bitmaps examined 42601388
> ASSM gsp:L2 bitmap full 2
> ASSM gsp:L2 bitmaps examined 151006
> ASSM gsp:Optimized index block rejects 438092544
> ASSM gsp:Optimized reject DB 36692008
> ASSM gsp:Optimized reject l1 1150
> ASSM gsp:reject db 622315351
> ASSM gsp:reject L1 462672
> ASSM gsp:Search all 2
> ASSM gsp:Search hint 35
> ASSM gsp:Search steal 2
> blocks decrypted 41626
> branch node splits 8
> buffer is not pinned count 1424244
> buffer is pinned count 296
> calls to get snapshot scn: kcmgss 1429118
>
> ***** GOOD RUN*****
>
> ASSM cbk:blocks examined 12188141
> ASSM cbk:blocks marked full 214458
> ASSM gsp:bump HWM 37
> ASSM gsp:get free block 405327
> ASSM gsp:good hint 405308
> ASSM gsp:L1 bitmaps examined 21220340
> ASSM gsp:L2 bitmap full 64
> ASSM gsp:L2 bitmaps examined 9006
> ASSM gsp:L3 bitmaps examined 54
> ASSM gsp:Optimized reject DB 185267698
> ASSM gsp:Optimized reject l1 124657
> ASSM gsp:Optimized reject l2 27
> ASSM gsp:reject db 11568356
> ASSM gsp:reject L1 13730
> ASSM gsp:reject L2 27
> ASSM gsp:Search all 37
> ASSM gsp:Search hint 402
> ASSM gsp:Search steal 37
> blocks decrypted 125480
> branch node splits 77
> buffer is not pinned count 12752925
> buffer is pinned count 2643
> calls to get snapshot scn: kcmgss 12796284
>
> *******
>
> The tablespaces holding the table and index partitions are all TDE
> encrypted.
>
> The logic inside FUN1 looks something as below. It does execute sql inside
> it but we have not changed anything in this logic recently. The table pmtd
> having ~1.8million rows and having size ~497MB. Its using index for
> accessing that table inside this function. Dont see a change in input data
> pattern for this function in both the days.
>
> FUNCTION Fun1 (cnum IN VARCHAR2, pnum IN VARCHAR2)
> RETURN VARCHAR2
> IS
> ret VARCHAR2(20) := 'XX';
> dicount NUMBER := null;
> BEGIN
> IF (cnum = '1111')
> THEN
> ret := 'XX';
> ELSE IF (cnum is null)
> THEN
> select count(*) into dicount from pmtd td
> where td.did = pnum AND td.cid in (999999);
> IF(dicount > 0) THEN ret := 'Y';
> ELSE
> ret := 'N';
> END IF;
> END IF;
> END IF;
> RETURN ret;
> END;
>
>
> On Tue, 28 Nov, 2023, 4:58 pm Dominic Brooks, <dombrooks_at_hotmail.com>
> wrote:
>
>> It's all about the buffer gets.
>> I would think this is probably a side-effect of some other processing.
>>
>> 316M compared to 18G - only a few of which require actual physical reads.
>>
>> But the answer should be in the session statistics for the two respective
>> sessions.
>> 5 minutes sample stats is not a lot from 6.5 hours.
>> I would be looking at AWR, ASH and application logging to see if there's
>> any correlation to any other processing involving the same objects
>> beforehand some times.
>> Otherwise it's the full session stats for the session which is going to
>> tell you the WHY. Although statistics are only part of the answer. If you
>> could snap before/after execution of this SQL (from the code for example),
>> that should help you find out the high level category of what is being done.
>> An extended sql trace for this sql id would also help answer.
>> ------------------------------
>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
>> behalf of Jonathan Lewis <jlewisoracle_at_gmail.com>
>> *Sent:* 28 November 2023 10:11
>> *To:* Oracle L <oracle-l_at_freelists.org>
>> *Subject:* Re: Same query with no plan change or volume but runs for
>> hours vs minutes
>>
>> Which version of Oracle?
>> What's the average row length of the rows inserted into the table?
>> How many indexes on the table - is it just one on the sequence-generated
>> column?
>> Does the "fun1()" function in the where clause execute any SQL?
>> Does the select list include any pl/sql functions that execute any SQL?
>>
>> There are two possible places suggested by what you've told us that might
>> be where the excess CPU is used.
>>
>> 1) In some PL/SQL function code that either executes very late in the
>> statement, or execute fairly early but gets reported in the wrong place
>>
>> 2) the amount of time Oracle spends check the ASSM bitmaps for free space
>> has exploded - possible because of some catastrophic activity in the table
>> that happened before this insert started.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>>
>> On Mon, 27 Nov 2023 at 20:15, Lok P <loknath.73_at_gmail.com> wrote:
>>
>> Hi All,
>>
>> The insert query normally runs for <15minutes somedays runs for ~6-7hrs
>> with exactly the same plan and approx same volume. And all the resources
>> seem to be spent on step "LOAD TABLE CONVENTIONAL" and its "ON CPU".
>> Wondering where it's spending time during the long execution?
>>
>> While the long run was happening , we tried to fetch the stats from
>> v$sesstat for a ~5minutes duration and collected the non zero stats. Not
>> able to find out anything odd. Could you please guide me here, what must be
>> the cause of this sudden long execution on some days for this same query
>> with almost the same volume?
>>
>> I have posted the sql monitors from both the runs below and also in the
>> below github linc. And also published the nonzero output from the v$sesstat
>> for the long execution for a ~5 minutes delta duration.We do see some stats
>> like "KTFB alloc**" , is that normal or some issue is causing these
>> stats to pop up?
>>
>> https://gist.github.com/oraclelearner/722a77ef5ebce23396d82e2835958303
>>
>> ******** Sql monitor from long run ********
>> Global Information
>> ------------------------------
>> Status : EXECUTING
>> Instance ID : 1
>> SQL Execution ID : 16777216
>> Execution Started : 11/27/2023 03:30:16
>> First Refresh Time : 11/27/2023 03:30:19
>> Last Refresh Time : 11/27/2023 10:06:13
>> Duration : 23759s
>> PLSQL Entry Ids (Object/Subprogram) : 161298,9
>> PLSQL Current Ids (Object/Subprogram) : 161298,15
>>
>> Global Stats
>>
>> ===========================================================================================================================================================================
>> | Elapsed | Cpu | IO | Application | Concurrency | Cluster |
>> PL/SQL | Other | Buffer | Read | Read | Uncompressed | Offload |
>> Offload | Cell |
>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Waits(s) |
>> Time(s) | Waits(s) | Gets | Reqs | Bytes | Bytes | Elig Bytes |
>> Returned Bytes | Offload |
>>
>> ===========================================================================================================================================================================
>> | 23758 | 23570 | 4.88 | 1.28 | 4.78 | 58 |
>> 41 | 119 | 18G | 1M | 719GB | 359GB | 715GB |
>> 4GB | 99.42% |
>>
>> ===========================================================================================================================================================================
>>
>> SQL Plan Monitoring Details (Plan Hash Value=557779869)
>>
>> ==================================================================================================================================================================================================================================
>> | Id | Operation | Name
>> | Rows | Cost | Time | Start | Execs | Rows | Read | Read |
>> Mem | Activity | Activity Detail |
>> Progress |
>> | | |
>> | (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes |
>> | (%) | (# samples) |
>> |
>>
>> ==================================================================================================================================================================================================================================
>> | -> 0 | INSERT STATEMENT |
>> | | | 23761 | +3 | 1 | 0 | | |
>> . | 0.16 | Cpu (35) |
>> |
>> | | |
>> | | | | | | | | |
>> | | cell smart table scan (2) |
>> |
>> | -> 1 | LOAD TABLE CONVENTIONAL | PBDRA_DTL
>> | | | 23762 | +2 | 1 | 0 | 499K | 4GB |
>> . | 98.42 | gc current block 2-way (4) |
>> |
>> | | |
>> | | | | | | | | |
>> | | gc current grant 2-way (23) |
>> |
>> | | |
>> | | | | | | | | |
>> | | gc current grant busy (18) |
>> |
>> | | |
>> | | | | | | | | |
>> | | latch: cache buffers chains (5) |
>> |
>> | | |
>> | | | | | | | | |
>> | | log file switch completion (1) |
>> |
>> | | |
>> | | | | | | | | |
>> | | undo segment extension (2) |
>> |
>> | | |
>> | | | | | | | | |
>> | | Cpu (22695) |
>> |
>> | | |
>> | | | | | | | | |
>> | | cell single block physical read: RDMA (8) |
>> |
>> | | |
>> | | | | | | | | |
>> | | cell single block physical read: xrmem cache (34) |
>> |
>> | | |
>> | | | | | | | | |
>> | | enq: FB - contention (1) |
>> |
>> | | |
>> | | | | | | | | |
>> | | ges message buffer allocation (2) |
>> |
>> | | |
>> | | | | | | | | |
>> | | cell single block physical read: flash cache (3) |
>> |
>> | -> 2 | SEQUENCE | PBDRA_SEQ
>> | | | 23761 | +3 | 1 | 18M | | |
>> . | 0.50 | Cpu (109) |
>> |
>> | | |
>> | | | | | | | | |
>> | | reliable message (6) |
>> |
>> | -> 3 | HASH JOIN |
>> | 68259 | 1M | 23761 | +3 | 1 | 18M | | |
>> 5MB | 0.02 | Cpu (5) |
>> |
>> | 4 | TABLE ACCESS STORAGE FULL | PRFS
>> | 78 | 2 | 1 | +3 | 1 | 63 | 1 | 40960 |
>> . | | |
>> |
>> | -> 5 | PARTITION RANGE ITERATOR |
>> | 152K | 1M | 23761 | +3 | 1 | 18M | | |
>> . | | |
>> |
>> | -> 6 | TABLE ACCESS STORAGE FULL | PBABR_DTL
>> | 152K | 1M | 23761 | +3 | 215 | 18M | 740K | 715GB |
>> 7MB | 0.33 | Cpu (73) |
>> 100% |
>> | | |
>> | | | | | | | | |
>> | | gcs drm freeze in enter server mode (1) |
>> |
>> | | |
>> | | | | | | | | |
>> | | reliable message (1) |
>> |
>> | | |
>> | | | | | | | | |
>> | | cell smart table scan (1) |
>> |
>>
>> ==================================================================================================================================================================================================================================
>>
>>
>> ********* Quick Run ****************
>>
>> Global Information
>> ------------------------------
>> Status : DONE
>> Instance ID : 1
>> SQL Execution ID : 16777216
>> Execution Started : 11/24/2023 04:39:04
>> First Refresh Time : 11/24/2023 04:39:07
>> Last Refresh Time : 11/24/2023 04:55:37
>> Duration : 993s
>> PLSQL Entry Ids (Object/Subprogram) : 161298,9
>> PLSQL Current Ids (Object/Subprogram) : 161298,15
>>
>> Global Stats
>>
>> ===========================================================================================================================================================================
>> | Elapsed | Cpu | IO | Application | Concurrency | Cluster |
>> PL/SQL | Other | Buffer | Read | Read | Uncompressed | Offload |
>> Offload | Cell |
>> | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Waits(s) |
>> Time(s) | Waits(s) | Gets | Reqs | Bytes | Bytes | Elig Bytes |
>> Returned Bytes | Offload |
>>
>> ===========================================================================================================================================================================
>> | 993 | 942 | 5.27 | 0.05 | 0.01 | 0.02 |
>> 68 | 46 | 316M | 2M | 867GB | 443GB | 861GB |
>> 7GB | 99.21% |
>>
>> ===========================================================================================================================================================================
>>
>> SQL Plan Monitoring Details (Plan Hash Value=557779869)
>>
>> =====================================================================================================================================================================================
>> | Id | Operation | Name |
>> Rows | Cost | Time | Start | Execs | Rows | Read | Read |
>> Mem | Activity | Activity Detail |
>> | | | |
>> (Estim) | | Active(s) | Active | | (Actual) | Reqs | Bytes |
>> (Max) | (%) | (# samples) |
>>
>> =====================================================================================================================================================================================
>> | 0 | INSERT STATEMENT | |
>> | | 991 | +3 | 1 | 0 | | |
>> . | | |
>> | 1 | LOAD TABLE CONVENTIONAL | PBDRA_DTL |
>> | | 991 | +3 | 1 | 0 | 874K | 7GB |
>> . | | |
>> | 2 | SEQUENCE | PBDRA_SEQ |
>> | | 992 | +2 | 1 | 30M | | |
>> . | | |
>> | 3 | HASH JOIN | |
>> 66851 | 1M | 991 | +3 | 1 | 30M | | |
>> 5MB | | |
>> | 4 | TABLE ACCESS STORAGE FULL | PRFS |
>> 78 | 2 | 1 | +3 | 1 | 63 | 2 | 49152 |
>> 1MB | | |
>> | 5 | PARTITION RANGE ITERATOR | |
>> 149K | 1M | 991 | +3 | 1 | 32M | | |
>> . | | |
>> | 6 | TABLE ACCESS STORAGE FULL | PBABR_DTL |
>> 149K | 1M | 991 | +3 | 286 | 32M | 892K | 861GB |
>> 7MB | | |
>>
>> =====================================================================================================================================================================================
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 28 2023 - 14:27:53 CET

Original text of this message