Re: Same query with no plan change or volume but runs for hours vs minutes

From: Chris Taylor <christopherdtaylor1994_at_gmail.com>
Date: Tue, 28 Nov 2023 15:17:12 -0600
Message-ID: <CAP79kiT1ZL5eH5gF8OmtG_QFyjW8hBwhERvu6NBDx6qUjRv2zQ_at_mail.gmail.com>



One thing we ran into on 12.1 (I know you're on 19.x) with something similar was a table stored in a tablespace with many data files, all with autoextend on. The problem in our case was that many of the data files were full (or practically full), which caused some odd behavior as Oracle searched for space.

Oracle's recommendation to us at the time was to turn off autoextend on all the full/nearly-full data files and, ultimately, to start moving objects into a bigfile tablespace.
So, if you have many data files (or one or more data files with autoextend on), you could try disabling autoextend on the full ones.
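For what it's worth, a rough check along those lines (assuming DBA view access; the tablespace name is a placeholder) might look something like:

-- list datafiles in the tablespace with their autoextend setting and size limits
select file_name, autoextensible,
       bytes/1024/1024    as size_mb,
       maxbytes/1024/1024 as max_mb
from   dba_data_files
where  tablespace_name = 'YOUR_TABLESPACE'
order  by file_name;

-- and, where appropriate, along the lines Oracle suggested to us:
-- alter database datafile '/path/to/full_datafile.dbf' autoextend off;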

The problem here is that it's not an obvious fix unless the issue simply never happens again - in our case we never saw it again. A caveat: I don't have the technical details from our case to compare with yours, but I wanted to throw it out there in case it rings any bells with anyone.

Chris

On Tue, Nov 28, 2023, 12:07 PM Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

>
> An index on (large sequence value, date_value) should store about 310 to
> 320 index entries per leaf block when full.
> So the leaf node splits suggest you've inserted roughly 4602 * 310 = 1.4M
> rows in the bad case and 40928 * 310 = 12.7M rows in the good case.
> These figures are in line with the execute count (etc.) of 1424230
> and 12752800. This suggests that virtually every row that gets through the
> other join predicates executes the select statement in fun1 and survives
> the predicate check.
>
> The "index range scans" statistic match the "execution count" statistic in
> both cases, and the various figures for consistent gets suggest that the
> access path is the same in both cases (and is index only-only). So you can
> discount something odd happening with the function.
>
> Take the execute count as the number of rows inserted.
> Take "ASSM gsp: get free block" - "leaf node splits" - "branch node
> splits" as the number of new blocks requested for the table.
> Divide "rows inserted" by "new table blocks" and the answer is about
> 34.5; multiply that by 203 (avg_row_len + 5 for the row overhead) and the
> answer is about 7,000 - so the figures about data size etc. hang together
> quite well, and make it reasonable to look for the problem in "other
> activity".
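>
> (As a purely illustrative check of that arithmetic, using the figures
> quoted here and ignoring branch node splits, which aren't listed:)
>
> select round(1424230 / (46150 - 4602), 1)    rows_per_new_table_block,
>        round(1424230 / (46150 - 4602)) * 203 approx_bytes_per_block
> from   dual;
> -- roughly 34 rows per block * 203 bytes = a little under 7,000 bytes,
> -- i.e. most of an 8KB block once block overhead is allowed for.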
>
> So in the bad case the session requests some ASSM space for inserting
> table rows 41,500 times (46,150 - 4,600). To find that space Oracle
> examines 42.6M Level 1 bitmap blocks (ASSM gsp:L1 bitmaps examined
> 42601388) and 184M data blocks (ASSM cbk:blocks examined 184273393).
> Of these(!), Oracle reports 622M rejected (ASSM gsp:reject db
> 622315351).
>
> I think the inconsistency in the examined/rejected tells us that there's
> an error somewhere in how Oracle is totalling up different subsets, and
> there are lots of ways you can kick around all the examine/reject figures
> to try and guess what they mean and how they've been summed incorrectly.
>
> What I do think, though, is that the 184M blocks examined, and the 42M L1
> bitmap blocks examined, are where your code is burning its excess CPU in the
> slow case -- checking blocks which aren't flagged as FULL, but can't accept
> the data that needs to be inserted. In passing, 184M + 42M = 226M, which is
> most of the session logical I/O.
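>
> (If it helps to pull those counters side by side, a query along these
> lines against v$sesstat / v$statname for the session should show them:)
>
> select sn.name, ss.value
> from   v$sesstat ss, v$statname sn
> where  ss.statistic# = sn.statistic#
> and    ss.sid = :sid                       -- the session running the insert
> and    ( sn.name like 'ASSM%'
>       or sn.name in ('execute count', 'leaf node splits',
>                      'branch node splits', 'session logical reads') )
> order  by sn.name;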
>
>
> When I asked about a "catastrophic" event, I wanted you to think whether
> there was any large-scale activity you might have done in the recent past.
> If I had wanted to know if you had done a big delete I would have asked if
> you had done a big delete. We need to find a reason why there are lots of
> blocks that seem to be marked as "space available" in the L1 bitmap, but
> aren't usable once Oracle gets to the block. (At some point in the past
> Oracle Corp. added code for a session to record a list of blocks that it
> had visited unsuccessfully "a few" times so that it stopped visiting them,
> but that may have changed since the initial implementation.)
>
> You could use the dbms_space.space_usage procedure to check the state of
> the blocks in one of the target partitions to see if there are lots of
> blocks that are between 75 and 100% full. This might be a bit of information
> to pass to Oracle.
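>
> (Purely as an illustration - the owner is a placeholder and I haven't run
> this against your system - a call for a single table partition might look
> like this, with serveroutput on:)
>
> declare
>   v_unf  number; v_unfb  number;
>   v_fs1  number; v_fs1b  number;
>   v_fs2  number; v_fs2b  number;
>   v_fs3  number; v_fs3b  number;
>   v_fs4  number; v_fs4b  number;
>   v_full number; v_fullb number;
> begin
>   dbms_space.space_usage(
>     segment_owner      => 'YOUR_SCHEMA',
>     segment_name       => 'PBDRA_DTL',
>     segment_type       => 'TABLE PARTITION',
>     unformatted_blocks => v_unf,  unformatted_bytes => v_unfb,
>     fs1_blocks => v_fs1, fs1_bytes => v_fs1b,    -- 0-25% free (75-100% full)
>     fs2_blocks => v_fs2, fs2_bytes => v_fs2b,    -- 25-50% free
>     fs3_blocks => v_fs3, fs3_bytes => v_fs3b,    -- 50-75% free
>     fs4_blocks => v_fs4, fs4_bytes => v_fs4b,    -- 75-100% free
>     full_blocks => v_full, full_bytes => v_fullb,
>     partition_name     => 'PBDRA_DTL_11272023_P');
>   dbms_output.put_line('FS1 (75-100% full): ' || v_fs1 ||
>                        '  FS2: ' || v_fs2 || '  FS3: ' || v_fs3 ||
>                        '  FS4: ' || v_fs4 || '  FULL: ' || v_full);
> end;
> /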
>
>
> Regards
> Jonathan Lewis
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>> There is only one index on this table (PBDRA_DTL): the composite
>> (TXN_ID, WOD_DT) primary key index. The table is daily range partitioned
>> on a date column, and the sequence PBDRAD_SEQ feeds the leading
>> column (TXN_ID) of the PK index.
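>>
>> (The column order of the PK index can be confirmed with something like:)
>>
>> select index_name, column_name, column_position
>> from   dba_ind_columns
>> where  table_name = 'PBDRA_DTL'
>> order  by index_name, column_position;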
>>
>>
>
>> The AVG_ROW_LEN I see in dba_tab_partitions for the last few partitions
>> is ~198, and it is similar for the others too.
>>
>> PARTITION_NAME          NUM_ROWS    BLOCKS  AVG_ROW_LEN  COMPRESSION
>> PBDRA_DTL_11272023_P    33734616   1487844          197  DISABLED
>> PBDRA_DTL_11262023_P    33469912    980070          198  DISABLED
>> PBDRA_DTL_11252023_P    33527112    994892          198  DISABLED
>> PBDRA_DTL_11242023_P    33948522    991827          198  DISABLED
>> PBDRA_DTL_11232023_P    32667974    993492          198  DISABLED
>> PBDRA_DTL_11222023_P    34653004   1529017          198  DISABLED
>> PBDRA_DTL_11212023_P    33233838    968435          198  DISABLED
>> PBDRA_DTL_11202023_P    32870086   1472631          198  DISABLED
>> PBDRA_DTL_11192023_P    32485084    947193          198  DISABLED
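>>
>> (Something like the following against dba_tab_partitions gives these figures:)
>>
>> select partition_name, num_rows, blocks, avg_row_len, compression
>> from   dba_tab_partitions
>> where  table_name = 'PBDRA_DTL'
>> order  by partition_position desc;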
>>
>> The select list doesn't have any PL/SQL function.
>>
>> Today I captured the SQL Monitor report and the related stats from v$sesstat
>> for a ~5 minute delta period during a quick/good run in which the complete
>> query finished in ~15 minutes. I have posted the SQL Monitor report and the
>> stats at the same github link; the session stats are posted as comments on
>> the same github link.
>>
>> https://gist.github.com/oraclelearner/722a77ef5ebce23396d82e2835958303
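>>
>> (For reference, the text SQL Monitor report can be generated with something
>> like the following, substituting the sql_id of the statement:)
>>
>> select dbms_sqltune.report_sql_monitor(
>>          sql_id       => '&sql_id',   -- sql_id of the long-running statement
>>          type         => 'TEXT',
>>          report_level => 'ALL')
>> from   dual;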
>>
>> When you mentioned "*some catastrophic activity on the base table might
>> cause the ASSM related stats*", are you pointing to a delete or something
>> similar? We normally don't perform DELETEs; we just use partition DROP for
>> data purge on these tables. Below are the ASSM-specific stats I do
>> observe, and I have not noticed them frequently in the past. Also, since we
>> encounter this slow run only intermittently, I am not sure whether we are
>> hitting any ASSM-related bugs here.
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 28 2023 - 22:17:12 CET
