Re: Dash_wait_chain script resulting into ORA-01489

From: Peter Hitchman <pjhoraclel_at_gmail.com>
Date: Wed, 8 Mar 2023 12:03:39 +0000
Message-ID: <CAPMSPxPhEJTvbrs9ejc9-hK-BixL7uHd1YneCXbfi8t40f7QZA_at_mail.gmail.com>



Hi
The wait event to the far right of the wait chain is the ultimate blocker.

I found on MOS this bug:
*Bug 33541865 BCT: High "block change tracking buffer space" wait event* for release 19.12, fixed in a 19.17 release update.

You could easily be looking at multiple issues here: I/O config, Oracle bug, application implementation of DML (as Pap pointed out). So the trick is to figure out what is having the biggest negative impact and change that and measure the impact.

Yes, with the database catching up on database changes created by APP1 there will be an impact on APP2.

Regards
Pete

On Wed, 8 Mar 2023 at 11:32, yudhi s <learnerdatabase99_at_gmail.com> wrote:

> I was bit confused in understanding the pattern here. Who is the cause and
> who is the effect here. The wait chain script showing , the main blocking
> session here were "db writer", "block change tracking", 'log fileswitch
> checkpoint incomplete' , so are these because of very high number of
> session getting submitted at same time from application side or is it
> because as you mentioned the DBWR process was unable to serve the incoming
> sessions and thus should be increased?
>
> And also i see another pattern here i.e. after the number of active
> sessions increased to very high number for one particular application(say
> APP1) and becomes back to normal within next 20 seconds. But post then ,
> after a minute , another application(say APP2) starts suffering and its
> session starts piling up. But as per team for this application APP2, the
> number of incoming sessions are normal/same through out the day. And the
> ash wait chain out put for this APP2 duration also showing same type of
> events as blocking sessions in the ASH wait chain query. So is it that ,
> the database is trying to do some work in delayed fashion from the first
> occurrence of the high number of sessions(doing mostly DML) from first
> application- APP1? and thus we are seeing after a minute the sessions from
> other application were getting impacted?
>
>
> On Mon, 6 Mar, 2023, 4:01 pm Peter Hitchman, <pjhoraclel_at_gmail.com> wrote:
>
>> Hi
>> I have no experience of Exadata, but to benefit from asynchronous I/O,
>> filesystemio_options usually needs to be set to "SETALL" or "ASYNC", unless
>> Exadata does asynchronous I/O regardless.
>>
>> Or maybe you need more database writers to be able make use of the I/O
>> subsystem's sull capacity.
>>
>> Regards
>> Pete
>>
>> On Sun, 5 Mar 2023 at 06:07, Pap <oracle.developer35_at_gmail.com> wrote:
>>
>>> So as you rightly said it's just not the 'DBWR' process related waits
>>> but you are seeing lot of 'log file checkpoint incomplete' and 'block
>>> change tracking waits' all over the ASH report, it may means that you are
>>> flooding the database with unexpected amount of DMLS concurrently which the
>>> database is not able to handle. You may try to see if these are all row by
>>> row DMLS spawning from multiple sessions at same time and which can be
>>> converted to bulk DML to alleviate the chatter at DB side for that
>>> interval of time.
>>>
>>> On Sat, Mar 4, 2023 at 4:26 PM yudhi s <learnerdatabase99_at_gmail.com>
>>> wrote:
>>>
>>>> Tried to capture the details from gv$active_session_history using
>>>> Tanels "ash_wait_chains.sql" script during issue period from three
>>>> different intervals i.e 5seconds, 10 seconds and 20 seconds and published
>>>> in below link. This time the script didn't fail with the Ora-01489.
>>>>
>>>>
>>>> https://gist.github.com/databasetech0073/3b8c9549ed9825a9888c8da1c74e60ac
>>>>
>>>> Most of those are pointing to DBWR process being the main blocker along
>>>> with some 'log file switch checkpoint incomplete' and 'block change
>>>> tracking' waita. So does it mean that the database writer is slow to catch
>>>> up with the amount of DMLS which the application submits during this
>>>> interval?
>>>>
>>>>
>>>>
>>>> On Sat, Mar 4, 2023 at 12:19 AM yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Thank You Pete. We are not seeing any signs of IO spike or Disk
>>>>> utilization being saturated in the storage cells during this time. Below
>>>>> are the database details.
>>>>>
>>>>> It's Oracle 19c. And Exadata machine.
>>>>> Disk_asynch_io - True, filesystemio_options-None.
>>>>> Platform- linux X86 64bit, CPU - 96, Cores- 48, Sockets-2.
>>>>>
>>>>> And also just to note , as I am unable to fetch the dash_wait_chain
>>>>> from during the exact peak period as it is failing with error 'result
>>>>> concatenation is too long' , so not sure if the above
>>>>> supplied dash_wait_chain output from immediate before the issue period is
>>>>> reliable to what extent.
>>>>>
>>>>>
>>>>> On Fri, Mar 3, 2023 at 10:16 PM Peter Hitchman <pjhoraclel_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I think some more detail about your set-up will help to get some
>>>>>> answers:
>>>>>> Oracle version
>>>>>> Operating system
>>>>>> Number of CPUs
>>>>>> and
>>>>>> The database I/O configuration (disk_asynch_io and
>>>>>> filesystemio_options) and the set-up of the storage being used.
>>>>>>
>>>>>> I think that this points to the I/O subsystem not being able to keep
>>>>>> up with the load it is being given.
>>>>>>
>>>>>> Regards
>>>>>> Pete
>>>>>>
>>>>>
>>
>> --
>> Regards
>>
>> Pete
>>
>

-- 
Regards

Pete

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Mar 08 2023 - 13:03:39 CET

Original text of this message