Re: How to debug resource busy error

From: Pap <oracle.developer35_at_gmail.com>
Date: Tue, 3 May 2022 10:11:34 +0530
Message-ID: <CAEjw_fj8A_67mA4BrrUjGNOq+r4Hp+rJgetHwq7Em0+NhYthUQ_at_mail.gmail.com>



Thank you so much.

I never thought of this scenario. Though the parsing query has not been seen/captured in the dba_hist_active_sess_history and also we were not able to see anything in v$locked_object, however I am sure we do have a lot of select queries from reporting module running during the same time on that object. so it may happen they are not captured as parsing was happening too quick and "Alter" was unlucky to clash with one of them.

I was neglecting that fact and was only thinking that a DML/DDL on the base object can cause this failure for this simultaneous ALTER statement but now as you mentioned a parsing lock from a SELECT query, on the base object can well cause failure of the ALTER statement with Ora-0054. And if this is the case then ddl_lock_timeout of 2-3minutes should be enough to bypass this issue. Want to know, is there any ash/awr view exists, which logs/captures the parsing related lock on the base object so that we can query it by the object_id?

Regards
Pap

On Tue, May 3, 2022 at 1:46 AM Laurentiu Oprea <laurentiu.oprea06_at_gmail.com> wrote:

> Thanks. Is strange I don't see any query in parsing phase here based on
> the chains output. The theory in my mind was that you have a parallel query
> (can be even a select) for which parsing takes significant time and which
> include the table you want to enable the constraints thus blocking them but
> looks like is not the case so tracing should be your friend here.
>
> Good luck.
>
> On Mon, May 2, 2022, 22:29 Pap <oracle.developer35_at_gmail.com> wrote:
>
>> Thank You.
>>
>> _at_Laurentiu, I have executed the dashtop query with the suggested inputs
>> and below is the output from that. But I can't see any query in the output
>> which can take such a lock on the base table and can cause this error.
>>
>> _at_Stefan, Considering future course of action, i was planning to set the
>> ddl_lock_timeout in the procedure code through the alter session
>> command. However as you mentioned that culprit session can well be
>> captured using trace. So i was wondering , if any of the below tracing
>> is enough to catch the culprit session/sql details or we should
>> specifically go for "systemstate" dump i.e the one as you suggested? I
>> never generated or read the systemstate dump, so trying to understand if
>> other simple traces as below will also help catching the culprit
>> sessions/sqls etc?
>>
>> alter system set events "00054 trace name systemstate level 266, lifetime
>> 1";
>>
>> Get the trace....
>>
>> ALTER system SET EVENTS '00054 TRACE NAME CONTEXT OFF';
>>
>> ****OR
>>
>> ALTER system SET EVENTS '00054 trace name errorstack level 3';
>>
>> Get the trace ...
>>
>> ALTER system SET EVENTS '00054 TRACE NAME CONTEXT OFF';
>>
>> ****OR tracing the exact session
>>
>> begin
>>
>> dbms_monitor.session_trace_enable( session_id => &m_sid, serial_num
>> => &m_serial,
>>
>> waits => true, bind => true, plan_stat => 'all_executions'
>> );
>>
>> end;
>>
>> /
>>
>> -- allow the Ora-00054 failure to happen for the session
>>
>> begin
>>
>> dbms_monitor.session_trace_disable(session_id => &m_sid,serial_num =>
>> &m_serial);
>>
>> end;
>>
>> /
>>
>>
>> [image: User: "image.png"]
>>
>>
>>
>>
>>
>>
>>
>>
>> On Mon, May 2, 2022 at 6:39 PM Laurentiu Oprea <
>> laurentiu.oprea06_at_gmail.com> wrote:
>>
>>> Hello Pap,
>>>
>>> I do see a probable cause for your error so before doing a guess
>>> completely on feeling, lets get some data :
>>>
>>> check the data of: _at_dashtop
>>> sql_opname,sql_id,sql_exec_start,in_parse session_type='FOREGROUND'
>>> starttime endtime
>>> https://github.com/tanelpoder/tpt-oracle/blob/master/ash/dashtop.sql
>>> BTW: endtime should be the moment you receive the error, starttime can
>>> be 15 minutes before
>>>
>>> At least I`ll make a guess based on feeling + some data :)
>>>
>>> În lun., 2 mai 2022 la 14:16, Pap <oracle.developer35_at_gmail.com> a
>>> scris:
>>>
>>>> Thank you so much Laurentiu and Dominic.
>>>>
>>>> I tried checking the dash_wait_chains script for 15minutes window ,
>>>> just by passing first parameter as session_type='FOREGROUND' and but not
>>>> seeing much blocking waits as in below output. And also as i don't have the
>>>> sql_id for the exact ALTER statement captured as it must be very fast and
>>>> thus not getting captured in ASH/AWR, so was not able to pass the second
>>>> parameter i.e sql_id filter.
>>>>
>>>> To Dominic's point , i think this ALTER sql_id wont be captured as its
>>>> very fast and failing immediately when its seeing the lock from other
>>>> session. And also we are struggling to catch the culprit session too. So as
>>>> suggested, will try to set the ddl_lock_timeout to ~300 i.e 5 minutes so as
>>>> to capture enough blocking session against the ALTER statement logged in
>>>> the ASH/AWR to get the details around the culprit session. And considering
>>>> the data-load itself take ~1 hrs+ so this amount of additional wait time(5
>>>> mins) should be fine. That should possibly also fix the issue if the
>>>> blocking lock/culprit session is just holding it for very small time and
>>>> releasing it immediately after.
>>>>
>>>>
>>>> [image: User: "image.png"]
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, 2 May 2022, 12:03 pm Dominic Brooks, <dombrooks_at_hotmail.com>
>>>> wrote:
>>>>
>>>>> Problem is that the error is likely to be immediate.
>>>>>
>>>>> If you increase/set the param ddl_lock_timeout in your load code
>>>>> before you do enable constraints then
>>>>> a) you might wait sufficient time not to fail, ie your blocker
>>>>> finishes before the timeout end and/or
>>>>> b) you should capture some information in ASH about the blocking
>>>>> scenario.
>>>>>
>>>>> Cheers,
>>>>> Dominic
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On 2 May 2022, at 06:55, Laurentiu Oprea <laurentiu.oprea06_at_gmail.com>
>>>>> wrote:
>>>>>
>>>>> 
>>>>> Hello Pap,
>>>>>
>>>>> To diagnose your issue ASH should do the trick.
>>>>>
>>>>> Here is an example(script dash_wait_chains) :
>>>>>
>>>>> https://tanelpoder.com/2013/11/06/diagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2/
>>>>> <https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftanelpoder.com%2F2013%2F11%2F06%2Fdiagnosing-buffer-busy-waits-with-the-ash_wait_chains-sql-script-v0-2%2F&data=05%7C01%7C%7C97ae9142e13241b6ef5308da2c005496%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637870677178768226%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=RXOJ6ean3%2FHVZiEqVguXpBwUZWt7sQi4OjPAPTmACX8%3D&reserved=0>
>>>>>
>>>>> Thank you.
>>>>>
>>>>> În lun., 2 mai 2022 la 05:58, Pap <oracle.developer35_at_gmail.com> a
>>>>> scris:
>>>>>
>>>>>> Hi, We are facing Ora-0054 error on specific time of the month for a
>>>>>> job and from the line number noted in the error log its pointing to the
>>>>>> alter statement which enables reference constraints after a direct path
>>>>>> data load and commit. But we tried monitoring it a few times during
>>>>>> the error period/run time and surprisingly we are not seeing any lock(from
>>>>>> v$locked_object) on the base table(say e.g. MAIN_TAB in below example) but
>>>>>> still the job failed and it succeeded after a couple of rerun
>>>>>> attempts. But it seems like , some process runs and takes lock(for
>>>>>> may be very small time) and makes this 'ALTER' failure and we were
>>>>>> unable to find that out.
>>>>>>
>>>>>> So I wanted to understand , how to get the culprit session/sql/job so
>>>>>> that we can have some dependency set so that this failure won't
>>>>>> occur? Is there any possible trace to set which will help us to debug this
>>>>>> scenario and get the details around the culprit session? Or somehow
>>>>>> we can track it from any of the ASH/AWR views for which we have
>>>>>> enough retention in place?
>>>>>>
>>>>>> *****The error log pointing to one of the below statements
>>>>>> failure inside the procedure *******
>>>>>>
>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ENABLE NOVALIDATE CONSTRAINT
>>>>>> MAIN_TAB_R01;
>>>>>>
>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ENABLE NOVALIDATE CONSTRAINT
>>>>>> MAIN_TAB_R03;
>>>>>>
>>>>>> ORA-00054: resource busy and acquire with NOWAIT specified or timeout
>>>>>> expired
>>>>>>
>>>>>> ********Table structure***************
>>>>>>
>>>>>> CREATE TABLE SCHEMA1.MAIN_TAB
>>>>>>
>>>>>> ( FFKEY NUMBER CONSTRAINT MAIN_TAB_C01 NOT NULL,
>>>>>>
>>>>>> SCKEY NUMBER CONSTRAINT MAIN_TAB_C02 NOT NULL,
>>>>>>
>>>>>> ACKEY NUMBER,
>>>>>>
>>>>>> CKEY NUMBER CONSTRAINT MAIN_TAB_C04 NOT NULL,
>>>>>>
>>>>>> CSTKEY NUMBER CONSTRAINT MAIN_TAB_C05 NOT NULL,
>>>>>>
>>>>>> EPC_KEY NUMBER CONSTRAINT MAIN_TAB_C06 NOT NULL,
>>>>>>
>>>>>> CRRKEY NUMBER CONSTRAINT MAIN_TAB_C07 NOT NULL
>>>>>>
>>>>>> );
>>>>>>
>>>>>> ALTER TABLE SCHEMA1.MAIN_TAB ADD ( CONSTRAINT MAIN_TAB_R01 FOREIGN
>>>>>> KEY (CKEY) REFERENCES SCHEMA1.REF_TAB1 (CKEY) ENABLE NOVALIDATE,
>>>>>>
>>>>>> CONSTRAINT MAIN_TAB_R03 FOREIGN KEY (SCKEY) REFERENCES
>>>>>> SCHEMA1.REF_TAB2 (SCKEY) ENABLE NOVALIDATE);
>>>>>>
>>>>>> ******************
>>>>>>
>>>>>> CREATE TABLE SCHEMA1.REF_TAB1( CKEY NUMBER CONSTRAINT REF_TAB1_C01
>>>>>> NOT NULL );
>>>>>>
>>>>>> CREATE UNIQUE INDEX SCHEMA1.REF_TAB1_PK ON SCHEMA1.REF_TAB1(CKEY);
>>>>>>
>>>>>> ALTER TABLE SCHEMA1.REF_TAB1 ADD ( CONSTRAINT REF_TAB1_PK PRIMARY
>>>>>> KEY (CKEY) USING INDEX SCHEMA1.REF_TAB1_PK ENABLE VALIDATE);
>>>>>>
>>>>>> *******************
>>>>>>
>>>>>> CREATE TABLE SCHEMA1.REF_TAB2( SCKEY NUMBER CONSTRAINT REF_TAB2_C01
>>>>>> NOT NULL );
>>>>>>
>>>>>> CREATE UNIQUE INDEX SCHEMA1.REF_TAB2_PK ON SCHEMA1.REF_TAB2(SCKEY);
>>>>>>
>>>>>> ALTER TABLE SCHEMA1.REF_TAB2 ADD ( CONSTRAINT REF_TAB2_PK PRIMARY
>>>>>> KEY (SCKEY) USING INDEX SCHEMA1.REF_TAB2_PK ENABLE VALIDATE);
>>>>>>
>>>>>> ******************
>>>>>>
>>>>>>
>>>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue May 03 2022 - 06:41:34 CEST

Original text of this message