Re: Undo Tablespace issue
Date: Thu, 6 Jul 2023 10:34:49 +0300
Message-ID: <CACGsLCJ-2+zMCZP9R2D1mZBg2oc5-4NPQEVz9hoHyCnsW7nw7Q_at_mail.gmail.com>
On Thu, Jul 6, 2023 at 8:24 AM yudhi s <learnerdatabase99_at_gmail.com> wrote:
> I checked the dba_undotablespaces , Retention is set as NOGUARANTEE for
> all the four undo tablespaces for all of those four instances. Also when it
> had errored out , I was checking the v$session to see if any transaction
> having logon_time was very old and is responsible for holding up the
> Unexpired UNDO but didn't find any such session during that time. But yes
> that time i joined the v$transaction and v$session with equi join
> condition. Note- Undo retention is set as 10800 i.e. 3hrs.
>
> As you mentioned, I tried to see if any sessions in v$session which are
> not in v$transactions(using below query) but are having very old
> logon_time(more than ~4 days old). I see below entries,
>
> select *--a.logon_time, a.sid, a.username,a.machine, a.program,
> a.status,a.sql_id
> from gv$session a
> where not exists (select 1 from gv$transaction b where b.addr= a.taddr
> and a.inst_id= b.inst_id )
> and a.inst_id=1
> -- and schemaname not in ('SYS','DBSNMP')
> and logon_time<sysdate-4
> order by logon_time asc
>
> I see ~120 sessions are from SYS and those are background
> processes(something as below).
>
> oracle_at_x073db05.salem.paymentech.com (PSP0)
> oracle_at_x073db05.salem.paymentech.com (CLMN)
> oracle_at_x073db05.salem.paymentech.com (PMON)
> oracle_at_x073db05.salem.paymentech.com (IPC0)
> oracle_at_x073db05.salem.paymentech.com (PMAN)
> oracle_at_x073db05.salem.paymentech.com (LMS1)
> oracle_at_x073db05.salem.paymentech.com (MMAN)
> oracle_at_x073db05.salem.paymentech.com (LMS0)
> oracle_at_x073db05.salem.paymentech.com (GEN1)
> oracle_at_x073db05.salem.paymentech.com (SCMN)
> oracle_at_x073db05.salem.paymentech.com (RMS0)
> oracle_at_x073db05.salem.paymentech.com (LMD0)
>
> 30 are from schema DBSNMP, the program is "OMS".
>
> Only ~10 sessions I see are from application users which are showing
> INACTIVE status in V$session and the PREV_SQL_ID is pointing to below sql.
>
> update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0000-00-00',
> to_date(NULL), :2) where user#=:1
>
> Do you think any of the above shows any oddity which can be the cause of
> the Ora-30036?
>
> Regards
> Yudhi
>
>
>
> On Thu, Jul 6, 2023 at 2:00 AM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>> You've previously reported "unexpired extents" in this instance's undo
>> tablespace as 1.1TB. Oracle is capable of "stealing" unexpired extents
>> unless the undo tablespace has been declared with the "retention guarantee"
>> clause. Can you check whether or not this is the case for this instance. If
>> you had an old uncommitted transaction - or other event - that had
>> allocated a slot in the transaction table header of one of your undo
>> segments then the extents to the future of that transaction start SCN would
>> be ACTIVE, not unexpired, so you need to find out you have so much
>> unexpired undo.
>>
>> This may require some unusual poking around, but before you do anything
>> else you could query v$session and v$transaction to see if v$session
>> reports any transactions that are (persistnently) not visible in
>> v$transaction (outer join v$session.taddr with v$transaction.addr).
>>
>> Regards
>> Jonathan Lewis
>>
>>
>> On Wed, 5 Jul 2023 at 20:27, yudhi s <learnerdatabase99_at_gmail.com> wrote:
>>
>>> Thank You Timur and Paul for this feedback.
>>>
>>> Below is the output from the blog which Timur pointed to. And the job
>>> finished after we kept increasing the UNDO and it's now at ~2TB. And the
>>> job ran for 20hrs+. I ran the query by passing awr_snapshot_count as ~50
>>> and then ~100 and in both cases I got the similar output as below. But yes
>>> as in our case we were not hitting Ora-01555 but Ora-30036 which is
>>> different , so how can we interpret this result and find the culprit
>>> query/session for Ora-30036?
>>>
>>> As mentioned the v$transaction was only showing ~300GB of UNDO used by
>>> that specific session.
>>>
>>> The job which was failing multiple times on UNDO was running on Node-1
>>> on this four node database.
>>>
>>> INST_ID CURRENT_SIZE_MB IS_AUTOEXTENSIBLE UNDO_RETENTION
>>> UNDO_SIZE_MIN_MB UNDO_SIZE_GUARANTEE_MB LONGEST_SQL LONGEST_SQL_ID
>>> MAX_ORA1555_CNT MAX_NO_SPACE_CNT
>>> 1 2356086 YES 3 991027.8125 5836862 42.48055556 8yp4vr3aqhnxa 3 0
>>> 2 798968 YES 3 39969.25 6178679 50.92833333 8pnuydbnxhbhj 3 0
>>> 3 322301 YES 3 120706.5625 424632 206.5272222 0uypm2w7jxtvk 2 0
>>> 4 445988 YES 3 130544.75 1729500 72.07694444 ds172hnn0044p 10 0
>>>
>>> On Wed, Jul 5, 2023 at 3:33 PM Timur Akhmadeev <
>>> timur.akhmadeev_at_gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Have you checked MOS Doc 460481.1
>>>> Also can you run a script from this post
>>>> https://timurakhmadeev.wordpress.com/2018/02/05/undo-sizing/ and share
>>>> its output?
>>>> It's targeted more for handling ORA-1555 errors yet still can be
>>>> useful. The input should be a number of AWR snapshots covering target
>>>> undo_retention.
>>>>
>>>> On Tue, Jul 4, 2023 at 5:59 PM yudhi s <learnerdatabase99_at_gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> Its Oracle version 19C(19.15) and using AUTO undo management. We have
>>>>> a job keep failing suddenly with UNDO error as below, after running for
>>>>> 6-7hours. It used to run fine in past months. But this time its failing
>>>>> multiple times even we rerun. I understand its a big transaction without
>>>>> commit in between. However checking (used_ublk) from gv$transaction showing
>>>>> the size its consuming is ~200GB only , however the full size of UNDO
>>>>> tablespace is ~1.5TB. And then checking the details out of
>>>>> DBA_UNDO_EXTENTS, we see majority of the blocks in the UNDO in "UNEXPIRED"
>>>>> status as below.
>>>>>
>>>>> Does it mean that it may be that one odd transaction/session (or it
>>>>> may be a SELECT query) is holding all the UNDO and not letting it to mark
>>>>> as EXPIRED as its active? But then when doing a select on v$session for
>>>>> that exact node to which this UNDO tablespace is aligned and doing
>>>>> "logon_time desc" won't show any such long running sessions? Also checked
>>>>> GV$UNDOSTAT order by tuned_undoretention desc, but not seeing any such
>>>>> session standing out apart from the currently running one.
>>>>>
>>>>> Any other possible way to get hold of the culprit session/user which
>>>>> we can kill to get back the "Unexpired UNDO blocks" back to the "Expired"
>>>>> bucket such that, that can be usable?
>>>>>
>>>>> ORA-30036: unable to extend segment by 128 in undo tablespace
>>>>> 'UNDOTBS1'
>>>>>
>>>>> SELECT DISTINCT STATUS, SUM(BYTES)/1024/1024/1024, COUNT(*) FROM
>>>>> DBA_UNDO_EXTENTS where tablespace_name='UNDOTBS1' GROUP BY STATUS;
>>>>>
>>>>> STATUS SUM(BYTES)/1024/1024/1024 COUNT(*)
>>>>> UNEXPIRED 1130.301331 150047
>>>>> EXPIRED 0.1328125 3
>>>>> ACTIVE 594.6703491 44210
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> Yudhi
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards
>>>> Timur Akhmadeev
>>>>
>>>
-- Regards Timur Akhmadeev -- http://www.freelists.org/webpage/oracle-lReceived on Thu Jul 06 2023 - 09:34:49 CEST