Re: blocking enq-ss wait events

From: Pap <oracle.developer35_at_gmail.com>
Date: Fri, 7 May 2021 10:34:15 +0530
Message-ID: <CAEjw_fgBO1zO=FrPRXiPuxOd_d60Aig8rkdE4VOHE1VY8aj+NQ_at_mail.gmail.com>



The doc Chris pointed to says SMON can cause such sort segment contention, but it doesn't say anything about a big rollback. But as you said, you were seeing a lot of "wait for a undo record" waits after the DB reboot, i.e. most probably the same rollback that was running before the reboot was being resumed. That rollback is also managed by SMON, so I would say the sort segment contention was somehow related to the big transaction rollback. I can't explain it logically at the moment, though. Others may shed some light.
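
One way to check whether the same rollback really did resume after the restart is to sample V$FAST_START_TRANSACTIONS twice and see whether UNDOBLOCKSDONE is advancing; the GV$PX_SESSION check Chris suggested can be run alongside it. Below is a minimal Python sketch of that idea, not a tested procedure for this system. The query strings are meant to be run with whatever Oracle client you use, and the function name, bind names, and sample numbers are made up for illustration.

```python
from datetime import timedelta

# Query text only; run it as a privileged user with any Oracle client.
# STATE should show RECOVERING while the dead transaction is rolled back.
ROLLBACK_PROGRESS_SQL = """
SELECT usn, state, undoblocksdone, undoblockstotal
FROM   v$fast_start_transactions
"""

# Chris's check: count the parallel servers whose query coordinator is SMON.
# :smon_sid and :smon_serial are hypothetical bind names for SMON's
# SID and SERIAL#, which you have to look up first.
SMON_PX_SQL = """
SELECT COUNT(*) AS px_servers
FROM   gv$px_session
WHERE  qcsid = :smon_sid
AND    qcserial# = :smon_serial
"""


def estimate_remaining(done_then, done_now, total, seconds_between):
    """Estimate the time left from two samples of UNDOBLOCKSDONE.

    Assumes the rollback rate stays roughly constant, which is only an
    approximation. Returns None if no progress was made between samples.
    """
    rate = (done_now - done_then) / seconds_between  # undo blocks per second
    if rate <= 0:
        return None
    return timedelta(seconds=(total - done_now) / rate)


# Illustrative numbers only: 3000 blocks applied in 60 s, 6000 still to go.
print(estimate_remaining(1000, 4000, 10000, 60))
```

If the estimate keeps shrinking between samples, the rollback is progressing and waiting it out may be reasonable; if UNDOBLOCKSDONE does not move at all, the function returns None and something else is probably holding SMON up.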

Thanks and Regards
Pap

On Fri, May 7, 2021 at 8:51 AM Lok P <loknath.73_at_gmail.com> wrote:

> Thank You Chris.
>
> Both of them are tempfiles. I see the CONTENTS column in dba_tablespaces
> is TEMPORARY for both tablespaces. But I am wondering how this can be
> related.
>
> Also, is the big rollback that was happening before the DB reboot, and
> resumed after it, related in any way to the sort segment contention we
> saw post reboot? Or is the sort segment contention after the reboot
> independent of the rollback that SMON was still performing post reboot?
>
> Regards
> Lok
>
> On Fri, May 7, 2021 at 2:27 AM Chris Taylor <
> christopherdtaylor1994_at_gmail.com> wrote:
>
>> It's interesting that you have different TEMP space assigned to SYS and
>> application users. Are both TEMP spaces actually TEMPFILES or is one
>> dictionary managed?
>>
>> (DBA_TEMP_FILES vs DBA_DATA_FILES)
>>
>> Also there used to be a scenario where FAST_START_PARALLEL_ROLLBACK
>> parameter would negatively impact performance but in 11.2, I would expect
>> that to not be an issue unless you're CPU bound where CPUs are pegged (100%
>> busy or near 100% busy).
>>
>>
>>
>> Chris
>>
>>
>> On Thu, May 6, 2021 at 2:52 PM Lok P <loknath.73_at_gmail.com> wrote:
>>
>>> I see we have one temp tablespace assigned to SYS and a different one for
>>> application users. So why would SMON (which is a SYS process) create SS
>>> (sort segment) contention on a temp tablespace that is assigned to
>>> application users/queries, and that too after a DB reboot?
>>>
>>> On Thu, May 6, 2021 at 11:44 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>
>>>> Thanks a lot. So here is what happened, and I am still struggling to
>>>> understand clearly how these are logically related. We killed one
>>>> long-running transaction because it had been reading from UNDO for a long
>>>> time (~8+ hrs). That resulted in a rollback which kept going, and we used
>>>> to see the wait event "wait for a undo record" from multiple SYS sessions
>>>> (most probably SMON doing the cleanup with multiple parallel slaves). But
>>>> until then we were all okay, because other application queries/sessions
>>>> were running fine and the rollback was not blocking anyone.
>>>>
>>>> Then, while the above was going on, the infra team did another planned
>>>> activity for which the database had to be rebooted. They did that and
>>>> brought the database back online. After this we started seeing the same
>>>> "wait for a undo record" wait event and thought that SMON was probably
>>>> resuming its rollback/cleanup and that it should not impact other
>>>> application queries. But then "enq: SS - contention" suddenly appeared for
>>>> multiple application sessions; the blocking session was waiting on "sort
>>>> segment request", and those application queries were just stuck.
>>>>
>>>> I am trying to understand why, even though the database shut down and
>>>> started up seamlessly, the old rollback/cleanup was reinitiated by
>>>> SMON. And even then, why, after the DB reboot, was the application session
>>>> stuck on "sort segment request", causing other sessions to hang on
>>>> "enq: SS - contention"? How is it related to the big rollback?
>>>>
>>>> Regards
>>>> Lok
>>>>
>>>>
>>>> On Thu, May 6, 2021 at 7:02 PM Chris Taylor <
>>>> christopherdtaylor1994_at_gmail.com> wrote:
>>>>
>>>>> No, killing the recovery process isn't a great idea, as I'm fairly
>>>>> certain SMON is involved, which is a critical component of the database
>>>>> operation. Kill it, and you kill the instance.
>>>>>
>>>>> Are your CPUs on the server very busy, with very low idle?
>>>>>
>>>>> Find the SID & SERIAL# for the SMON process, query GV$PX_SESSION for
>>>>> QCSID = <sid of SMON> and QCSERIAL# = <serial# of SMON>, and see how many
>>>>> parallel server processes it's using.
>>>>>
>>>>> If you kill (or have to restart) the database, be aware that the
>>>>> startup will most likely "pause" while SMON cleans up that dead
>>>>> transaction before it opens the database. SMON should do that in
>>>>> parallel, though, and this might be your best bet to get the database
>>>>> back to normal operation. It might take a while for SMON to finish and
>>>>> thus delay the opening of the database, which you can monitor by tailing
>>>>> the alert log file if you do kill and restart the database instance.
>>>>>
>>>>> Chris
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 6, 2021 at 1:14 AM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>
>>>>>> Thank you. It seems to match our symptoms (big rollback causing
>>>>>> enq-SS contention).
>>>>>>
>>>>>> We were thinking about killing the system processes that are trying
>>>>>> to perform the rollback, as we don't need that job anymore. Is that safe?
>>>>>> Or do we have to either wait for it to finish, helping new transactions
>>>>>> move on by increasing pga_aggregate_target as suggested in the
>>>>>> note, or else drop and recreate the temp tablespace?
>>>>>>
>>>>>> On Thu, May 6, 2021 at 10:13 AM Chris Taylor <
>>>>>> christopherdtaylor1994_at_gmail.com> wrote:
>>>>>>
>>>>>>> I had to look up ENQ-SS on Oracle support. That's a sort segment
>>>>>>> contention, usually encountered when SMON is really busy cleaning up a dead
>>>>>>> / killed transaction.
>>>>>>>
>>>>>>> Oracle support has a few notes on this but the most applicable seem
>>>>>>> to stop at 11.1 .
>>>>>>>
>>>>>>> There is one note that says to try increasing PGA_AGGREGATE_TARGET:
>>>>>>> SS Sort Segment Enqueue: 'enq: SS - contention' (Doc ID 2601825.1)
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>> On Wed, May 5, 2021 at 11:56 PM Lok P <loknath.73_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi All, need some help. We killed one of the long-running
>>>>>>>> sessions, which had been running for ~8+ hrs. After we killed it we saw
>>>>>>>> a lot of "wait for a undo record" wait events, but we ignored them,
>>>>>>>> thinking they would continue for some time because of the rollback. But
>>>>>>>> suddenly now we are seeing the "enq: SS - contention" wait event in
>>>>>>>> addition to "wait for a undo record", and that is blocking all other
>>>>>>>> application queries. So we are wondering how we should mitigate this
>>>>>>>> issue.
>>>>>>>>
>>>>>>>> The version is 11.2.0.4 on Exadata.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Lok
>>>>>>>>
>>>>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri May 07 2021 - 07:04:15 CEST
