Re: reads by KTSJ

From: Tanel Poder <tanel_at_tanelpoder.com>
Date: Wed, 13 May 2020 17:43:00 -0400
Message-ID: <CAMHX9JJmBZbcujsMa-k-vpBFtyK8a_Tfem5yNwozgFeF+w0wZw_at_mail.gmail.com>



If it's lots of single-block reads, then the free buffer waits may simply be a result of highly concurrent reads plus block changes (due to whatever perceived need to "fix" the segment blocks): DBWR wasn't able to write the modified blocks to disk fast enough to keep up. The space management slaves may have inadvertently executed a SLOB-like I/O benchmark workload on your machine :-)

Until you find out whether it's segment repair-related and why it kicked in, you could greatly reduce the *_max_spacebg_slaves* parameter value. It's semi-documented in a couple of MOS notes. On my test lab machine the max number is 1024 (and I have 62 Wnnn processes just waiting around right now):

SQL> @pd _max_spacebg_slaves
Show all parameters and session values from x$ksppi/x$ksppcv...

       NUM NAME                                                     VALUE
---------- -------------------------------------------------------- -----
      2604 _max_spacebg_slaves                                      1024
      2605 _minmax_spacebg_slaves                                   8


That way, even if a repair kicks in, it won't get more than, say, 8 workers doing all this I/O concurrently.
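
A minimal sketch of what that could look like in plain SQL. The Wnnn filter is the same program-name pattern used further down in this thread, and 8 is just the example value from above. Whether the parameter can be changed on a running instance is an assumption here, and hidden parameters are normally only changed on Oracle Support's advice:

```sql
-- Count the space management slaves (Wnnn) currently alive
SELECT COUNT(*) AS wnnn_sessions
FROM   v$session
WHERE  program LIKE '%(W%';

-- Cap the number of slaves at 8 (example value, not a recommendation).
-- Assumption: the change is accepted dynamically; if the instance rejects it,
-- use SCOPE = SPFILE instead and restart.
ALTER SYSTEM SET "_max_spacebg_slaves" = 8 SCOPE = BOTH;
```

Re-running the first query after the change shows whether idle slaves have gone away or are still hanging around.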

--
Tanel
http://tanelpoder.com


On Wed, May 13, 2020 at 5:34 PM Tanel Poder <tanel_at_tanelpoder.com> wrote:


> Ok I managed to forget which events you saw as I read through the thread
> :-)
>
> There are also V$SYSSTAT/V$SESSTAT metrics related to ASSM segment fixing:
>
> SQL> @sys ASSM%fix
>
> NAME                                                                  VALUE
> ---------------------------------------------------------------- ----------
> ASSM bg: segment fix monitor                                           1845
> ASSM fg: submit segment fix task                                          0
> ASSM bg:mark segment for fix                                              0
> ASSM bg:create segment fix task                                           0
> ASSM bg:slave fix one segment                                             0
> ASSM bg:slave fix state                                                   0
>
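> If you don't have the sys script at hand, a plain query along these lines
> should return the same rows. This is a sketch that assumes the statistic
> names match the pattern exactly as shown in the output above:
>
> ```sql
> -- Instance-wide ASSM segment fix statistics
> SELECT name, value
> FROM   v$sysstat
> WHERE  name LIKE 'ASSM%fix%'
> ORDER  BY name;
> ```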
>
> You could look into AWR and check if you have any unusual numbers there -
> or run Snapper on the Wnnn processes when the problem happens again.
>
> Also, assuming that the Wnnn slaves haven't exited, you can just check the
> current V$SESSTAT values (@ses2 shows any matching metrics with non-zero
> values):
>
> SQL> @ses2 "select sid from v$session where program like '%(W%'" ASSM%fix
>
>        SID NAME                                                                  VALUE
> ---------- ---------------------------------------------------------------- ----------
>        124 ASSM bg: segment fix monitor                                             40
>        851 ASSM bg: segment fix monitor                                             41
>       1213 ASSM bg: segment fix monitor                                             41
>       1578 ASSM bg: segment fix monitor                                             42
> ...
>
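> Roughly the same check without the ses2 script, as a sketch: it joins
> V$SESSTAT to V$STATNAME and keeps only non-zero values. The session filter
> is the one from the command above; the rest is my assumption of what an
> equivalent plain query would be:
>
> ```sql
> -- Per-session ASSM segment fix statistics for the Wnnn slaves
> SELECT st.sid, sn.name, st.value
> FROM   v$sesstat  st
> JOIN   v$statname sn ON sn.statistic# = st.statistic#
> WHERE  st.sid IN (SELECT sid FROM v$session WHERE program LIKE '%(W%')
> AND    sn.name LIKE 'ASSM%fix%'
> AND    st.value > 0
> ORDER  BY st.sid, sn.name;
> ```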
>
> Tanel
> https://tanelpoder.com
>
-- http://www.freelists.org/webpage/oracle-l
Received on Wed May 13 2020 - 23:43:00 CEST
