Re: ADG lag after upgrading to 12.1

From: Rich J <rjoralist3_at_society.servebeer.com>
Date: Tue, 19 Mar 2019 15:26:42 -0500
Message-ID: <c91cbc684ed36268b241a678808341b9_at_society.servebeer.com>



On 2019/03/19 05:56, Neil Chandler wrote:

> So nothing obvious there, unless you happen to be running on SMT4-level CPUs. Oracle is fine when it's on SMT2 (cpu 0 and 1) as each thread has its own L2 cache, but if it scales to use SMT4 the AIX server can struggle to be effective as the L2 is shared between threads, so the L2 is effectively halved and you get an increase in L2 cache misses. I'd compare the lag at 20% used and 60% used to see if it is really 60% used or effectively running out. Can you see the AIX metrics to see if the threads are running on SMT4 a lot - the 3rd and 4th CPUs per processor (e.g. use mpstat -s to see the Processor->SMT cpu relationship, and something like mpstat -d 15 to see which are active).

There isn't much there. Yes, AIX does have weird CPU accounting with SMT (thanks to Jeremy Schneider's post at https://ardentperf.com/2016/07/01/understanding-cpu-on-aix-power-smt-systems/), but none of that has changed. The only thing that's changed is the upgrade of Oracle DBs, the Listener, and associated utilities (ADG, RMAN, etc). I'm beginning to wonder if I'm hitting another fun change in ADG, like the thread issue you had blogged about.

> I assume you're pulling the lag information from V$DATAGUARD_STATS ?
> As you are using Active DataGuard, V$STANDBY_EVENT_HISTOGRAM should be populated with useful lag information.

Since I do have a license for the "Active" part of Active Data Guard, I was actually pulling from V$STANDBY_EVENT_HISTOGRAM. The transient nature of V$DATAGUARD_STATS always seemed much less useful to me.
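For anyone who wants to boil that histogram down to a single number, here is a rough Python sketch. It assumes the 'apply lag' rows have already been fetched from V$STANDBY_EVENT_HISTOGRAM as (TIME, UNIT, COUNT) tuples and that UNIT comes back as 'seconds', 'minutes', 'hours' or 'days'. The function name and the sample rows are made up for illustration, not taken from my system.

```python
# Seconds per UNIT value (assumed spelling of the UNIT strings in the view).
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def lag_percentile(rows, pct):
    """Return the smallest lag bucket, in seconds, that covers pct percent of samples.

    rows is an iterable of (time, unit, count) tuples (hypothetical shape).
    Returns None when the histogram holds no samples.
    """
    # Normalise every bucket to seconds and sort from smallest lag to largest.
    buckets = sorted((t * UNIT_SECONDS[u.strip().lower()], c) for t, u, c in rows)
    total = sum(c for _, c in buckets)
    if total == 0:
        return None
    threshold = total * pct / 100.0
    running = 0
    for seconds, count in buckets:
        running += count
        if running >= threshold:
            return seconds
    return buckets[-1][0]


# Invented sample: most samples at 0s lag, with a small tail out to 2 minutes.
sample = [(0, "seconds", 900), (1, "seconds", 50), (5, "seconds", 30), (2, "minutes", 20)]
print(lag_percentile(sample, 99))
```

Comparing the same percentile from a quiet period and a busy one (20% vs 60% CPU, say) gives a steadier comparison than eyeballing the raw buckets.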

> It might be worth checking the settings in LOG_ARCHIVE_DEST_n, and see if you are using AFFIRM or NOAFFIRM.
> Switching to NOAFFIRM would confirm if the lag was caused by waiting for the write to SRL to ack (and to confirm there are no parameters in there like DELAY=nnn), although that may well be classed as transport lag.

Good call, as DBUA did recreate my spfile, kindly "fixing" many of my parameter values. I rechecked V$ARCHIVE_DEST, even though I'm running ASYNC, where the default is NOAFFIRM, and yes, the AFFIRM column is indeed "NO".
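Since a regenerated spfile is easy to misread, here is a minimal Python sketch of that check against the parameter text itself. It assumes a simplified tokenizer (attributes separated by whitespace, no spaces around "=") and the documented defaults as I understand them: ASYNC when neither transport mode is given, NOAFFIRM under ASYNC, AFFIRM under SYNC, and 30 minutes for a bare DELAY. The function names and the service/DB names in the sample are hypothetical.

```python
def parse_dest(value):
    """Split a LOG_ARCHIVE_DEST_n string into {ATTRIBUTE: value, or True if bare}."""
    attrs = {}
    for token in value.split():
        key, sep, val = token.partition("=")
        attrs[key.upper()] = val if sep else True
    return attrs


def redo_ack_settings(value):
    """Report the effective transport mode, AFFIRM setting and DELAY for a destination."""
    attrs = parse_dest(value)
    transport = "SYNC" if "SYNC" in attrs else "ASYNC"  # assumed default: ASYNC
    if "AFFIRM" in attrs:
        affirm = True
    elif "NOAFFIRM" in attrs:
        affirm = False
    else:
        affirm = transport == "SYNC"  # assumed default follows the transport mode
    delay = attrs.get("DELAY")
    if delay is True:
        delay = 30  # assumed default when DELAY is given without a value
    elif delay is not None:
        delay = int(delay)
    return {"transport": transport, "affirm": affirm, "delay_minutes": delay}


# Hypothetical destination string, not copied from any real spfile.
sample_dest = "SERVICE=stby ASYNC NOAFFIRM VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stby"
print(redo_ack_settings(sample_dest))
```

The view is still the authority on what the instance is actually doing; this only helps spot an attribute that DBUA may have added or dropped in the parameter text.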

Thanks,
Rich

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Mar 19 2019 - 21:26:42 CET
