Re: LGWR, EMC or app cursors?

From: Martin Berger <martin.a.berger_at_gmail.com>
Date: Tue, 8 Oct 2019 09:34:19 +0200
Message-ID: <CALH8A93ZkcArJU_CcJ0gxb3heeBJAwMXA=eafkBFhDNGjnt79g_at_mail.gmail.com>



Hi Dave,

As you asked about tracing: a "normal" 10046 trace can be enabled for the log writer
<https://fritshoogland.files.wordpress.com/2014/04/profiling-the-logwriter-and-database-writer.pdf>.
You will not get SQL statements, but you will get the usual trace information about WAITs.

The event "log file parallel write" is somewhat tricky. Frits wrote a nice blog post
<https://fritshoogland.wordpress.com/2013/08/30/oracle-io-on-linux-log-writer-io-and-wait-events/> about it.
It's important to understand that a single wait can cover multiple IOs (that's the "parallel" in the name).
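
Once you have an LGWR trace, something like the following could pull out the "log file parallel write" waits and show how many IO requests each one covered. This is only a rough sketch: the WAIT line layout (ela= in microseconds, followed by files=, blocks=, requests=) is what I would expect from an 11.2 trace, the sample lines are made up for illustration and not taken from your system, and the function names and the 1 second threshold are my own choices.

```python
import re
from collections import namedtuple

# Assumed layout of a 10046 WAIT line for this event (11.2-style trace):
#   WAIT #0: nam='log file parallel write' ela= 412 files=2 blocks=4 requests=2 ...
# ela is taken to be in microseconds.
WAIT_RE = re.compile(
    r"WAIT #\d+: nam='(?P<nam>[^']+)' ela= *(?P<ela>\d+)"
    r"(?: files=(?P<files>\d+) blocks=(?P<blocks>\d+) requests=(?P<requests>\d+))?"
)

# Hypothetical record type for one LGWR write wait.
LgwrWrite = namedtuple("LgwrWrite", "ela_us files blocks requests")


def parse_lgwr_writes(lines, event="log file parallel write"):
    """Return one LgwrWrite per WAIT line that belongs to the given event."""
    writes = []
    for line in lines:
        m = WAIT_RE.search(line)
        if m and m.group("nam") == event and m.group("requests"):
            writes.append(
                LgwrWrite(
                    ela_us=int(m.group("ela")),
                    files=int(m.group("files")),
                    blocks=int(m.group("blocks")),
                    requests=int(m.group("requests")),
                )
            )
    return writes


def slow_writes(writes, threshold_us=1_000_000):
    """Keep only the waits that took at least threshold_us microseconds."""
    return [w for w in writes if w.ela_us >= threshold_us]


# Made-up sample lines, only to show the expected shape of the input.
SAMPLE = [
    "WAIT #0: nam='log file parallel write' ela= 412 files=2 blocks=4 requests=2 obj#=-1 tim=1570519000000001",
    "WAIT #0: nam='rdbms ipc message' ela= 2999871 timeout=300 p2=0 p3=0 obj#=-1 tim=1570519003000000",
    "WAIT #0: nam='log file parallel write' ela= 25873412 files=2 blocks=16 requests=2 obj#=-1 tim=1570519029000000",
]

if __name__ == "__main__":
    for w in slow_writes(parse_lgwr_writes(SAMPLE)):
        print(f"{w.ela_us / 1e6:.1f}s for {w.requests} requests, {w.blocks} blocks")
```

The point of keeping requests= next to ela= is that one long wait only tells you the slowest of the parallel IOs was slow, not that all of them were.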

> "EMC and sysadmins have confirmed there are no disk errors and from their
> standpoint the disks are waiting on Oracle."

I assume you have one (or two) Fibre Channel SANs connecting the EMC storage
to your DB host. Please ask them for measurements on those switches as well.
The argument is simple: if the host claims it is waiting on the disks
(according to iostat) and EMC claims the disks are waiting on Oracle, have a
closer look at the components in between.

hth,
 Martin

On Mon, 7 Oct 2019 at 17:20, Herring, David <dmarc-noreply_at_freelists.org> wrote:

> Folks, I've got a bit of a mystery with a particular db where we're
> getting a periodic 25-30 second pause between user sessions and LGWR
> processes and can't clearly identify the cause.
>
>
>
> - The database is 11.2.0.4, RHEL 7.5, running ASM on EMC.
> - Sometimes once a day, sometimes more (never more than 5) times a day
> we see user processes start waiting on "log file sync". LGWR is waiting on
> "log file parallel write".
> - At the same time one of the emcpower* devices shows 100% busy and
> service time 200+ (from iostat via osw). mpstat shows 1 CPU at 100% on
> iowait. It's not always the same disk (emcpowere1, a1, h1, …), not always
> the same CPU. EMC and sysadmins have confirmed there are no disk errors
> and from their standpoint the disks are waiting on Oracle.
> - During this time LGWR stats in ASH are all 0 - TIME_WAITED, DELTA*
> columns. Only after the problem goes away (about 25 secs) these columns
> are populated again, obviously the DELTA* columns 1 row later. LGWR's
> session state is WAITING so I assume the column value observations are due
> to LGWR waiting, as it won't write stats until it can do something.
>
>
>
> I am stuck trying to find out, really prove who is the culprit or what
> exactly the wait is on. Is LGWR waiting on user sessions and user sessions
> are waiting on LGWR and all that causes the disk to be 100%? Can I enable
> some sort of tracing on LGWR and would that point to exactly what he's
> waiting on to prove where the problem is?
>
>
>
> Regards,
>
>
>
> Dave
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 08 2019 - 09:34:19 CEST
