Re: LGWR, EMC or app cursors?
Date: Tue, 8 Oct 2019 09:34:19 +0200
Message-ID: <CALH8A93ZkcArJU_CcJ0gxb3heeBJAwMXA=eafkBFhDNGjnt79g_at_mail.gmail.com>
Hi Dave,
As you asked about tracing: a "normal" 10046 trace can be enabled for the
log writer
<https://fritshoogland.files.wordpress.com/2014/04/profiling-the-logwriter-and-database-writer.pdf>.
You will not get SQL statements, but you will get the usual trace
information about WAITs.
The event "log file parallel write" is somewhat tricky. Frits wrote a nice
blog post about it:
<https://fritshoogland.wordpress.com/2013/08/30/oracle-io-on-linux-log-writer-io-and-wait-events/>
It's important to understand that it represents multiple IOs (that's the
parallel).
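
Once such a trace exists, the long waits can be pulled out with a few lines
of script. The following is only a rough sketch: it assumes WAIT lines in the
form usually seen in 11g trace files (nam='...' ela= <microseconds>
files= blocks= requests= ... tim=), and the function name, the sample lines
and the 100 ms threshold are made up for illustration, not taken from Dave's
system.

```python
import re

# Assumed layout of a WAIT line in an 11g 10046 trace; adjust if yours differs.
WAIT_RE = re.compile(
    r"WAIT #\d+: nam='(?P<name>[^']+)' ela= *(?P<ela>\d+)"
    r"(?P<params>.*?) tim=(?P<tim>\d+)"
)

# Invented sample lines, only to show the expected input shape.
SAMPLE = [
    "WAIT #0: nam='rdbms ipc message' ela= 2999000 timeout=300 p2=0 p3=0 obj#=-1 tim=1570519000000000",
    "WAIT #0: nam='log file parallel write' ela= 850 files=2 blocks=4 requests=2 obj#=-1 tim=1570519000001000",
    "WAIT #0: nam='log file parallel write' ela= 25012345 files=2 blocks=8 requests=2 obj#=-1 tim=1570519025013345",
]


def summarize_waits(lines, event="log file parallel write", slow_us=100_000):
    """Count waits on one event and collect those at or above slow_us microseconds."""
    count = total = longest = 0
    slow = []
    for line in lines:
        m = WAIT_RE.search(line)
        if not m or m.group("name") != event:
            continue
        ela = int(m.group("ela"))
        count += 1
        total += ela
        longest = max(longest, ela)
        if ela >= slow_us:
            # Keep files/blocks/requests so the number of IOs per wait is visible.
            params = dict(
                p.split("=", 1) for p in m.group("params").split() if "=" in p
            )
            slow.append((int(m.group("tim")), ela, params))
    return {"count": count, "total_us": total, "max_us": longest, "slow": slow}


if __name__ == "__main__":
    summary = summarize_waits(SAMPLE)
    print(summary["count"], summary["total_us"], summary["max_us"])
    for tim, ela, params in summary["slow"]:
        print(tim, ela, params.get("requests"))
```

To use it on a real trace, pass an open file instead of SAMPLE, e.g.
summarize_waits(open(path_to_lgwr_trace)). The requests value of each slow
wait shows how many IOs that single wait covered, and the tim values can be
lined up with the iostat samples from osw.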
> "EMC and sysadmins have confirmed there are no disk errors and from their
standpoint the disks are waiting on Oracle."
I assume you have one (or two) Fibre Channel SANs connecting the EMC array
and your DB host. Please also ask for measurements from those switches.
The argument is simple: If the host claims it waits on the disks (according
to iostat) and EMC claims it's waiting on Oracle, have a closer look at the
components in between.
hth,
Martin
On Mon, 7 Oct 2019 at 17:20, Herring, David <dmarc-noreply_at_freelists.org> wrote:
> Folks, I've got a bit of a mystery with a particular db where we're
> getting a periodic 25-30 second pause between user sessions and LGWR
> processes and can't clearly identify the cause.
>
>
>
> - The database is 11.2.0.4, RHEL 7.5, running ASM on EMC.
> - Sometimes once a day, sometimes more (never more than 5) times a day
> we see user processes start waiting on "log file sync". LGWR is waiting on
> "log file parallel write".
> - At the same time one of the emcpower* devices shows 100% busy and
> service time 200+ (from iostat via osw). mpstat shows 1 CPU at 100% on
> iowait. It's not always the same disk (emcpowere1, a1, h1, …), not always
> the same CPU. EMC and sysadmins have confirmed there are no disk errors
> and from their standpoint the disks are waiting on Oracle.
> - During this time LGWR stats in ASH are all 0 - TIME_WAITED, DELTA*
> columns. Only after the problem goes away (about 25 secs) these columns
> are populated again, obviously the DELTA* columns 1 row later. LGWR's
> session state is WAITING so I assume the column value observations are due
> to LGWR waiting, as it won't write stats until it can do something.
>
>
>
> I am stuck trying to find out, really prove who is the culprit or what
> exactly the wait is on. Is LGWR waiting on user sessions and user sessions
> are waiting on LGWR and all that causes the disk to be 100%? Can I enable
> some sort of tracing on LGWR and would that point to exactly what he's
> waiting on to prove where the problem is?
>
>
>
> Regards,
>
>
>
> Dave
>
-- http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 08 2019 - 09:34:19 CEST