Re: LGWR, EMC or app cursors?

From: Harel Safra <harel.safra_at_gmail.com>
Date: Mon, 7 Oct 2019 23:41:56 +0300
Message-ID: <CA+UC=5FLLCdac_v-QXSe4BQT1r5F2aYEgdTT22tVyC9jNzpusw_at_mail.gmail.com>

Hi David,
You might also want to check whether these times coincide with snapshots being taken on your drives. On some storage systems, a snapshot can briefly freeze I/O while the array reaches a synchronized state.
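
One rough way to do that comparison, once you have the stall start times (from ASH or OSWatcher) and the snapshot schedule from the storage team, is sketched below in Python. The function name, the 60-second window and the sample timestamps are all made up for illustration; none of them come from this system.

```python
from datetime import datetime, timedelta


def stalls_near_snapshots(stall_times, snapshot_times, window_seconds=60):
    """Return (stall, snapshot) pairs where a stall began within
    window_seconds of a snapshot start, before or after it."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for stall in stall_times:
        for snap in snapshot_times:
            if abs(stall - snap) <= window:
                matches.append((stall, snap))
                break  # one matching snapshot per stall is enough
    return matches


# Hypothetical sample data, not taken from the thread.
stalls = [
    datetime(2019, 10, 7, 2, 0, 20),
    datetime(2019, 10, 7, 14, 37, 5),
]
snapshots = [
    datetime(2019, 10, 7, 2, 0, 0),
    datetime(2019, 10, 7, 8, 0, 0),
]

for stall, snap in stalls_near_snapshots(stalls, snapshots):
    print(f"stall at {stall} is close to snapshot at {snap}")
```

If most stalls line up with a snapshot, that points at the array; if none do, snapshots are probably not the cause.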

Harel

On Mon, Oct 7, 2019 at 7:17 PM Herring, David <dmarc-noreply_at_freelists.org> wrote:

> Chris, yeah, seems that way. I had a problem a while back where
> operations would time out at seemingly random times, and I found out that
> $LD_LIBRARY_PATH included a NAS that no longer existed. "strace" showed
> that valid paths were picked for a while, then the invalid NAS would get
> hit and there'd be a pause.
>
>
>
> Anyway, the EMC config is all multipath. I'll check with storage/sysadmin
> teams (pretty much everything is outsourced) to review it all again.
>
>
>
> Regards,
>
>
>
> Dave
>
>
>
>
> *Dave Herring*
>
> DBA
>
> 103 JFK Parkway
>
> Short Hills, New Jersey 07078
>
> Mobile 630.441.4404
>
>
>
> *dnb.com <http://www.dnb.com/>*
>
>
>
>
> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On
> Behalf Of *Chris Taylor
> *Sent:* Monday, October 7, 2019 10:27 AM
> *To:* dmarc-noreply_at_freelists.org
> *Cc:* oracle-l_at_freelists.org
> *Subject:* Re: LGWR, EMC or app cursors?
>
>
>
>
> Always the same database/machine?
>
>
>
> It almost sounds like a path from the machine to the storage is
> down/unavailable but the OS doesn't know it isn't responding. I'm not as
> familiar with EMC Power, and I think EMC Power uses something besides
> multipath drivers (but I might be mistaken).
>
> If you're not getting any path errors, it might be worthwhile to have
> someone go into the cage and replace all the fiber cables connecting
> this server to the storage (I believe they can test the cables and see if
> one is faulty/noisy).
>
>
>
> Thanks,
> Chris
>
>
>
>
>
> On Mon, Oct 7, 2019 at 10:20 AM Herring, David <
> dmarc-noreply_at_freelists.org> wrote:
>
> Folks, I've got a bit of a mystery with a particular db where we're
> getting a periodic 25-30 second pause between user sessions and LGWR
> processes and can't clearly identify the cause.
>
>
>
> - The database is 11.2.0.4, RHEL 7.5, running ASM on EMC.
> - Sometimes once a day, sometimes more (never more than 5) times a day
> we see user processes start waiting on "log file sync". LGWR is waiting on
> "log file parallel write".
> - At the same time one of the emcpower* devices shows 100% busy and
> service time 200+ (from iostat via osw). mpstat shows 1 CPU at 100% on
> iowait. It's not always the same disk (emcpowere1, a1, h1, …), not always
> the same CPU. EMC and sysadmins have confirmed there are no disk errors
> and from their standpoint the disks are waiting on Oracle.
> - During this time LGWR stats in ASH are all 0 - TIME_WAITED, DELTA*
> columns. Only after the problem goes away (about 25 secs) these columns
> are populated again, obviously the DELTA* columns 1 row later. LGWR's
> session state is WAITING so I assume the column value observations are due
> to LGWR waiting, as it won't write stats until it can do something.
>
>
>
> I am stuck trying to find out, really prove who is the culprit or what
> exactly the wait is on. Is LGWR waiting on user sessions and user sessions
> are waiting on LGWR and all that causes the disk to be 100%? Can I enable
> some sort of tracing on LGWR and would that point to exactly what he's
> waiting on to prove where the problem is?
>
>
>
> Regards,
>
>
>
> Dave
>
>

--
http://www.freelists.org/webpage/oracle-l


Received on Mon Oct 07 2019 - 22:41:56 CEST
