Re: LGWR, EMC or app cursors?
Date: Mon, 7 Oct 2019 23:19:53 +0100
Message-ID: <CACj1VR55pYYmWY1FXu2RfvH5O+y_8Cu6Q=MKUix_Gyz4wUCKiw_at_mail.gmail.com>
“Sometimes once a day, sometimes more (never more than 5) times a day we see user processes start waiting on "log file sync". LGWR is waiting on "log file parallel write".”
Is that 25 seconds or so all one individual wait? I.e., if you ran:
select sample_time, event, seq#, wait_time, time_waited
from   v$active_session_history
where  program like '%LGWR%'
and    sample_time between :start_time and :end_time  -- start and end of the stall
order  by sample_time;
Do you see the same seq# reported in every sample, with the total time waited only filled in at the end?
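To answer that without reading every sample, the same ASH rows can be collapsed to one row per LGWR wait. This is a sketch, assuming 11.2 and hypothetical bind variables :start_time and :end_time holding the stall window. If a single SEQ# spans roughly 25 one-second samples and its TIME_WAITED only appears on the last one, that would suggest one long write rather than a run of slow ones.

```sql
-- Sketch: one row per LGWR wait (SEQ#) inside a stall window.
-- :start_time and :end_time are hypothetical TIMESTAMP binds for the window.
select seq#,
       event,
       min(sample_time)           as first_sample,
       max(sample_time)           as last_sample,
       count(*)                   as samples,        -- V$ASH samples about once per second
       max(time_waited) / 1000000 as time_waited_s   -- TIME_WAITED is in microseconds
from   v$active_session_history
where  program like '%LGWR%'
and    sample_time between :start_time and :end_time
group  by seq#, event
order  by first_sample;
```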
We’ve actually seen this with our EMC array too; we got EMC to take a look and, conveniently, the problem was solved.
Hope this helps,
Andy
On Mon, 7 Oct 2019 at 21:42, Harel Safra <harel.safra_at_gmail.com> wrote:
> Hi David,
> Might also want to check whether these times coincide with snapshots being
> taken on your drives. Snapshots on some storage systems may sometimes freeze
> I/O to reach a synchronized state.
>
> Harel
>
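Harel's check needs a list of stall times to compare against the array's snapshot schedule. The AWR copy of ASH keeps these longer than the in-memory buffer. This is a minimal sketch. It assumes a single-instance database and assumes that a burst of "log file sync" samples marks a stall. The 7-day window and the threshold of 5 samples per minute are arbitrary assumptions to tune.

```sql
-- Sketch: minutes with an unusual number of 'log file sync' samples,
-- to line up against the storage team's snapshot schedule.
-- DBA_HIST_ACTIVE_SESS_HISTORY keeps roughly one sample in ten (about every 10 s),
-- so the HAVING threshold is an assumption, not a recommended value.
select trunc(sample_time, 'MI')   as sample_minute,
       count(*)                   as log_file_sync_samples,
       count(distinct session_id) as sessions
from   dba_hist_active_sess_history
where  event = 'log file sync'
and    sample_time > sysdate - 7
group  by trunc(sample_time, 'MI')
having count(*) >= 5
order  by sample_minute;
```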
> On Mon, Oct 7, 2019 at 7:17 PM Herring, David <dmarc-noreply_at_freelists.org>
> wrote:
>
>> Chris, yeah, seems that way. I had a problem a while back where
>> operations would time out at seemingly random times, and found out that
>> $LD_LIBRARY_PATH included a NAS path that no longer existed. "strace" showed
>> that valid paths were picked for a while, then the invalid NAS would get
>> hit and there'd be a pause.
>>
>> Anyway, the EMC config is all multipath. I'll check with the
>> storage/sysadmin teams (pretty much everything is outsourced) and have them
>> review it all again.
>>
>> Regards,
>>
>> Dave
>>
>> *Dave Herring*
>> DBA
>> 103 JFK Parkway
>> Short Hills, New Jersey 07078
>> Mobile 630.441.4404
>> *dnb.com <http://www.dnb.com/>*
>>
>> *From:* oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> *On Behalf Of* Chris Taylor
>> *Sent:* Monday, October 7, 2019 10:27 AM
>> *To:* dmarc-noreply_at_freelists.org
>> *Cc:* oracle-l_at_freelists.org
>> *Subject:* Re: LGWR, EMC or app cursors?
>>
>> Always the same database/machine?
>>
>>
>>
>> Almost sounds like a path from the machine to the storage is down/unavailable,
>> but the OS doesn't know it isn't responding. I'm not as familiar with EMC
>> PowerPath, and I think PowerPath uses something other than the native
>> multipath drivers (but I might be mistaken).
>>
>> If you're not getting any path errors, it might be worthwhile to have
>> someone go into the cage and replace all the fiber cables connecting this
>> server to the storage (I believe they can test the cables to see if one is
>> faulty/noisy).
>>
>>
>>
>> Thanks,
>> Chris
>>
>>
>>
>>
>>
>> On Mon, Oct 7, 2019 at 10:20 AM Herring, David <
>> dmarc-noreply_at_freelists.org> wrote:
>>
>> Folks, I've got a bit of a mystery with a particular db where we're
>> getting a periodic 25-30 second pause between user sessions and the LGWR
>> process, and can't clearly identify the cause.
>>
>>
>>
>> - The database is 11.2.0.4, RHEL 7.5, running ASM on EMC.
>> - Sometimes once a day, sometimes more often (but never more than 5 times
>> a day), we see user processes start waiting on "log file sync". LGWR is
>> waiting on "log file parallel write".
>> - At the same time one of the emcpower* devices shows 100% busy and a
>> service time of 200+ ms (from iostat via OSWatcher), and mpstat shows 1 CPU
>> at 100% iowait. It's not always the same disk (emcpowere1, a1, h1, …), nor
>> always the same CPU. EMC and the sysadmins have confirmed there are no disk
>> errors, and from their standpoint the disks are waiting on Oracle.
>> - During this time LGWR's stats in ASH (the TIME_WAITED and DELTA*
>> columns) are all 0. Only after the problem goes away (about 25 secs) are
>> these columns populated again, the DELTA* columns obviously 1 row later.
>> LGWR's session state is WAITING, so I assume these zero values are due to
>> LGWR still being in the wait, as it won't record stats until the wait completes.
>>
>>
>>
>> I am stuck trying to find out, and really prove, who the culprit is or what
>> exactly the wait is on. Is LGWR waiting on user sessions while user sessions
>> wait on LGWR, and is all that causing the disk to be 100% busy? Can I enable
>> some sort of tracing on LGWR, and would that show exactly what it's waiting
>> on and prove where the problem is?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Dave
>>
>>
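Dave also asks what exactly LGWR is waiting on. Before turning on any tracing, LGWR's current wait can be sampled directly from V$SESSION while a stall is in progress. This is a sketch, assuming 11.2, which has V$SESSION.WAIT_TIME_MICRO. The query enables no tracing. It returns the OS process id (SPID), which can be matched against OS-level tools. For "log file parallel write", P1/P2/P3 are files, blocks and requests.

```sql
-- Sketch: run repeatedly during a stall to see LGWR's wait as it happens.
-- WAIT_TIME_MICRO is the time in the current wait when STATE = 'WAITING',
-- otherwise the duration of the last completed wait.
select s.sid,
       s.serial#,
       p.spid,                              -- OS process id of LGWR
       s.state,
       s.event,
       s.seq#,
       s.p1text, s.p1,
       s.p2text, s.p2,
       s.p3text, s.p3,
       s.wait_time_micro / 1000000 as wait_time_s
from   v$session s
join   v$process p on p.addr = s.paddr
where  s.program like '%LGWR%';
```

If SEQ# stays the same across runs while WAIT_TIME_S keeps growing, LGWR is stuck in a single I/O call rather than issuing many slow ones.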
-- http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 08 2019 - 00:19:53 CEST