RE: log buffer size and log file syncs
Date: Tue, 29 May 2012 16:14:41 +0000
Mark, you raise some good points. REDO is mirrored at the logical level, to address possible logical corruption (which we have run into before on other systems). The disk devices are on very large SAN storage with 1 TB of cache memory at the SAN level. So we are doing 2x the normal redo activity plus Data Guard. I had an Oracle RAC performance specialist look at our Service Request and he was of the opinion that setting LGWR to a real-time priority would be a good thing to do, and would not do any harm in our environment, which agrees with your assessment. 99% of our redo disk service time is <= 2ms, but we have some outliers that spike up to > 8ms, which seems to have a cascading effect on performance. These outliers seem to be caused by anomalies in the multipathing software we are using, so some changes are pending to remove them.
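For anyone wanting to put numbers like these on their own redo devices, here is a minimal Python sketch. It assumes you already have a list of per-write service times in milliseconds from whatever tool you trust; the function name, the thresholds as defaults, and the sample data are all made up for illustration and are not our actual measurements.

```python
def summarize_latencies(samples_ms, ok_ms=2.0, outlier_ms=8.0):
    """Return (fraction of samples <= ok_ms, list of samples > outlier_ms)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    within = sum(1 for s in samples_ms if s <= ok_ms)
    outliers = [s for s in samples_ms if s > outlier_ms]
    return within / len(samples_ms), outliers


# Hypothetical data: 99 fast writes and one slow one.
samples = [0.8] * 99 + [9.5]
fraction_ok, outliers = summarize_latencies(samples)
print(f"{fraction_ok:.1%} of writes <= 2 ms; outliers over 8 ms: {outliers}")
```

A percentage alone hides how bad the tail is, which is why the sketch returns the outlier values themselves and not just a count.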
From: Mark W. Farnham [mailto:mwf_at_rsiz.com]
Sent: Tuesday, May 29, 2012 11:51 AM
To: tanel_at_tanelpoder.com; CRISLER, JON A
Cc: 'oracle-l'
Subject: RE: log buffer size and log file syncs
" However, I don't like to fix a problem first and then see whether the problem existed in first place (trial and error), that's why I asked for extra information / hard evidence in form of LGWR's snapper output ..."
In most cases I tend to agree with Tanel's call to only take specific actions in reaction to specific known problems. (Doubly so, since by calling for running Oracle "memory rich" in 1990 I may have contributed to the helter skelter bchr landrush. In my defense, I called for running "memory rich" on a system where at the time my total SGA was 10 megabytes and I was not even calculating bchr, but rather, I had so little memory that lookup tables that were nearly never updated were being chronically re-read from disk [OK, Unix file buffer probably at least some of the time].)
In the case of setting LGWR to a "stays scheduled more often" priority, where it is convenient to set it (which depends on what release you're running and the OS), I'm unaware of ways this can cause harm. That being the case, setting it *may* not solve your current problem, but it is unlikely to *cause* a problem and may act as a prophylaxis against future transient problems. So I consider it a useful standard configuration unless it is contraindicated.
By the way, how many different ways do you have the online logs mirrored, and is the mirroring configuration forcing multiple writes to the same devices? If you're overrunning write caches (or don't have write caches on those devices) writing multiple times to the same device can force otherwise unneeded seeks and queueing for no extra physical error prevention. (And while I'm always up for selecting one of physical multiplexing and having multiple members of a given group, I know that many folks disagree and have been saved by the extra member in the case of operator error.) I'd suggest that if there is indeed a multi-write problem with the log file syncs and you *cannot* do less work, then removing some copies or at least getting them to different devices in a non-conflicting pattern with ARCH is called for, whether or not it actually cures your current problem.
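A rough way to check for the "multiple writes to the same device" case is sketched below in Python. It assumes you have already built a list of (group number, member path, device name) tuples yourself, for example from the log member list plus your own OS-level path-to-device mapping; the function name, paths, and device names here are hypothetical.

```python
from collections import defaultdict


def groups_sharing_devices(members):
    """members: iterable of (group_number, member_path, device_name).

    Return {group_number: {device_name: [member_path, ...]}} for every
    group that has two or more members on the same device.
    """
    by_group = defaultdict(lambda: defaultdict(list))
    for group, path, device in members:
        by_group[group][device].append(path)

    conflicts = {}
    for group, devices in by_group.items():
        shared = {dev: paths for dev, paths in devices.items() if len(paths) > 1}
        if shared:
            conflicts[group] = shared
    return conflicts


# Hypothetical layout: group 1 has both members on one device, group 2 does not.
layout = [
    (1, "/redo_a/log1a.rdo", "dev_a"),
    (1, "/redo_a/log1b.rdo", "dev_a"),
    (2, "/redo_a/log2a.rdo", "dev_a"),
    (2, "/redo_b/log2b.rdo", "dev_b"),
]
print(groups_sharing_devices(layout))
```

The hard part in practice is the device mapping itself, since multipathing and SAN virtualization can hide which physical devices sit behind a path; the sketch only flags conflicts at whatever level of mapping you feed it.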
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Tanel Poder
Sent: Tuesday, May 01, 2012 11:54 AM
To: CRISLER, JON A
Subject: Re: log buffer size and log file syncs
Increasing LGWR priority would only help if it was currently starving for CPU / or waiting too long in the CPU runqueue... Unfortunately on Linux there's no easy way to measure this directly. If your load is low (let's say only 10 on a 32 CPU machine) then I'd expect that LGWR priority change isn't going to help much.
However, I don't like to fix a problem first and then see whether the problem existed in first place (trial and error), that's why I asked for extra information / hard evidence in form of LGWR's snapper output ...
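As a rough indirect check, some Linux kernels expose per-process scheduler counters in /proc/<pid>/schedstat (on-CPU time, run-queue wait time, timeslice count). Whether that file exists and is populated depends on the kernel version and build options, and I have not verified it for Red Hat 5, so treat the following Python sketch as an assumption-laden illustration. It takes two snapshots of the file's contents as strings and computes the share of the interval spent waiting on the run queue; the function names are made up.

```python
def parse_schedstat(text):
    """Parse the three fields of a schedstat line:
    on-CPU time, run-queue wait time, number of timeslices."""
    run, wait, slices = (int(x) for x in text.split())
    return run, wait, slices


def runqueue_wait_fraction(before, after):
    """Fraction of (run + wait) time between two snapshots that was
    spent waiting on the run queue. The time unit cancels out."""
    run0, wait0, _ = parse_schedstat(before)
    run1, wait1, _ = parse_schedstat(after)
    run_delta = run1 - run0
    wait_delta = wait1 - wait0
    total = run_delta + wait_delta
    if total <= 0:
        return 0.0  # process neither ran nor waited during the interval
    return wait_delta / total


# Hypothetical snapshots taken some seconds apart for the LGWR pid.
before = "1000 100 10"
after = "1900 200 20"
print(runqueue_wait_fraction(before, after))
```

In real use the two strings would come from reading /proc/<pid>/schedstat twice for the LGWR process. A consistently high fraction would suggest CPU starvation; a value near zero would suggest the priority change is unlikely to help.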
On Tue, May 1, 2012 at 6:19 PM, CRISLER, JON A <JC1706_at_att.com> wrote:
> Red Hat Linux 5. We have async DG running but Real Time apply is
> also configured, and redo logs are mirrored. I believe LGWR is not
> starved for CPU given the overall conditions for the system, but I am
> finding some info that putting lgwr in a real-time OS priority would
> be a good thing.
>
> The default for _high_priority_processes is LMS*|VKTM but I have
> seen some Metalink notes about adding LGWR. I also saw a blog post
> that mentioned you discussed setting this parameter at a HOTSOS
> seminar, and this is something we are considering. Given all the CPU
> power in this server, and all the LMS processes, I don't think this would
> pose a problem.
>
> alter system set "_high_priority_processes"='LMS*|VKTM|LGWR'
>
> From: tanel_at_poderc.com [mailto:tanel_at_poderc.com] On Behalf Of Tanel Poder
> Sent: Monday, April 30, 2012 6:21 PM
> To: CRISLER, JON A
> Cc: oracle-l
> Subject: Re: log buffer size and log file syncs
>
> Which OS are you on? If it happens to be Solaris, then prstat -mLp
> PID would show the scheduling latency for LGWR. This would help to
> find out whether LGWR is CPU starved or not. What load averages do
> you have?
>
> Also, what does snapper say when run on LGWR? If you have synchronous
> DG for example, then LGWR would wait for the LNS ack too in addition
> to the log file parallel write wait, before returning OK back to the
> committing session ...
>
> On Mon, Apr 30, 2012 at 5:56 PM, CRISLER, JON A <JC1706_at_att.com> wrote:
> Interesting thoughts Tanel: in the case of this specific app, the
> majority of the work is made up of small commits to a handful of
> tables on a 6 node RAC cluster. I/O times are generally quite good,
> and with 32 cores per node the CPU usage and load average are very low. It's
> 11gR1 - I was wondering if some of the tweaks to put LGWR at "real
> time" priority that are mentioned for 10g also apply to 11g.