Re: Application of archived logs on physical standby very slow

From: <Laimutis.Nedzinskas_at_seb.lt>
Date: Fri, 29 Jan 2010 11:15:26 +0200
Message-ID: <OFAAC39E57.D9EF40A1-ONC22576BA.00318DE4-C22576BA.0032DE25_at_seb.lt>



I have the same issue.

vmstat shows low CPU and iostat shows little I/O, but the standby starts lagging by 50GB of redo when the primary generates redo at some 15-20 MB/second.
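
For what it is worth, the lag and the apply rate can be put into numbers on the
standby itself with something along these lines (item names as of 10gR2, adjust
for your release):

select item, units, sofar, total
  from v$recovery_progress;

-- highest sequence received vs. highest sequence applied
select max(sequence#) last_received,
       max(decode(applied, 'YES', sequence#)) last_applied
  from v$archived_log;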

Redo validation (alter system dump logfile ... validate) shows a 150-200 MB/sec read rate for the redo.
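
For reference, the full form of the command is along these lines, with one of
the standby's archived logs substituted:

alter system dump logfile '/u03/oradata/dr/arch/1_49252_596779339.dbf' validate;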

My primary suspect is the redo reader process. As far as I know there is only one process reading the redo, and naturally it is a bottleneck.
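
At least what the single coordinator is doing can be watched from
V$MANAGED_STANDBY, something like:

select process, status, thread#, sequence#, block#, blocks
  from v$managed_standby;

MRP0 is the row to watch; BLOCK# creeping slowly through the current SEQUENCE#
would point the same way.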

vmstat may show next to nothing because our OS sees 64 CPUs (it does not really have 64, but today's multicore, multithreaded, multi-whatever architectures translate into that many "CPUs").

If the one and only reader takes 100% of a single CPU, then vmstat must show about 100/64, i.e. roughly 1.6 percent overall, which is exactly what I get.

Is the reader tunable? I doubt it...
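
(The closest thing to a knob I am aware of is the degree of apply parallelism
rather than the reader itself; in 10g managed recovery can be restarted with an
explicit PARALLEL clause, roughly:)

alter database recover managed standby database cancel;
alter database recover managed standby database disconnect from session parallel 8;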



                                                                           
From: Niall Litchfield <niall.litchfield_at_gmail.com>
Sent by: oracle-l-bounce_at_freelists.org
Reply-To: niall.litchfield_at_gmail.com
Date: 2010.01.28 15:00
To: Martin Bach <development_at_the-playground.de>
Cc: ORACLE-L <oracle-l_at_freelists.org>
Subject: Re: Application of archived logs on physical standby very slow




What does a 10046 trace of the MRP process show? You may well also end up doing OS-level tracing with strace or similar.
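
A minimal sketch of that from SQL*Plus as sysdba, attaching to MRP0 by its OS
pid (21647 in the top output you posted):

oradebug setospid 21647
oradebug unlimit
oradebug event 10046 trace name context forever, level 12
-- let a log or two get applied, then
oradebug event 10046 trace name context off
oradebug tracefile_name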

On Thu, Jan 28, 2010 at 9:15 AM, Martin Bach <development_at_the-playground.de> wrote:

      Hi Niall!

      couldn't agree more. I seem to have the dilemma where my storage
      array seems to be in good working order and my OS doesn't report any
      unusually high I/O service times. I'm wondering where else I could
      look for clues as to why this happens, while trying to follow a
      methodical approach to troubleshooting.

      The load is clearly caused by Oracle processes: shutting the standby
      down reduces the iowaits to 0 (but that doesn't really solve my
      problem :)

      Regards,

      Martin!


      On 01/27/2010 07:06 PM, Niall Litchfield wrote:
            I'd have thought this problem was eminently suited to method-r.

            On 1/27/10, Johnson, William L (TEIS)
            <WLJohnson_at_tycoelectronics.com> wrote:
                  What do you have your parallel_execution_message_size set
                  to on the standby
                  database?
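
                  (For what it is worth, it can be checked with a plain
                  "show parameter"; it is a static parameter, so changing it
                  means an spfile change and an instance restart, e.g.:)

                  show parameter parallel_execution_message_size
                  -- assuming an spfile is in use
                  alter system set parallel_execution_message_size = 16384 scope = spfile;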

                  -----Original Message-----
                  From: oracle-l-bounce_at_freelists.org
                  [mailto:oracle-l-bounce_at_freelists.org]
                  On Behalf Of Martin Bach
                  Sent: Wednesday, January 27, 2010 12:13 PM
                  To: ORACLE-L
                  Subject: Application of archived logs on physical standby
                  very slow

                  Dear listers,

                  I'd like to add another "interesting" problem to today's
                  interesting RMAN problem. I have quite a bit of output from
                  diagnostic tools; hopefully this doesn't get too badly
                  mangled...

                  Our alerting system fired when my physical standby
                  database went out of
                  sync with production. DR and production share the same
                  storage (an entry
                  level MSA) and are multipathed. I could observe this on
                  the hosts (both
                  RHEL 5.2 32bit, don't ask):

                  Wed Jan 27 10:42:42 2010
                  RFS[2]: Archived Log:
                  '/u03/oradata/dr/arch/1_49264_596779339.dbf'
                  Primary database is in MAXIMUM PERFORMANCE mode
                  Wed Jan 27 10:47:55 2010
                  Media Recovery
                  Log /u03/oradata/dr/arch/1_49252_596779339.dbf
                  Wed Jan 27 10:55:27 2010
                  Media Recovery
                  Log /u03/oradata/dr/arch/1_49253_596779339.dbf
                  Wed Jan 27 11:01:46 2010

                  Online redo logs are 500M in size and so are most
                  archived logs, and as
                  you can see it takes 5-10 minutes to apply just one,
                  which is consistent.

                  top - 11:25:37 up 57 days, 17:34, 13 users,  load average: 0.55, 1.05, 1.19
                  Tasks: 209 total,   1 running, 208 sleeping,   0 stopped,   0 zombie
                  Cpu0  :  2.8%us,  1.9%sy,  0.0%ni, 51.9%id, 43.5%wa,  0.0%hi,  0.0%si,  0.0%st
                  Cpu1  :  1.9%us,  0.9%sy,  0.0%ni, 42.6%id, 54.6%wa,  0.0%hi,  0.0%si,  0.0%st
                  Cpu2  :  0.9%us,  0.0%sy,  0.0%ni, 98.1%id,  0.9%wa,  0.0%hi,  0.0%si,  0.0%st
                  Cpu3  :  0.0%us,  0.9%sy,  0.0%ni, 99.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
                  Mem:   8310064k total,  8019968k used,   290096k free,    36456k buffers
                  Swap:  8152948k total,    70124k used,  8082824k free,  7568912k cached


   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 21647 oracle    15   0 2178m  48m  31m S    3  0.6   0:00.41 ora_mrp0_dr
  7496 oracle    15   0 2145m 1.5g 1.5g S    2 18.4 190:47.10 ora_p000_dr
  7500 oracle    15   0 2145m 1.5g 1.5g S    2 18.4 199:38.12 ora_p002_dr
  2580 root      10  -5     0    0    0 S    1  0.0  17:42.07 [kjournald]
  7498 oracle    16   0 2145m 1.5g 1.5g D    1 18.4 189:40.17 ora_p001_dr

The only processes on cpu are MRP0 and its parallel slaves. Funny enough,
iostat doesn't seem to see a problem:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.25   24.75    0.00   75.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
cciss/c0d0        0.00     4.00    0.00    8.00     0.00    48.00    12.00     0.01    0.88   0.50   0.40
cciss/c0d1        0.00     5.00    0.00   18.00     0.00    92.00    10.22     0.02    0.89   0.06   0.10
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc               0.00   211.00    0.00  189.00     0.00  1600.00    16.93     0.49    2.57   2.57  48.60
sdd               0.00   203.00    0.00  183.00     0.00  1544.00    16.87     0.50    2.71   2.74  50.10
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00  786.00     0.00  3144.00     8.00     2.09    2.64   1.26  98.70
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00  786.00     0.00  3144.00     8.00     2.09    2.65   1.26  98.80
dm-6              0.00     0.00    0.00   23.00     0.00    92.00     8.00     0.02    0.83   0.04   0.10
dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-8              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Service time is pretty decent I'd say and the SAN administrators tell me
that everything on the array is fine. I checked for AIO and that's also
used:

[oracle]$ cat /proc/slabinfo | grep ^k
kioctx                25     60    192   20    1 : tunables  120   60    8 : slabdata      3      3      0
kiocb                 64     90    128   30    1 : tunables  120   60    8 : slabdata      3      3      0
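
(The database-side counterpart of that check is just the standard init
parameters; from SQL*Plus, something like:)

show parameter filesystemio_options
show parameter disk_asynch_io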

Data files are scattered over mount points /u02-/u06, and there are no
multipathing (or other I/O-related) errors reported in dmesg and
/var/log/messages. Archive logs are received on the DR box in time, and
there are no gaps. Our sysadmin tells me that device mapper multipath is
happily distributing I/O round robin. Oh yes, and it's Oracle 10.2.0.4
32bit.

Now does anyone have an idea on how to proceed?

Regards,

Martin

--
Martin Bach
OCM 10g
http://martincarstenbach.wordpress.com
http://www.linkedin.com/in/martincarstenbach

--
http://www.freelists.org/webpage/oracle-l
--
Niall Litchfield
Oracle DBA
http://www.orawin.info

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 29 2010 - 03:15:26 CST
