Re: Slow RMAN Channel - KSV master wait/ ASM file metadata operation/ kfk: async disk IO

From: Leng <lkaing_at_gmail.com>
Date: Wed, 31 Oct 2018 08:39:58 +1100
Message-Id: <858FBC07-9225-45BB-87AA-AD53BCB85492_at_gmail.com>



The Rman bit you posted is all about backups. It’s not even at the merge stage. So you need to tune the backup per my previous suggestions. Also default compression is slow. Oracle wants you to buy advanced compression for fast compression speed. If you believe datafile 1 is the problem you can just focus on that datafile.

Try each option (uncompressed, bct, section size) with this command and see how you go.

Backup incremental level 1 ... datafile 1;

Cheers,
Leng

> On 31 Oct 2018, at 8:22 am, Chris Taylor <christopherdtaylor1994_at_gmail.com> wrote:
> 
> CPU is fine on the source host/db.  (Where the RMAN channel is connected) - Been running 50% utilized or less every day.
> 
> BCT is not turned on.
> 
> The RMAN script does this and I'm wondering if its in the 'recovery' portion of the system datafile though I would expect RMAN to indicate it had moved onto the restore piece.
> 
> RMAN> connect target *
> 2> run {host 'date >> /var/act/log/sentry_rman.log';
> 3> allocate channel d0 device type disk format '+actsentry/sentry/datafile/%u' ;
> 4> allocate channel d1 device type disk format '+actsentry/sentry/datafile/%u' ;
> 5> allocate channel d2 device type disk format '+actsentry/sentry/datafile/%u' ;
> 6> allocate channel d3 device type disk format '+actsentry/sentry/datafile/%u' ;
> 7> allocate channel d4 device type disk format '+actsentry/sentry/datafile/%u' ;
> 8> allocate channel d5 device type disk format '+actsentry/sentry/datafile/%u' ;
> 9> backup incremental level 1 for recover of copy with tag 'sentry_' database skip inaccessible;
> 10> recover copy of database with tag 'sentry_';
> 11> release channel d0;
> 12> release channel d1;
> 13> release channel d2;
> 14> release channel d3;
> 15> release channel d4;
> 16> release channel d5;
> 17> host 'date >> /var/act/log/sentry_rman.log';
> 18> }
> 
> 
> 
> 
> 
>> On Tue, Oct 30, 2018 at 4:15 PM Leng <lkaing_at_gmail.com> wrote:
>> This is a compressed backup so you may be cpu bound. Have you checked cpu?
>> 
>> Have you got bct turned on?
>> 
>> Also try section sized backups.
>> 
>> Cheers,
>> Leng
>> 

>>> On 31 Oct 2018, at 2:44 am, Chris Taylor <christopherdtaylor1994_at_gmail.com> wrote:
>>>
>>> Oracle 12.1.0.2 Linux x86-64
>>>
>>> Background:
>>> We're running a Level0 and a Level1 Merge of a 75+ TB database to an ASM diskroup.
>>>
>>> ** Looking for comments/ideas/suggestions on whether this is 'normal' or if its really slow? Seems like a problem to me but I'm not sure as I've never done a Level1 Merge so maybe reading the SYSTEM datafile is supposed to be slow for this step? **
>>>
>>> The Level0 ran fine, the Level1 merge started and ran fine for hours.
>>>
>>> Now all but one of my RMAN channels (6 total) are finished but one channel is writing very slowly.
>>>
>>> Tracing the problematic session shows below:
>>>
>>> *** 2018-10-30 11:22:06.940
>>> WAIT #0: nam='KSV master wait' ela= 384 indicator=10 p2=0 p3=0 obj#=-1 tim=804631999360
>>> WAIT #0: nam='KSV master wait' ela= 48336 indicator=9 p2=0 p3=0 obj#=-1 tim=804632047895
>>> WAIT #0: nam='ASM file metadata operation' ela= 443 msgop=41 locn=0 p3=0 obj#=-1 tim=804632048128
>>> WAIT #0: nam='kfk: async disk IO' ela= 532 count=1 where=9 timeout=4294967295 obj#=-1 tim=804632048763
>>> WAIT #0: nam='Disk file operations I/O' ela= 220 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804632048803
>>>
>>> *** 2018-10-30 11:22:16.073
>>> WAIT #0: nam='resmgr:cpu quantum' ela= 6 location=3 consumer group id=4844 =0 obj#=-1 tim=804641132770
>>>
>>> *** 2018-10-30 11:22:18.274
>>> WAIT #0: nam='KSV master wait' ela= 398 indicator=10 p2=0 p3=0 obj#=-1 tim=804643333365
>>> WAIT #0: nam='KSV master wait' ela= 49605 indicator=9 p2=0 p3=0 obj#=-1 tim=804643383128
>>> WAIT #0: nam='ASM file metadata operation' ela= 369 msgop=41 locn=0 p3=0 obj#=-1 tim=804643383326
>>> WAIT #0: nam='kfk: async disk IO' ela= 673 count=1 where=9 timeout=4294967295 obj#=-1 tim=804643384139
>>> WAIT #0: nam='Disk file operations I/O' ela= 311 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804643384215
>>>
>>> *** 2018-10-30 11:22:29.644
>>> WAIT #0: nam='KSV master wait' ela= 517 indicator=10 p2=0 p3=0 obj#=-1 tim=804654703615
>>> WAIT #0: nam='KSV master wait' ela= 52485 indicator=9 p2=0 p3=0 obj#=-1 tim=804654756324
>>> WAIT #0: nam='ASM file metadata operation' ela= 450 msgop=41 locn=0 p3=0 obj#=-1 tim=804654756539
>>> WAIT #0: nam='kfk: async disk IO' ela= 527 count=1 where=9 timeout=4294967295 obj#=-1 tim=804654757294
>>> WAIT #0: nam='Disk file operations I/O' ela= 375 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804654757356
>>>
>>> File# 1 is the system datafile which is size 32.99 GB.
>>>
>>> Looking at the session from v$session shows:
>>> MODULE CLIENT_INFO ACTION OSUSER EVENT WAIT_TIME_SECS WAIT_CLASS
>>> backup incr datafile rman channel=d0 0000080 STARTED16 oracle Disk file operations I/O 0.000005 User I/O
>>>
>>> Strace shows:
>>> io_submit(139813557071872, 1, {{data:0x7f28dcb09c48, pread, filedes:325, buf:0x7f28dc36d000, nbytes:524288, offset:1734379831296}}) = 1
>>> times(NULL) = 509955577
>>> times(NULL) = 509955577
>>> io_getevents(139813557071872, 4, 128, {{0x7f28dcb09ee8, 0x7f28dcb09ee8, 524288, 0}, {0x7f28dcb0a188, 0x7f28dcb0a188, 524288, 0}}, {0, 0}) = 2
>>> times(NULL) = 509955577
>>> getrusage(0x1 /* RUSAGE_??? */, {ru_utime={88371, 365510}, ru_stime={1383, 903614}, ...}) = 0
>>> times(NULL) = 509955584
>>> io_submit(139813557071872, 1, {{data:0x7f28dcb09ee8, pread, filedes:325, buf:0x7f28dc2ec000, nbytes:524288, offset:1734380355584}}) = 1
>>> times(NULL) = 509955584
>>> times(NULL) = 509955584
>>> times(NULL) = 509955584
>>> times(NULL) = 509955584
>>> io_submit(139813557071872, 1, {{data:0x7f28dcb0a188, pread, filedes:325, buf:0x7f28dc04d000, nbytes:524288, offset:1734380879872}}) = 1
>>> times(NULL) = 509955584
>>> times(NULL) = 509955584
>>> io_getevents(139813557071872, 4, 128, {{0x7f28dcb0a428, 0x7f28dcb0a428, 524288, 0}, {0x7f28dcb09c48, 0x7f28dcb09c48, 524288, 0}}, {0, 0}) = 2
>>> times(NULL) = 509955584
>>>
>>> So strace shows it streaming operations quickly but I can't figure out why its taking so long.
>>>
>>> The last operation to the rman log file was at 21:55 (9:55pm) last night.
>>>
>>> Here's a sample of the RMAN output before it quit writing to the log - you can see the write times were pretty snappy then it looks like they started slowing down.
>>>
>>> channel d3: starting piece 1 at 201810292132
>>> channel d3: finished piece 1 at 201810292135
>>> piece handle=+ACTSENTRY/sentry/datafile/u6tgtjra tag=SENTRY_ comment=NONE
>>> channel d3: backup set complete, elapsed time: 00:02:45
>>> channel d3: starting compressed incremental level 1 datafile backup set
>>> channel d3: specifying datafile(s) in backup set
>>> including current control file in backup set
>>> channel d3: starting piece 1 at 201810292135
>>> channel d3: finished piece 1 at 201810292136
>>> piece handle=+ACTSENTRY/sentry/datafile/u7tgtk0h tag=SENTRY_ comment=NONE
>>> channel d3: backup set complete, elapsed time: 00:00:25
>>> channel d3: starting compressed incremental level 1 datafile backup set
>>> channel d3: specifying datafile(s) in backup set
>>> including current SPFILE in backup set
>>> channel d3: starting piece 1 at 201810292136
>>> channel d3: finished piece 1 at 201810292136
>>> piece handle=+ACTSENTRY/sentry/datafile/u8tgtk23 tag=SENTRY_ comment=NONE
>>> channel d3: backup set complete, elapsed time: 00:00:03
>>> channel d2: finished piece 1 at 201810292137
>>> piece handle=+ACTSENTRY/sentry/datafile/u2tgtj8t tag=SENTRY_ comment=NONE
>>> channel d2: backup set complete, elapsed time: 00:15:00
>>> channel d5: finished piece 1 at 201810292141
>>> piece handle=+ACTSENTRY/sentry/datafile/u4tgtjcp tag=SENTRY_ comment=NONE
>>> channel d5: backup set complete, elapsed time: 00:17:06
>>> channel d4: finished piece 1 at 201810292145
>>> piece handle=+ACTSENTRY/sentry/datafile/u3tgtjb0 tag=SENTRY_ comment=NONE
>>> channel d4: backup set complete, elapsed time: 00:22:02
>>> channel d1: finished piece 1 at 201810292155
>>> piece handle=+ACTSENTRY/sentry/datafile/u5tgtjp6 tag=SENTRY_ comment=NONE
>>> channel d1: backup set complete, elapsed time: 00:24:38
>>>
>>>
--
http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 30 2018 - 22:39:58 CET

Original text of this message