Re: Slow RMAN Channel - KSV master wait/ ASM file metadata operation/ kfk: async disk IO

From: Chris Taylor <christopherdtaylor1994_at_gmail.com>
Date: Tue, 30 Oct 2018 16:22:35 -0500
Message-ID: <CAP79kiRu9UFNFYBH6oocfzZ8+d-cBdPLs28RQYF7vrijCh9sfg_at_mail.gmail.com>



CPU is fine on the source host/db. (Where the RMAN channel is connected) - Been running 50% utilized or less every day.

BCT is not turned on.

The RMAN script does this and I'm wondering if its in the 'recovery' portion of the system datafile though I would expect RMAN to indicate it had moved onto the restore piece.

RMAN> connect target *
2> run {host 'date >> /var/act/log/sentry_rman.log'; 3> allocate channel d0 device type disk format '+actsentry/sentry/datafile/%u' ;
4> allocate channel d1 device type disk format '+actsentry/sentry/datafile/%u' ;
5> allocate channel d2 device type disk format '+actsentry/sentry/datafile/%u' ;
6> allocate channel d3 device type disk format '+actsentry/sentry/datafile/%u' ;
7> allocate channel d4 device type disk format '+actsentry/sentry/datafile/%u' ;
8> allocate channel d5 device type disk format '+actsentry/sentry/datafile/%u' ;
9> backup incremental level 1 for recover of copy with tag 'sentry_' database skip inaccessible;

10> recover copy of database with tag 'sentry_';
11> release channel d0;
12> release channel d1;
13> release channel d2;
14> release channel d3;
15> release channel d4;
16> release channel d5;
17> host 'date >> /var/act/log/sentry_rman.log';
18> }





On Tue, Oct 30, 2018 at 4:15 PM Leng <lkaing_at_gmail.com> wrote:

> This is a compressed backup so you may be cpu bound. Have you checked cpu?
>
> Have you got bct turned on?
>
> Also try section sized backups.
>
> Cheers,
> Leng
>
> On 31 Oct 2018, at 2:44 am, Chris Taylor <christopherdtaylor1994_at_gmail.com>
> wrote:
>
> Oracle 12.1.0.2 Linux x86-64
>
> Background:
> We're running a Level0 and a Level1 Merge of a 75+ TB database to an ASM
> diskroup.
>
> ** Looking for comments/ideas/suggestions on whether this is 'normal' or
> if its really slow? Seems like a problem to me but I'm not sure as I've
> never done a Level1 Merge so maybe reading the SYSTEM datafile is supposed
> to be slow for this step? **
>
> The Level0 ran fine, the Level1 merge started and ran fine for hours.
>
> Now all but one of my RMAN channels (6 total) are finished but one channel
> is writing very slowly.
>
> Tracing the problematic session shows below:
>
> *** 2018-10-30 11:22:06.940
> WAIT #0: nam='KSV master wait' ela= 384 indicator=10 p2=0 p3=0 obj#=-1
> tim=804631999360
> WAIT #0: nam='KSV master wait' ela= 48336 indicator=9 p2=0 p3=0 obj#=-1
> tim=804632047895
> WAIT #0: nam='ASM file metadata operation' ela= 443 msgop=41 locn=0 p3=0
> obj#=-1 tim=804632048128
> WAIT #0: nam='kfk: async disk IO' ela= 532 count=1 where=9
> timeout=4294967295 obj#=-1 tim=804632048763
> WAIT #0: nam='Disk file operations I/O' ela= 220 FileOperation=3 fileno=1
> filetype=10 obj#=-1 tim=804632048803
>
> *** 2018-10-30 11:22:16.073
> WAIT #0: nam='resmgr:cpu quantum' ela= 6 location=3 consumer group
> id=4844 =0 obj#=-1 tim=804641132770
>
> *** 2018-10-30 11:22:18.274
> WAIT #0: nam='KSV master wait' ela= 398 indicator=10 p2=0 p3=0 obj#=-1
> tim=804643333365
> WAIT #0: nam='KSV master wait' ela= 49605 indicator=9 p2=0 p3=0 obj#=-1
> tim=804643383128
> WAIT #0: nam='ASM file metadata operation' ela= 369 msgop=41 locn=0 p3=0
> obj#=-1 tim=804643383326
> WAIT #0: nam='kfk: async disk IO' ela= 673 count=1 where=9
> timeout=4294967295 obj#=-1 tim=804643384139
> WAIT #0: nam='Disk file operations I/O' ela= 311 FileOperation=3 fileno=1
> filetype=10 obj#=-1 tim=804643384215
>
> *** 2018-10-30 11:22:29.644
> WAIT #0: nam='KSV master wait' ela= 517 indicator=10 p2=0 p3=0 obj#=-1
> tim=804654703615
> WAIT #0: nam='KSV master wait' ela= 52485 indicator=9 p2=0 p3=0 obj#=-1
> tim=804654756324
> WAIT #0: nam='ASM file metadata operation' ela= 450 msgop=41 locn=0 p3=0
> obj#=-1 tim=804654756539
> WAIT #0: nam='kfk: async disk IO' ela= 527 count=1 where=9
> timeout=4294967295 obj#=-1 tim=804654757294
> WAIT #0: nam='Disk file operations I/O' ela= 375 FileOperation=3 fileno=1
> filetype=10 obj#=-1 tim=804654757356
>
> File# 1 is the system datafile which is size 32.99 GB.
>
> Looking at the session from v$session shows:
> MODULE CLIENT_INFO ACTION OSUSER EVENT WAIT_TIME_SECS WAIT_CLASS
> backup incr datafile rman channel=d0 0000080 STARTED16 oracle Disk file
> operations I/O 0.000005 User I/O
> Strace shows:
> io_submit(139813557071872, 1, {{data:0x7f28dcb09c48, pread, filedes:325,
> buf:0x7f28dc36d000, nbytes:524288, offset:1734379831296}}) = 1
> times(NULL) = 509955577
> times(NULL) = 509955577
> io_getevents(139813557071872, 4, 128, {{0x7f28dcb09ee8, 0x7f28dcb09ee8,
> 524288, 0}, {0x7f28dcb0a188, 0x7f28dcb0a188, 524288, 0}}, {0, 0}) = 2
> times(NULL) = 509955577
> getrusage(0x1 /* RUSAGE_??? */, {ru_utime={88371, 365510}, ru_stime={1383,
> 903614}, ...}) = 0
> times(NULL) = 509955584
> io_submit(139813557071872, 1, {{data:0x7f28dcb09ee8, pread, filedes:325,
> buf:0x7f28dc2ec000, nbytes:524288, offset:1734380355584}}) = 1
> times(NULL) = 509955584
> times(NULL) = 509955584
> times(NULL) = 509955584
> times(NULL) = 509955584
> io_submit(139813557071872, 1, {{data:0x7f28dcb0a188, pread, filedes:325,
> buf:0x7f28dc04d000, nbytes:524288, offset:1734380879872}}) = 1
> times(NULL) = 509955584
> times(NULL) = 509955584
> io_getevents(139813557071872, 4, 128, {{0x7f28dcb0a428, 0x7f28dcb0a428,
> 524288, 0}, {0x7f28dcb09c48, 0x7f28dcb09c48, 524288, 0}}, {0, 0}) = 2
> times(NULL) = 509955584
>
> So strace shows it streaming operations quickly but I can't figure out why
> its taking so long.
>
> The last operation to the rman log file was at 21:55 (9:55pm) last night.
>
> Here's a sample of the RMAN output before it quit writing to the log -
> you can see the write times were pretty snappy then it looks like they
> started slowing down.
>
> channel d3: starting piece 1 at 201810292132
> channel d3: finished piece 1 at 201810292135
> piece handle=+ACTSENTRY/sentry/datafile/u6tgtjra tag=SENTRY_ comment=NONE
> channel d3: backup set complete, elapsed time: 00:02:45
> channel d3: starting compressed incremental level 1 datafile backup set
> channel d3: specifying datafile(s) in backup set
> including current control file in backup set
> channel d3: starting piece 1 at 201810292135
> channel d3: finished piece 1 at 201810292136
> piece handle=+ACTSENTRY/sentry/datafile/u7tgtk0h tag=SENTRY_ comment=NONE
> channel d3: backup set complete, elapsed time: 00:00:25
> channel d3: starting compressed incremental level 1 datafile backup set
> channel d3: specifying datafile(s) in backup set
> including current SPFILE in backup set
> channel d3: starting piece 1 at 201810292136
> channel d3: finished piece 1 at 201810292136
> piece handle=+ACTSENTRY/sentry/datafile/u8tgtk23 tag=SENTRY_ comment=NONE
> channel d3: backup set complete, elapsed time: 00:00:03
> channel d2: finished piece 1 at 201810292137
> piece handle=+ACTSENTRY/sentry/datafile/u2tgtj8t tag=SENTRY_ comment=NONE
> channel d2: backup set complete, elapsed time: 00:15:00
> channel d5: finished piece 1 at 201810292141
> piece handle=+ACTSENTRY/sentry/datafile/u4tgtjcp tag=SENTRY_ comment=NONE
> channel d5: backup set complete, elapsed time: 00:17:06
> channel d4: finished piece 1 at 201810292145
> piece handle=+ACTSENTRY/sentry/datafile/u3tgtjb0 tag=SENTRY_ comment=NONE
> channel d4: backup set complete, elapsed time: 00:22:02
> channel d1: finished piece 1 at 201810292155
> piece handle=+ACTSENTRY/sentry/datafile/u5tgtjp6 tag=SENTRY_ comment=NONE
> channel d1: backup set complete, elapsed time: 00:24:38
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 30 2018 - 22:22:35 CET

Original text of this message