Re: Slow RMAN Channel - KSV master wait/ ASM file metadata operation/ kfk: async disk IO

From: Leng <lkaing_at_gmail.com>
Date: Wed, 31 Oct 2018 08:14:59 +1100
Message-Id: <671D9961-BAC9-407B-9EAA-93DFE43FEE6B_at_gmail.com>



This is a compressed backup so you may be cpu bound. Have you checked cpu?

Have you got bct turned on?

Also try section sized backups.

Cheers,
Leng

> On 31 Oct 2018, at 2:44 am, Chris Taylor <christopherdtaylor1994_at_gmail.com> wrote:
>
> Oracle 12.1.0.2 Linux x86-64
>
> Background:
> We're running a Level0 and a Level1 Merge of a 75+ TB database to an ASM diskroup.
>
> ** Looking for comments/ideas/suggestions on whether this is 'normal' or if its really slow? Seems like a problem to me but I'm not sure as I've never done a Level1 Merge so maybe reading the SYSTEM datafile is supposed to be slow for this step? **
>
> The Level0 ran fine, the Level1 merge started and ran fine for hours.
>
> Now all but one of my RMAN channels (6 total) are finished but one channel is writing very slowly.
>
> Tracing the problematic session shows below:
>
> *** 2018-10-30 11:22:06.940
> WAIT #0: nam='KSV master wait' ela= 384 indicator=10 p2=0 p3=0 obj#=-1 tim=804631999360
> WAIT #0: nam='KSV master wait' ela= 48336 indicator=9 p2=0 p3=0 obj#=-1 tim=804632047895
> WAIT #0: nam='ASM file metadata operation' ela= 443 msgop=41 locn=0 p3=0 obj#=-1 tim=804632048128
> WAIT #0: nam='kfk: async disk IO' ela= 532 count=1 where=9 timeout=4294967295 obj#=-1 tim=804632048763
> WAIT #0: nam='Disk file operations I/O' ela= 220 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804632048803
>
> *** 2018-10-30 11:22:16.073
> WAIT #0: nam='resmgr:cpu quantum' ela= 6 location=3 consumer group id=4844 =0 obj#=-1 tim=804641132770
>
> *** 2018-10-30 11:22:18.274
> WAIT #0: nam='KSV master wait' ela= 398 indicator=10 p2=0 p3=0 obj#=-1 tim=804643333365
> WAIT #0: nam='KSV master wait' ela= 49605 indicator=9 p2=0 p3=0 obj#=-1 tim=804643383128
> WAIT #0: nam='ASM file metadata operation' ela= 369 msgop=41 locn=0 p3=0 obj#=-1 tim=804643383326
> WAIT #0: nam='kfk: async disk IO' ela= 673 count=1 where=9 timeout=4294967295 obj#=-1 tim=804643384139
> WAIT #0: nam='Disk file operations I/O' ela= 311 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804643384215
>
> *** 2018-10-30 11:22:29.644
> WAIT #0: nam='KSV master wait' ela= 517 indicator=10 p2=0 p3=0 obj#=-1 tim=804654703615
> WAIT #0: nam='KSV master wait' ela= 52485 indicator=9 p2=0 p3=0 obj#=-1 tim=804654756324
> WAIT #0: nam='ASM file metadata operation' ela= 450 msgop=41 locn=0 p3=0 obj#=-1 tim=804654756539
> WAIT #0: nam='kfk: async disk IO' ela= 527 count=1 where=9 timeout=4294967295 obj#=-1 tim=804654757294
> WAIT #0: nam='Disk file operations I/O' ela= 375 FileOperation=3 fileno=1 filetype=10 obj#=-1 tim=804654757356
>
> File# 1 is the system datafile which is size 32.99 GB.
>
> Looking at the session from v$session shows:
> MODULE CLIENT_INFO ACTION OSUSER EVENT WAIT_TIME_SECS WAIT_CLASS
> backup incr datafile rman channel=d0 0000080 STARTED16 oracle Disk file operations I/O 0.000005 User I/O
>
> Strace shows:
> io_submit(139813557071872, 1, {{data:0x7f28dcb09c48, pread, filedes:325, buf:0x7f28dc36d000, nbytes:524288, offset:1734379831296}}) = 1
> times(NULL) = 509955577
> times(NULL) = 509955577
> io_getevents(139813557071872, 4, 128, {{0x7f28dcb09ee8, 0x7f28dcb09ee8, 524288, 0}, {0x7f28dcb0a188, 0x7f28dcb0a188, 524288, 0}}, {0, 0}) = 2
> times(NULL) = 509955577
> getrusage(0x1 /* RUSAGE_??? */, {ru_utime={88371, 365510}, ru_stime={1383, 903614}, ...}) = 0
> times(NULL) = 509955584
> io_submit(139813557071872, 1, {{data:0x7f28dcb09ee8, pread, filedes:325, buf:0x7f28dc2ec000, nbytes:524288, offset:1734380355584}}) = 1
> times(NULL) = 509955584
> times(NULL) = 509955584
> times(NULL) = 509955584
> times(NULL) = 509955584
> io_submit(139813557071872, 1, {{data:0x7f28dcb0a188, pread, filedes:325, buf:0x7f28dc04d000, nbytes:524288, offset:1734380879872}}) = 1
> times(NULL) = 509955584
> times(NULL) = 509955584
> io_getevents(139813557071872, 4, 128, {{0x7f28dcb0a428, 0x7f28dcb0a428, 524288, 0}, {0x7f28dcb09c48, 0x7f28dcb09c48, 524288, 0}}, {0, 0}) = 2
> times(NULL) = 509955584
>
> So strace shows it streaming operations quickly but I can't figure out why its taking so long.
>
> The last operation to the rman log file was at 21:55 (9:55pm) last night.
>
> Here's a sample of the RMAN output before it quit writing to the log - you can see the write times were pretty snappy then it looks like they started slowing down.
>
> channel d3: starting piece 1 at 201810292132
> channel d3: finished piece 1 at 201810292135
> piece handle=+ACTSENTRY/sentry/datafile/u6tgtjra tag=SENTRY_ comment=NONE
> channel d3: backup set complete, elapsed time: 00:02:45
> channel d3: starting compressed incremental level 1 datafile backup set
> channel d3: specifying datafile(s) in backup set
> including current control file in backup set
> channel d3: starting piece 1 at 201810292135
> channel d3: finished piece 1 at 201810292136
> piece handle=+ACTSENTRY/sentry/datafile/u7tgtk0h tag=SENTRY_ comment=NONE
> channel d3: backup set complete, elapsed time: 00:00:25
> channel d3: starting compressed incremental level 1 datafile backup set
> channel d3: specifying datafile(s) in backup set
> including current SPFILE in backup set
> channel d3: starting piece 1 at 201810292136
> channel d3: finished piece 1 at 201810292136
> piece handle=+ACTSENTRY/sentry/datafile/u8tgtk23 tag=SENTRY_ comment=NONE
> channel d3: backup set complete, elapsed time: 00:00:03
> channel d2: finished piece 1 at 201810292137
> piece handle=+ACTSENTRY/sentry/datafile/u2tgtj8t tag=SENTRY_ comment=NONE
> channel d2: backup set complete, elapsed time: 00:15:00
> channel d5: finished piece 1 at 201810292141
> piece handle=+ACTSENTRY/sentry/datafile/u4tgtjcp tag=SENTRY_ comment=NONE
> channel d5: backup set complete, elapsed time: 00:17:06
> channel d4: finished piece 1 at 201810292145
> piece handle=+ACTSENTRY/sentry/datafile/u3tgtjb0 tag=SENTRY_ comment=NONE
> channel d4: backup set complete, elapsed time: 00:22:02
> channel d1: finished piece 1 at 201810292155
> piece handle=+ACTSENTRY/sentry/datafile/u5tgtjp6 tag=SENTRY_ comment=NONE
> channel d1: backup set complete, elapsed time: 00:24:38
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Oct 30 2018 - 22:14:59 CET

Original text of this message