Mislabeled ASM disk corrects the label by itself

From: Yong Huang
Date: Mon, 6 Feb 2017 16:16:17 +0000 (UTC)
Message-ID: <1297675356.2714973.1486397777505_at_mail.yahoo.com>

Oracle on Red Hat Enterprise Linux 6.6. Using ASMLib.

We probably hit Bug 19601762: ASMLIB DISK HEADER LABEL WILL BE REMOVED AFTER ONLINING PRIOR OFFLINED DISKS.

After storage maintenance work, some ASM disks got their labels switched:

SQL> select path, label from v$asm_disk where mount_status = 'CLOSED' and header_status = 'MEMBER';

PATH                LABEL
------------------- --------------

ORCL:ASM_DATA11_1MC ASM_DATA11_1MC

Take the first two as examples and check the headers:

$ kfed read /dev/oracleasm/disks/ASM_DATA08_1MC | egrep 'provstr|dskname|grpname|fgname'

kfdhdb.driver.provstr:ORCLDISKASM_GRID01_1MC ; 0x000: length=22 <-- wrong provstring
kfdhdb.dskname: ASM_GRID01_1MC ; 0x028: length=14               <-- wrong disk name
kfdhdb.grpname: GRID_DG ; 0x048: length=7                       <-- wrong group name
kfdhdb.fgname: ASM_GRID01_1MC ; 0x068: length=14                <-- wrong failgroup name

$ kfed read /dev/oracleasm/disks/ASM_GRID01_1MC | egrep 'provstr|dskname|grpname|fgname'
kfdhdb.driver.provstr:ORCLDISKASM_DATA08_1MC ; 0x000: length=22 <-- wrong
kfdhdb.dskname: ASM_DATA08_1MC ; 0x028: length=14               <-- wrong
kfdhdb.grpname: CRT_DG1 ; 0x048: length=7                       <-- wrong
kfdhdb.fgname: ASM_1MC ; 0x068: length=7                        <-- wrong
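The manual comparison above (device name under /dev/oracleasm/disks versus the label embedded in kfdhdb.driver.provstr) could be repeated for every ASMLib disk with a small script. The following is only a sketch: the function names header_label and check_asm_labels are made up for illustration, and it assumes that kfed is in the PATH, that the user can read the devices, and that provstr always has the form ORCLDISK followed by the label, as in the output shown here.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ASMLib or kfed.

# Read 'kfed read' output on stdin and print the label stored in the header.
# Assumes a provstr line such as:
#   kfdhdb.driver.provstr:ORCLDISKASM_DATA08_1MC ; 0x000: length=22
header_label() {
    sed -n 's/^kfdhdb\.driver\.provstr:[[:space:]]*ORCLDISK\([^ ;]*\).*/\1/p'
}

# Compare each ASMLib device name with the label recorded in its disk header.
# The directory defaults to /dev/oracleasm/disks, the path used in this post.
check_asm_labels() {
    dir=${1:-/dev/oracleasm/disks}
    for dev in "$dir"/*; do
        [ -e "$dev" ] || continue          # skip if the directory is empty
        name=${dev##*/}                    # device name = expected label
        label=$(kfed read "$dev" | header_label)
        if [ "$name" = "$label" ]; then
            echo "OK       $name"
        else
            echo "MISMATCH $name (header label: ${label:-none})"
        fi
    done
}
```

After sourcing the file, running check_asm_labels with no argument prints one line per disk. With the headers shown above, ASM_DATA08_1MC and ASM_GRID01_1MC would each be reported as a MISMATCH.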

The header contents of those two ASM disks had been swapped with each other. But about 20 hours later, they were correct again:

$ kfed read /dev/oracleasm/disks/ASM_DATA08_1MC | egrep 'provstr|dskname|grpname|fgname'

kfdhdb.driver.provstr:ORCLDISKASM_DATA08_1MC ; 0x000: length=22
kfdhdb.dskname: ASM_DATA08_1MC ; 0x028: length=14
kfdhdb.grpname: CRT_DG1 ; 0x048: length=7
kfdhdb.fgname: ASM_1MC ; 0x068: length=7

That is, the 'kfed read' result now matches the label ASM_DATA08_1MC. We can't find any event that could have corrected it. Right before rechecking the next day, we ran 'oracleasm scandisks'. We didn't check the headers *before* running that command, but we had exactly the same problem on another cluster, where running 'oracleasm scandisks' multiple times did not correct it. So scandisks is unlikely to be what corrected the labels. Something must have triggered this self-correction, but the ASM alert.log, other trace files, and /var/log/messages show nothing relevant.

We know we can run 'oracleasm renamedisk' to correct the label. But we're curious about this self-correction. Has anybody seen this?

Yong Huang

Received on Mon Feb 06 2017 - 17:16:17 CET
