Re: Oracle ASM disk corruption

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Mon, 27 Jul 2020 14:57:20 -0400
Message-ID: <31e6b76b-978e-eba3-14b8-e834924825d5_at_gmail.com>



Well, if V$ASM_DISK says that the disk is not used and ASM says that it is used, then you have an inconsistent build. Hopefully, you have a good backup. Personally, I would sacrifice the problem disk to Dionysus and keep working with what I have. An alternative is to drop and rebuild the GRID group. Judging by the name, this group probably houses the OCR and the performance database. And that means a rebuild of the cluster, including the always hilarious restore from the full backup.
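
A minimal sketch of how the mismatch could be listed while connected to the ASM instance, assuming only the documented V$ASM_DISK columns. The filter (a MEMBER header on a disk that sits in no mounted group, i.e. group number 0) is my reading of the output quoted below, not an official diagnostic:

```sql
-- Disks whose header claims disk group membership while this ASM instance
-- does not have them in any mounted group (GROUP_NUMBER = 0).
-- Assumption: run on the ASM instance; query GV$ASM_DISK instead to compare nodes.
SELECT group_number,
       disk_number,
       mount_status,
       header_status,
       mode_status,
       state,
       name,
       path
FROM   v$asm_disk
WHERE  header_status = 'MEMBER'
AND    group_number  = 0
ORDER  BY path;
```

Disks that belong to a dismounted disk group show up the same way, so a row returned here is only something to look at, not proof of corruption.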

On 7/27/20 1:26 PM, Hameed, Amir wrote:
>
> Thanks Mark.
>
> Please see the information below. I will follow up with Oracle and let
> the list know with the action plan.
>
> I would think that ALTER DISKGROUP GRID DROP DISK GRID_0002 **might**
> fix that.
>
> SQL> ALTER DISKGROUP GRID DROP DISK GRID_0002 ;
>
> ALTER DISKGROUP GRID DROP DISK GRID_0002
>
> *
>
> ERROR at line 1:
>
> ORA-15032: not all alterations performed
>
> ORA-15054: disk "GRID_0002" does not exist in diskgroup "GRID"
>
> Likewise, if ASM has that disk listed as a member of diskgroup GRID, what
> happens if you do an ALTER DISKGROUP GRID REBALANCE?
>
> SQL> ALTER DISKGROUP GRID REBALANCE ;
>
> Diskgroup altered.
>
> From the ASM alert log file:
>
> SQL> ALTER DISKGROUP GRID REBALANCE
>
> Mon Jul 27 13:16:29 2020
>
> NOTE: GroupBlock outside rolling migration privileged region
>
> NOTE: requesting all-instance membership refresh for group=2
>
> Mon Jul 27 13:16:29 2020
>
> GMON updating for reconfiguration, group 2 at 30 for pid 31, osid 25903
>
> NOTE: group GRID: updated PST location: disk 0000 (PST copy 0)
>
> NOTE: group GRID: updated PST location: disk 0001 (PST copy 1)
>
> Mon Jul 27 13:16:29 2020
>
> NOTE: group 2 PST updated.
>
> Mon Jul 27 13:16:29 2020
>
> NOTE: membership refresh pending for group 2/0x88994cfc (GRID)
>
> NOTE: Attempting voting file refresh on diskgroup GRID
>
> NOTE: Refresh completed on diskgroup GRID
>
> . Found 2 voting file(s).
>
> NOTE: Voting file relocation is required in diskgroup GRID
>
> Mon Jul 27 13:16:29 2020
>
> GMON querying group 2 at 31 for pid 22, osid 25543
>
> Mon Jul 27 13:16:29 2020
>
> SUCCESS: refreshed membership for 2/0x88994cfc (GRID)
>
> Mon Jul 27 13:16:29 2020
>
> SUCCESS: ALTER DISKGROUP GRID REBALANCE
>
> ALTER DISKGROUP GRID CHECK
>
> SQL> ALTER DISKGROUP GRID CHECK
>
> SQL> ALTER DISKGROUP GRID CHECK ;
>
> Diskgroup altered.
>
> From the ASM alert log file:
>
> NOTE: starting check of diskgroup GRID
>
> Mon Jul 27 13:19:46 2020
>
> GMON querying group 2 at 37 for pid 31, osid 4062
>
> GMON checking disk 0 for group 2 at 38 for pid 31, osid 4062
>
> GMON querying group 2 at 39 for pid 31, osid 4062
>
> GMON checking disk 1 for group 2 at 40 for pid 31, osid 4062
>
> Mon Jul 27 13:19:46 2020
>
> SUCCESS: check of diskgroup GRID found no errors
>
> Mon Jul 27 13:19:46 2020
>
> SUCCESS: ALTER DISKGROUP GRID CHECK
>
> Thanks
>
> *From:* Mark W. Farnham <mwf_at_rsiz.com>
> *Sent:* Monday, July 27, 2020 9:39 AM
> *To:* Hameed, Amir <Amir.Hameed_at_xerox.com>; gogala.mladen_at_gmail.com;
> oracle-l_at_freelists.org
> *Subject:* RE: Oracle ASM disk corruption
>
> Okay. So it is closed and a member, but ASM has it recorded as still
> belonging to diskgroup “GRID”.
>
> Let’s see: If it is closed and throwing no errors, does that mean that
> a former drop disk had finished rebalancing to drop it but somehow was
> interrupted before some chicklet in ASM was checked?
>
> I would think that ALTER DISKGROUP GRID DROP DISK GRID_0002 **might**
> fix that.
>
> Have you sent the error message below along with the SR information? I
> would think this represents an inconsistency in the ASM dictionary and
> therefore is a bug unless you hand edited something at the OS level.
>
> Likewise, if ASM has that disk listed as a member of diskgroup GRID, what
> happens if you do an ALTER DISKGROUP GRID REBALANCE?
>
> Does that either a) work or b) fail to open the disk and give you some
> additional information?
>
> IF a), great, right?
>
> IF b), let us (and the SR folks) know the new information
>
> IF neither a) nor b), I probably fubared the syntax in my semi-retired
> rust.
>
> You might also report the results of
>
> ALTER DISKGROUP GRID CHECK
>
> Good luck, zero of this should be difficult and it should be 100% self
> diagnostic.
>
> PS: I seriously doubt MLADEN is WRONG about the meaning of the status
> information. Anything I’ve written could be wrong and based on how I
> asked them to do it rather than how they did it. Other than being a
> pain to Veritas, ASM was supposed to be easy to use and bulletproof.
> When one of my best friends from Oracle left ASM, I think it was.
>
> mwf
>
> *From:* oracle-l-bounce_at_freelists.org *On Behalf Of* Hameed, Amir
> *Sent:* Sunday, July 26, 2020 11:04 PM
> *To:* gogala.mladen_at_gmail.com; oracle-l_at_freelists.org
> *Subject:* RE: Oracle ASM disk corruption
>
> Hi Mladen!
>
> Thank you for your input. I already tried that and got the following
> result.
>
> -----
>
> SQL> ALTER DISKGROUP GRID
>
> ADD DISK '/dev/oracleasm/grid/asmgrid01' NAME GRID_0002
>
> /
>
> ALTER DISKGROUP GRID
>
> *
>
> ERROR at line 1:
>
> ORA-15032: not all alterations performed
>
> ORA-15033: disk '/dev/oracleasm/grid/asmgrid01' belongs to diskgroup
> "GRID"
>
> -----
>
> I also opened an SR and the analyst suggested the following action:
>
> /Closed and member status of the disk means that the disk is already
> dropped from asm. The only thing you can do at this point is to format
> that disk and then add it back to asm./
>
> Since it is a block device, I was thinking that overwriting the device
> header would reinitialize it? (I am using UDEV and not using ASMLIB.
> The disk is not partitioned).
>
> Thank you,
>
> Amir
>
> *From:* oracle-l-bounce_at_freelists.org *On Behalf Of* Mladen Gogala
> *Sent:* Sunday, July 26, 2020 10:44 PM
> *To:* oracle-l_at_freelists.org
> *Subject:* Re: Oracle ASM disk corruption
>
> Hi Amir!
>
> The status of CLOSED means that the disk is not being used by the ASM
> instance:
>
> https://docs.oracle.com/en/database/oracle/oracle-database/12.2/refrn/V-ASM_DISK.html#GUID-8E2E5721-6D4E-48C2-8DF3-A0EEBD439606
>
> |MOUNT_STATUS| |VARCHAR2(7)|
>
> Per-instance status of the disk relative to group mounts:
>
> · |MISSING| - Oracle ASM metadata indicates that the disk is known to be
> part of the Oracle ASM disk group but no disk in the storage system
> was found with the indicated name
>
> · |CLOSED| - Disk is present in the storage system but is not being
> accessed by Oracle ASM
>
> · |OPENED| - Disk is present in the storage system and is being accessed
> by Oracle ASM. This is the normal state for disks in a database
> instance which are part of a disk group being actively used by the
> instance.
>
> · |CACHED| - Disk is present in the storage system and is part of a disk
> group being accessed by the Oracle ASM instance. This is the normal
> state for disks in an Oracle ASM instance which are part of a mounted
> disk group.
>
> · |IGNORED| - Disk is present in the system but is ignored by Oracle ASM
> because of one of the following:
>
>     · The disk is detected by the system library but is ignored because an
>       Oracle ASM library discovered the same disk
>
>     · Oracle ASM has determined that the membership claimed by the disk
>       header is no longer valid
>
> · |CLOSING| - Oracle ASM is in the process of closing this disk
>
> So, the disk is there but it's not used by ASM. You can add it to one
> of your disk groups or leave it as a reserve for the rainy days,
> whatever suits you better. No action is necessary, this is no error
> condition.
>
> Regards
>
> On 7/26/20 10:09 PM, Hameed, Amir wrote:
>
> Hi,
>
> I have an Oracle 12.1.0.2 Grid Infrastructure setup with
> three-nodes. There exist multiple ASM disk groups that are managed
> by this setup. One of the disk groups is called GRID and it hosts
> the OCR and voting disks. Recently I have noticed that one of the
> ASM disks in this group has MOUNT_STATUS='CLOSED' and
> HEADER_STATUS='MEMBER' as shown below:
>
> The following data was captured from V$ASM_DISK but it is
> consistent on all nodes if queried from GV$ASM_DISK:
>
> Grp# Disk# Mount   Header  Mode    Disk    OS disk   Space      Space     ASM Disk   Failgroup  Disk path                      Vote
>            Status  Status  Status  State   Size (MB) Total (MB) Free (MB)  Name       Name                                      file
> ---- ----- ------- ------- ------- ------- --------- ---------- --------- ---------- ---------- ------------------------------ ----
>    0     0 CLOSED  MEMBER  ONLINE  NORMAL     20,490          0         0                       /dev/oracleasm/grid/asmgrid01  Y
>    2     0 CACHED  MEMBER  ONLINE  NORMAL     20,490     20,480     9,987 GRID_0000  GRID_0000  /dev/oracleasm/grid/asmgrid03  Y
>    2     1 CACHED  MEMBER  ONLINE  NORMAL     20,490     20,480     9,987 GRID_0001  GRID_0001  /dev/oracleasm/grid/asmgrid02  Y
>
> The disk that is not showing up is GRID_0002 and the block device
> name is /dev/oracleasm/grid/asmgrid01. The only change that has
> been made recently was that the OS on all three nodes was upgraded
> from RHEL6 to RHEL7. I have tried to drop this disk from the DG
> but that didn't work and I got the message that this disk is not
> part of the GRID DG.
>
> What is the best way to resolve this issue? Should I overwrite the
> header of this device using dd so that it becomes a candidate
> disk? Any help will be appreciated.
>
> Thank you,
>
> Amir
>
> --
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217



--
http://www.freelists.org/webpage/oracle-l
Received on Mon Jul 27 2020 - 20:57:20 CEST
