Re: fdisk against the existing datagroup DISK

From: Leyi Zhang (Kamus)
Date: Tue, 12 Apr 2011 23:20:00 +0800
Message-ID: <BANLkTim1zOrBKwZn1Rb7f962n=eJr9RcYQ_at_mail.gmail.com>



I just ran a test to reproduce your issue. My suggestion: DON'T reboot the node unless you find a way to reconstruct /dev/sda1, or you will lose your ASM disk.

# oracleasm querydisk -p VOL41
Disk "VOL41" is a valid ASM disk
/dev/sdb11: LABEL="VOL41" TYPE="oracleasm"

/dev/sdb11 is an ASM disk in my NORMALDG diskgroup. I did exactly what you did to your /dev/sda1:

# fdisk -l /dev/sdb11

Disk /dev/sdb11: 509 MB, 509935104 bytes
255 heads, 63 sectors/track, 61 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System
/dev/sdb11p1               1          61      489951   83  Linux
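The geometry lines in these fdisk listings can be cross-checked by hand, which also shows where the nested table landed on /dev/sda. A minimal bash sketch; the helper names are hypothetical, and it assumes the usual DOS layout in which a single partition starts one track in and ends on the last whole cylinder (this matches the Blocks values of both /dev/sdb11p1 and /dev/sda1p1). The last two lines further assume /dev/sda1 runs to the final sector of /dev/sda, as its End cylinder of 51200 suggests.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cross-checking fdisk's CHS geometry output.

# Bytes per cylinder = heads * sectors/track * 512-byte sectors.
cyl_bytes() { echo $(( $1 * $2 * 512 )); }

# Whole cylinders on a device of $1 bytes with $2 heads, $3 sectors/track.
whole_cyls() { echo $(( $1 / $(cyl_bytes "$2" "$3") )); }

# 1 KiB blocks of a single partition that skips the first track and
# ends on the last whole cylinder ($1 cylinders, $2 heads, $3 sectors/track).
first_part_blocks() { echo $(( ($1 * $2 * $3 - $3) / 2 )); }

# Kamus's test: /dev/sdb11 (255 heads, 63 sectors/track)
cyl_bytes 255 63                # 8225280
whole_cyls 509935104 255 63     # 61
first_part_blocks 61 255 63     # 489951 (Blocks of /dev/sdb11p1)

# Original post: /dev/sda and /dev/sda1 (64 heads, 32 sectors/track)
cyl_bytes 64 32                 # 1048576
whole_cyls 53687091200 64 32    # 51200
whole_cyls 53687074816 64 32    # 51199
first_part_blocks 51199 64 32   # 52427760 (Blocks of /dev/sda1p1)

# sda1 is one track (32 sectors) smaller than sda, so it begins at
# byte 16384 of /dev/sda; fdisk wrote the new table at that offset.
echo $(( 53687091200 - 53687074816 ))   # 16384
echo $(( 32 * 512 ))                    # 16384
```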

Then I rebooted the server, and when I tried to mount NORMALDG, these errors appeared in the ASM alert.log:

ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing from group number "2"
ERROR: ALTER DISKGROUP NORMALDG MOUNT /* asm agent */ 2011-04-12 22:39:59.214000 +08:00
ASM Health Checker found 1 new failures

Then I ran oracleasm listdisks: VOL41 had disappeared, even after I ran oracleasm scandisks again.

My NORMALDG uses normal redundancy, so I can easily recover from this. But as Freek D'Hooge said, if your ASM diskgroup uses external redundancy, try to drop the disk and let ASM rebalance its data onto the other disks, and make sure you back up all the datafiles in the DATA diskgroup first.
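Freek's condition for the external-redundancy case (drop the disk only if the diskgroup has more free space than the size of that disk) boils down to one comparison. A minimal bash sketch; the function name and the sample figures are hypothetical, and the inputs are assumed to be MB values such as FREE_MB from V$ASM_DISKGROUP and TOTAL_MB from V$ASM_DISK:

```shell
#!/usr/bin/env bash
# Hypothetical pre-check for dropping a disk from an external-redundancy
# diskgroup, following the rule of thumb in this thread: the diskgroup's
# free space must exceed the size of the disk being dropped, so the
# rebalance has room for everything stored on it.
# Both arguments are in MB (assumed inputs: V$ASM_DISKGROUP.FREE_MB for
# the group and V$ASM_DISK.TOTAL_MB for the disk).
can_drop_disk() {
  local dg_free_mb=$1 disk_total_mb=$2
  if (( dg_free_mb > disk_total_mb )); then
    echo "ok: ${dg_free_mb} MB free > ${disk_total_mb} MB disk; drop it and let ASM rebalance"
    return 0
  fi
  echo "only ${dg_free_mb} MB free: add a new LUN to the diskgroup first"
  return 1
}

# Made-up free-space figures against a DISK6-sized disk (~51199 MB):
can_drop_disk 80000 51199
can_drop_disk 30000 51199
```

Since the group's free space also counts the free space on the disk being dropped, requiring it to exceed the whole disk size is equivalent to requiring the remaining disks to have room for the data on that disk (ignoring any rebalance overhead).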

--
Kamus <kamusis_at_gmail.com>

Visit my blog for more : http://www.dbform.com
Join ACOUG: http://www.acoug.org



On Tue, Apr 12, 2011 at 7:27 AM, D'Hooge Freek <Freek.DHooge_at_uptime.be> wrote:

> Masha,
>
> Is the ASM diskgroup using external redundancy, or is it mirrored?
> In the second case, you can relax, as all the data is also on another disk. Just drop the ASM disk from the diskgroup and rebuild it.
>
> In the first case, you can check the amount of free space in the ASM diskgroup. If enough space is still available (more than the LUN size of an ASM disk), you can instruct ASM to remove the disk from the diskgroup. ASM will then rebalance, moving all data from DISK6 to the other disks. If you are lucky, no errors will occur, and afterwards you can partition the disk again and add it back to the ASM diskgroup.
> If not enough free space is available, then you need to add a new LUN first (maybe you can use the /dev/sdai disk for this). I would certainly make sure that you don't reboot the server (the Red Hat documentation site contains a manual on how to add/remove LUNs online).
>
> Anyway, make sure you have recent backups of the databases and the archived redo logs, and call Oracle if you don't hear from support fast enough.
> Also, open a ticket with Red Hat support. They should be able to tell you what is happening with the partition table of the LUN, which would tell you more about how it will impact applications reading from this disk.
>
>
> Regards,
>
> Freek D'Hooge
> Uptime
> Oracle Database Administrator
> email: freek.dhooge_at_uptime.be
> tel +32(0)3 451 23 82
> http://www.uptime.be
> disclaimer: www.uptime.be/disclaimer
> ---
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Masha Gurenich
> Sent: Monday, 11 April 2011 22:56
> To: Oracle L
> Subject: fdisk against the existing datagroup DISK
>
> Hi all,
>
> Please help: an fdisk partition was created inside another partition.
> /dev/sda was already partitioned with a single partition as /dev/sda1
> /dev/sda1 was labeled as DISK6 and was part of the DATA diskgroup.
> But fdisk was run against /dev/sda1 and created another partition table inside /dev/sda1.
> We now have a /dev/sda1p1
>
> This is what I did:
>
> [root_at_oscdevdb1 ~]# fdisk /dev/sda1
> Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
> Building a new DOS disklabel. Changes will remain in memory only,
> until you decide to write them. After that, of course, the previous
> content won't be recoverable.
> The number of cylinders for this disk is set to 51199.
> There is nothing wrong with that, but this is larger than 1024,
> and could in certain setups cause problems with:
> 1) software that runs at boot time (e.g., old versions of LILO)
> 2) booting and partitioning software from other OSs
> (e.g., DOS FDISK, OS/2 FDISK)
> Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
>
> Command (m for help): n
> Command action
> e extended
> p primary partition (1-4)
> p
> Partition number (1-4): 1
> First cylinder (1-51199, default 1):
> Using default value 1
> Last cylinder or +size or +sizeM or +sizeK (1-51199, default 51199):
> Using default value 51199
>
> Command (m for help): w
> The partition table has been altered!
>
> Calling ioctl() to re-read partition table.
>
> WARNING: Re-reading the partition table failed with error 22: Invalid argument.
> The kernel still uses the old table.
> The new table will be used at the next reboot.
> Syncing disks.
> [root_at_oscdevdb1 ~]#
> [root_at_oscdevdb1 ~]# /sbin/fdisk -l /dev/sda
>
> Disk /dev/sda: 53.6 GB, 53687091200 bytes
> 64 heads, 32 sectors/track, 51200 cylinders
> Units = cylinders of 2048 * 512 = 1048576 bytes
>
>       Device Boot      Start         End      Blocks   Id  System
> /dev/sda1                  1       51200    52428784   83  Linux
> [root_at_oscdevdb1 ~]# /sbin/fdisk -l /dev/sda1
>
> Disk /dev/sda1: 53.6 GB, 53687074816 bytes
> 64 heads, 32 sectors/track, 51199 cylinders
> Units = cylinders of 2048 * 512 = 1048576 bytes
>
>       Device Boot      Start         End      Blocks   Id  System
> /dev/sda1p1                1       51199    52427760   83  Linux
> [root_at_oscdevdb1 ~]#
> I was supposed to do it on sdai1.
>
> So, /dev/sda1 was a DISK6 in our diskgroup.
> So far nothing has happened: all 20 databases are up and running, and I have received no alerts or anything else that would point to a catastrophe.
> I don't know what to do. I am freaking out, of course, but I have opened a critical Severity 1 issue with Oracle Support. It's been 4 hours since I ran fdisk and an hour and a half since I opened the P1 ticket...
>
> I am looking for any hope you can provide.
> It's a 2-node cluster, 11.2.0.1, Red Hat 4, 64-bit.
>
> Please, people, tell me I will not need to rebuild RAC...
>
> thanks,
> M
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
Received on Tue Apr 12 2011 - 10:20:00 CDT
