
WAS: Question on starting up Oracle with "startup force" IS: Mandatory reading for HA minded folks

From: Kevin Closson <kevinc_at_polyserve.com>
Date: Thu, 27 Jul 2006 11:17:45 -0700
Message-ID: <5D2570CAFC98974F9B6A759D1C74BAD0E5ABF6@ex2.ms.polyserve.com>

 

>Folks,
>We are upgrading from "Veritas Database Edition 3.5 for Oracle 9iRAC" to the "Veritas Foundation Suite 4.1 for Oracle RAC". While testing in the lab with the new version, I noticed that the Veritas agent is starting up Oracle with "startup force" command as shown below:

First off, if you want to read about what real, quality HA clusters (failover, rehosting, etc.) look like, here is a brief bit: http://www.polyserve.com/pdf/VMDB_BC.pdf

This is a really long reply to a short question--as it should be. The topic deserves deep consideration. Unfortunately it is true that the people who need to read this response the most don't have time--thus the overwhelming success of over-marketed products (no names mentioned).

I don't think startup force should be a concern. In the case of a failover, the shutdown abort that is built into the startup force command is a no-op. After all, the server crashed, and that is why a failover occurred.
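
(Quick aside for anyone who has never looked closely at what startup force actually does: in SQL*Plus it is effectively the two-step sequence below, which is why the abort half is harmless when the old instance is already dead.)

SQL> startup force

-- is, in effect, the same as:

SQL> shutdown abort
SQL> startup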

In the event of a switchover (rehosting), the shutdown that Veritas issues is important. If they do a normal or immediate shutdown followed by a startup force on the new node, there again I see it as a no-op. Doing startup force on a cleanly shut down database is a no-op.

The PolyServe Database Utility for Oracle (a failover-HA product) does a shutdown immediate (with a timeout) followed by a shutdown abort when the admin is moving an instance from one server to the other (we call this dynamic re-hosting because it is a simple GUI drag and drop). When rehosting, the database MUST be shut down on the node being vacated, so the

if ( not immediate )
then abort

logic is very important.
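
To make the shape of that logic concrete, here is a rough shell sketch. This is illustrative only; it is not the PolyServe agent code, and the 120-second timeout is just a stand-in number:

# Try a clean shutdown first; if it has not finished within the
# timeout, fall back to shutdown abort. Illustrative sketch only.
echo "shutdown immediate" | sqlplus -S "/ as sysdba" &
SHUT_PID=$!

TIMEOUT=120
while [ $TIMEOUT -gt 0 ] && kill -0 $SHUT_PID 2>/dev/null
do
    sleep 1
    TIMEOUT=`expr $TIMEOUT - 1`
done

if kill -0 $SHUT_PID 2>/dev/null
then
    # shutdown immediate did not complete in time; abort the instance
    echo "shutdown abort" | sqlplus -S "/ as sysdba"
fi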

More important than any of that is how to make sure a non-RAC instance is not mistakenly brought up on 2 servers at the same time. Non-RAC Oracle uses node-local locking on the lk${SID} file in $ORACLE_HOME/dbs. Tell me: with a non-shared ORACLE_HOME and a database being failed over from node 1 to, say, node 3, who is to say there aren't still processes with the files open on the node being vacated? Especially if your HA kit is a bare-bones setup where the database resides in raw partitions. The locks Oracle relies on only tell it whether or not there is already an instance with that database ON THE CURRENT SERVER. So, if you have a non-RAC database on NAS, a CFS, or raw partitions, you can accidentally boot instances of the same database on more than one server.

Of course, if the HA setup is centered around non-shared filesystems, this is not an issue. That is, if the database under HA control is in UFS/VxFS/Ext3, then the failover action includes an unmount from the vacated node and a mount on the new node. For that reason I feel that HA solutions based on non-shared storage are complete garbage. A simple re-hosting operation means filesystem mounting operations? Junk.
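
(If you want to convince yourself of the exposure on a given node, a couple of stock commands go a long way. This is purely an illustrative check, nothing any vendor agent ships, and the datafile path is hypothetical:)

# Does anything on THIS node still have the datafiles open?
for f in /u01/app/oracle/oradata/TESTDB3/datafile/*.dbf
do
    fuser "$f" > /dev/null 2>&1 && echo "WARNING: $f still open on `hostname`"
done

# And the only lock Oracle itself takes is node-local:
ls -l $ORACLE_HOME/dbs/lk*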

Oracle Disk Manager (ODM) makes non-RAC databases completely safe in all clustered scenarios, since part of the spec is a "cluster key" associated with every Oracle datafile, redo log, and so on. PolyServe and Veritas both implement ODM. So, no matter how big the cluster, and no matter how chaotic the association of nodes to databases, ODM will ensure that no more than one instance opens a non-RAC database. The lk${SID} file is a no-op when ODM is in play.

In summary, any "HA solution" that uses shared disk without ODM is ridiculously dangerous. And, yep, there is Open Source junk out there that allows the tinkerers of the world to set up failover HA (with SteelEye, or Cluster Suite, whatever) combined with a "CFS" (e.g., OCFS, which deserves no more mention than that). Anyone that "saves money" that way will be really sorry they didn't get an ODM implementation just as soon as they have 2 instances mistakenly opening files from a non-RAC database. Like I always try to impress upon people, there is more to this clustering stuff than hooking up cables :-).

Best to talk to people that know clustering.

So, to carry the point even further, a little command line output is helpful. PolyServe's ODM exposes ODM cluster info to admins; the same information is used by the HA engine to make sure our failovers work. The following is output of our mxodmstat tool reporting the association of database instances to servers. Veritas has no such information available; ODM can be implemented in different ways.

There are 5 databases TESTDB1-TESTDB5 on node tmr6s13 to start. I only pick out DBWR and LGWR for brevity.

$mxodmstat -lv | egrep "Node|Writer"
Node     Database  Instance  Application  Proc Type    Pid    File Activity

tmr6s13  TESTDB1   TESTDB1   --           DB Writer    24308  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB1   TESTDB1   --           Log Writer   24333  SmallData,OLG,Other; Read,Write; Sync,Async
tmr6s13  TESTDB2   TESTDB2   --           DB Writer     6052  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB2   TESTDB2   --           Log Writer    6057  SmallData,OLG,Other; Read,Write; Sync,Async
tmr6s13  TESTDB3   TESTDB3   --           DB Writer    24567  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB3   TESTDB3   --           Log Writer   24575  SmallData,OLG,Other; Read,Write; Sync,Async
tmr6s13  TESTDB4   TESTDB4   --           DB Writer    13828  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB4   TESTDB4   --           Log Writer   13834  SmallData,OLG,Other; Read,Write; Sync,Async
tmr6s13  TESTDB5   TESTDB5   --           DB Writer     3908  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB5   TESTDB5   --           Log Writer    3912  SmallData,OLG,Other; Read,Write; Sync,Async

...next I use the HA CLI command to rehost TESTDB3 to its DEFAULT backup server. I then look at mxodmstat output to see that it is on tmr6s14 (a different server). Note that the mxdb command for re-hosting says it will take a moment, so I immediately run mxodmstat again and see that it is still on tmr6s13, but the next execution of the command shows it is on tmr6s14. Then mxdb is used (-Q) to get more detailed info about it now that it is on tmr6s14 (it has been rehosted).

$mxdb -d TESTDB3 -m DEFAULT
Validating Service Monitor for TESTDB3
Preparing to move the TESTDB3 Service.

Command successfully submitted to the High Availability engine.

Please note, the database will change state asynchronously. Please allow an additional 15 seconds for the command to be properly propagated throughout all nodes. TESTDB3 Service will be moved.
$mxodmstat -lv | grep "TESTDB3.*Writer"

tmr6s13  TESTDB3   TESTDB3   --           DB Writer    24567  SmallData,LargeData,Other; Read,Write; Sync,Async
tmr6s13  TESTDB3   TESTDB3   --           Log Writer   24575  SmallData,OLG,Other; Read,Write; Sync,Async

$mxodmstat -lv | grep "TESTDB3.*Writer"
tmr6s14  TESTDB3   TESTDB3   --           DB Writer    31827  SmallData,Other; Read,Write; Sync,Async
tmr6s14  TESTDB3   TESTDB3   --           Log Writer   31829  SmallData,OLG,Other; Read,Write; Sync,Async

$mxdb -d TESTDB3 -Q

Preparing to query the TESTDB3 Service on all nodes.

Service TESTDB3:

Connect through vhost: 10.10.60.173
Primary Node is:       10.10.60.14
Currently active on    10.10.60.14
Backups:               BACKUP-1 10.10.60.13

Enabled  Nodes:        10.10.60.13 10.10.60.14
Disabled Nodes:
Maintenance Mode:      OFF

...And, of course, I connect using a PolyServe Virtual Oracle Service:

$sqlplus scott/tiger@TESTDB3.pdx.polyserve.com

SQL*Plus: Release 10.2.0.2.0 - Production on Thu Jul 27 10:29:41 2006

Copyright (c) 1982, 2005, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
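
(In case you are wondering how that connect string finds the database no matter which node it is running on: the alias simply resolves to the virtual host reported by mxdb -Q above. A hypothetical tnsnames.ora entry along these lines would do it; the real entry and listener port are not shown in this post:)

TESTDB3.pdx.polyserve.com =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.60.173)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = TESTDB3))
  )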

SQL>

...then, while still running a shell on tmr6s13, I use mxodmstat to see what I/O TESTDB3 is doing (NOTE--without knowing what node it is running on):

$uname -a;mxodmstat -a op -i3 -D TESTDB3
Linux tmr6s13 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

                          TESTDB3
            Read                          Write
 Sync  Async    KB/s  Ave ms   Sync  Async    KB/s  Ave ms
 4748     14   43933       1      8    951    6891       0
    0      0       0       0      0   0.67      11       0
  102      4    5050       1      9    293   15128      10
  112      8    5053       1      9    511   15503       9
   60      4    2930       1      5    156    8731       6
   60      4    2578       1      5    238    7902       8
   47      4    1544       1      4    247    5647      10


And for spice, I want to see "who" is doing that I/O so I tell mxodmstat to report I/O by process type (again, central monitoring of different cluster activity):

$uname -a;mxodmstat -i3 -N tmr6s14 -s proc
Linux tmr6s13 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

                                                            tmr6s14
      Background                DB Writer               Log Writer                  PQO                  Foreground
Sync Asyn   KB/s Ave ms   Sync Asyn  KB/s Ave ms   Sync Asyn   KB/s Ave ms   Sync Asyn KB/s Ave ms   Sync Asyn KB/s Ave ms
6443 1654 238674      1     23 7603 85688     11    982 1643 176746      1      0    0    0      0    278    0 2224      1
  77   59   8653      1      0  253  2981      9     36   49   6038      1      0    0    0      0      0    0    0      0
   4    8    165      0      0   78   941     11     10    3   1848      1      0    0    0      0      0    0    0      0
  78   59   8897      1      0  231  2805     10     38   47   7664      1      0    0    0      0      0    0    0      0
  20   17   2204      0      0   45   683      8     13   13   1998      1      0    0    0      0      0    0    0      0
   3    8    144      0      0  139  1427     12      8 0.67    453      1      0    0    0      0      0    0    0      0


...and finally, cluster keys. Here is mxodmstat output showing the control files, datafiles and online logs from TESTDB1-TESTDB5 with their paths and internal (kernel mode) cluster file identifiers (FIDs). When Oracle opens an ODM file, it hands ODM a guaranteed, globally unique identifier for the file, and ODM maintains that FID in its state. On any other open of that file, on any node in the cluster, Oracle throws the key at ODM to see if it is already open, and if it is, well, the files don't get opened ...

$mxodmstat -lf | more

     FID Type Path

10610723 CTL  /u01/app/oracle/oradata/TESTDB1/controlfile/o1_mf_NIDYf1vEQ2Jr50_.ctl
10610724 CTL  /u01/app/oracle/flash_recovery_area/TESTDB1/controlfile/o1_mf_NIDYf16Re3Jr51_.ctl
10610725 DATA /u01/app/oracle/oradata/TESTDB1/datafile/o1_mf_system_5IDYf1eek3oq50_.dbf
10610726 DATA /u01/app/oracle/oradata/TESTDB1/datafile/o1_mf_undotbs1_5IDYf1KAK3oq52_.dbf
10610727 DATA /u01/app/oracle/oradata/TESTDB1/datafile/o1_mf_sysaux_5IDYf1X8t3oq51_.dbf
10610728 DATA /u01/app/oracle/oradata/TESTDB1/datafile/o1_mf_users_5IDYf177P3oq53_.dbf
10610729 DATA /u01/app/oracle/oradata/TESTDB1/datafile/o1_mf_example_WIDYf17v118s50_.dbf
10610730 DATA /u01/app/oracle/oradata/TESTDB1/datafile/o1_mf_test_G9EYf1jii25f80_.dbf
10610731 DATA /u01/app/oracle/oradata/TESTDB1/datafile/o1_mf_temp_TIDYf1Qf14Jr58_.tmp
10610732 OLG  /u01/app/oracle/oradata/TESTDB1/onlinelog/o1_mf_1_PIDYf1cMX3Jr52_.log
10610733 OLG  /u01/app/oracle/flash_recovery_area/TESTDB1/onlinelog/o1_mf_1_QIDYf1eer2Jr53_.log
10610734 OLG  /u01/app/oracle/oradata/TESTDB1/onlinelog/o1_mf_2_QIDYf1BmX2Jr54_.log
10610735 OLG  /u01/app/oracle/flash_recovery_area/TESTDB1/onlinelog/o1_mf_2_RIDYf1xds1Jr55_.log
10610736 OLG  /u01/app/oracle/oradata/TESTDB1/onlinelog/o1_mf_3_RIDYf1Pb02Jr56_.log
10610737 OLG  /u01/app/oracle/flash_recovery_area/TESTDB1/onlinelog/o1_mf_3_SIDYf1sDsJr57_.log
10610785 CTL  /u01/app/oracle/oradata/TESTDB2/controlfile/o1_mf_cRDYf1hDI1cR60_.ctl
10610786 CTL  /u01/app/oracle/flash_recovery_area/TESTDB2/controlfile/o1_mf_cRDYf1RR62cR61_.ctl
10610787 DATA /u01/app/oracle/oradata/TESTDB2/datafile/o1_mf_system_sQDYf1Fo72rP60_.dbf
10610788 DATA /u01/app/oracle/oradata/TESTDB2/datafile/o1_mf_undotbs1_sQDYf1FFx2rP62_.dbf
10610789 DATA /u01/app/oracle/oradata/TESTDB2/datafile/o1_mf_sysaux_sQDYf1C7h2rP61_.dbf
10610790 DATA /u01/app/oracle/oradata/TESTDB2/datafile/o1_mf_users_sQDYf1ypC2rP63_.dbf
10610791 DATA /u01/app/oracle/oradata/TESTDB2/datafile/o1_mf_example_kRDYf1xEm3KR60_.dbf
10610792 DATA /u01/app/oracle/oradata/TESTDB2/datafile/o1_mf_test_kiEYf1HAR1nn80_.dbf
10610793 DATA /u01/app/oracle/oradata/TESTDB2/datafile/o1_mf_temp_iRDYf190L2cR68_.tmp
10610794 OLG  /u01/app/oracle/oradata/TESTDB2/onlinelog/o1_mf_1_eRDYf17tP2cR62_.log
10610795 OLG  /u01/app/oracle/flash_recovery_area/TESTDB2/onlinelog/o1_mf_1_fRDYf19na1cR63_.log
10610796 OLG  /u01/app/oracle/oradata/TESTDB2/onlinelog/o1_mf_2_fRDYf1OWH1cR64_.log
10610797 OLG  /u01/app/oracle/flash_recovery_area/TESTDB2/onlinelog/o1_mf_2_gRDYf1kUbcR65_.log
10610798 OLG  /u01/app/oracle/oradata/TESTDB2/onlinelog/o1_mf_3_gRDYf1OSHcR66_.log
10610799 OLG  /u01/app/oracle/flash_recovery_area/TESTDB2/onlinelog/o1_mf_3_gRDYf1pxi3cR67_.log
10610911 CTL  /u01/app/oracle/oradata/TESTDB4/controlfile/o1_mf_CeVYf196u26o30_.ctl
10610912 CTL  /u01/app/oracle/flash_recovery_area/TESTDB4/controlfile/o1_mf_CeVYf1rd636o31_.ctl
10610913 DATA /u01/app/oracle/oradata/TESTDB4/datafile/o1_mf_system_pdVYf1m373Gk30_.dbf
10610914 DATA /u01/app/oracle/oradata/TESTDB4/datafile/o1_mf_undotbs1_qdVYf1pHyGk32_.dbf
10610915 DATA /u01/app/oracle/oradata/TESTDB4/datafile/o1_mf_sysaux_pdVYf1HMZ3Gk31_.dbf
10610916 DATA /u01/app/oracle/oradata/TESTDB4/datafile/o1_mf_users_qdVYf16c71Gk33_.dbf
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 27 2006 - 13:17:45 CDT

