Oracle FAQ Your Portal to the Oracle Knowledge Grid

RE: Is a SUSPEND really necessary with EMC SnapView

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Sun, 22 Aug 2004 11:33:28 -0400
Message-ID: <KNEIIDHFLNJDHOOCFCDKCEPKFDAA.mwf@rsiz.com>


Ever since someone suggested in the late 1980s that "split backup" mirrors should in fact work, this has been an example of a test-versus-design trap.

Even if the split technology seems to work just fine and you have trouble creating a situation in which it demonstrably breaks, there may in fact be a design hole in the timing of things such that it is not guaranteed or designed to work. Most of the relatively early "plexing" technologies did not design in clean breaks. Some even marked dissociated plexes as corrupt even though they were probably okay.

The basic question is usually something like: "Do we definitely flush all writes to the plex being dissociated before we release it when we get a dissociate command?" (You can translate plex and dissociate into your volume manager's nouns and verbs.) Since the feature was that you could release storage to be used for something else, most of the early volume managers put more value on giving up the storage quickly than on making a clean copy of what you were in fact asking to be no longer a copy. Stopping to explicitly flush outstanding writes is not the fastest way to dissociate, so making it bulletproof was not usually part of the feature of dissociation.
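The flush question above can be made concrete with a toy model. This is only a sketch under stated assumptions, not any vendor's volume manager: the Plex and Mirror classes, the flush_first flag, and the block contents are all hypothetical names invented for illustration.

```python
class Plex:
    """One copy (plex) of a mirrored volume; writes may still be queued."""

    def __init__(self):
        self.blocks = {}    # block number -> data already on disk
        self.pending = []   # (block, data) writes accepted but not yet applied

    def flush(self):
        # Apply every outstanding write, in the order it was accepted.
        for block, data in self.pending:
            self.blocks[block] = data
        self.pending.clear()


class Mirror:
    """A two-way mirror (hypothetical model)."""

    def __init__(self):
        self.plexes = [Plex(), Plex()]

    def write(self, block, data):
        # The volume accepts the write; each plex applies it later.
        for plex in self.plexes:
            plex.pending.append((block, data))

    def dissociate(self, index, flush_first):
        plex = self.plexes.pop(index)
        if flush_first:
            plex.flush()           # clean break: outstanding writes land first
        else:
            plex.pending.clear()   # fast break: outstanding writes are dropped
        return plex


def split_copy(flush_first):
    """Return the block contents of the plex that was split off."""
    mirror = Mirror()
    mirror.write(1, "table row")
    for plex in mirror.plexes:
        plex.flush()
    mirror.write(2, "index entry")   # still queued on both plexes at the split
    return mirror.dissociate(1, flush_first).blocks
```

In this model split_copy(True) returns both blocks, while split_copy(False) returns a copy that is missing the queued write. A test only exposes the difference if a write happens to be outstanding at the moment of the split, which is why a fast break can appear to work for a long time.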

In the intervening years, many of the volume manager technologies, being made aware of the functional requirement, have provided for a clean break, where the intention is exactly what the early adopters of split mirrors wanted.

I'm not in a position to sort out which vendors work exactly which way with which commands, but I want to make sure that you realize you're trying to prove a negative in this case. Working for every case you've tried just isn't good enough. Working for every case you've tried plus the vendor's assurance that it is intended to function as you're using it is probably sufficient. If they tell you that a SUSPEND is required, then do it, even if you can't easily make it fail without the SUSPEND.
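A little arithmetic shows why clean test runs are weak evidence. The sketch below assumes, purely for illustration, that each split fails independently with some fixed probability; the function name and the 1% figure are hypothetical and not measured from any product.

```python
def chance_all_tests_pass(failure_rate, tests):
    """Probability that every one of `tests` independent splits looks clean."""
    return (1.0 - failure_rate) ** tests


# Hypothetical numbers: a 1% timing hole and 20 test splits.
if __name__ == "__main__":
    print(chance_all_tests_pass(0.01, 20))
```

Under those assumptions, 20 clean tests in a row would still be the expected outcome roughly four times out of five, so a run of successes says little about whether the design guarantees a clean split.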

mwf

-----Original Message-----

From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org]On Behalf Of Hemant K Chitale
Sent: Sunday, August 22, 2004 10:13 AM
To: ORACLE-L_at_freelists.org
Subject: Is a SUSPEND really necessary with EMC SnapView

There has earlier been discussion [with me asking questions about SnapShot/SnapCopy implementations
and later also responding to questions] about how an Oracle Hot Backup is done with
SnapShot/SnapClone mechanisms.

In my organisation I do have a few SnapClone implementations on Hitachi and EMC SANs.
I use BEGIN BACKUP and END BACKUP before and after the split but do not use a SUSPEND.

Recently a colleague of mine tested an EMC SnapView SnapClone of a production database
using the steps
on primary
   BEGIN BACKUP
   split
   END BACKUP
on secondary
   STARTUP MOUNT {OPEN fails with Recovery Required, as expected}
   RECOVER DATABASE
   OPEN
   Run "dbv" on all datafiles
However, later, when we started querying the clone data we found corrupt indexes.
ANALYZE TABLE VALIDATE STRUCTURE CASCADE failed for a few tables.

That is when I came into the picture. I found an EMC doc on 8i [and also another doc on 9i the
EMC engineer sent me] that specifically states why a SUSPEND is required. Both EMC engineers
at my site categorically stated that they use BEGIN BACKUP and END BACKUP but not a SUSPEND
at other sites. Yet the EMC docs state that a SUSPEND is required.

How have your experiences been?

{as for the "corrupt database" I have asked the DBA, SysAdmin and EMC engineers to schedule
another test, still without the SUSPEND as the EMC engineers swear that it is not required}.

http://www.emc.com/pdf/partnersalliances/oracle/clarFC4700_snapview_oracle8i.pdf
Page 16
"The use of ALTER SYSTEM SUSPEND is often questioned in backup scenarios where use of different
SNAP or mirror-splitting technologies is leveraged to perform instantaneous, or very rapid, data
duplication.

With hot backups, the physical data content of the various Oracle files continue to change even after a
tablespace has been placed into hot backup mode. Oracle relies on the ordering sequence of how various
OS writes to the files are organized to ensure that the logical content relationship of the files on
durable media allow a correct recovery to be performed in the event of unexpected server or storage
system failures.

When the Oracle files are distributed over a number of system disk devices, a common practice in most
Oracle deployments to minimize the impact of single device failure, and to improve general I/O
performance, the different devices have to be duplicated together.

However, when we are starting the SnapView sessions on the different devices, they are not started
atomically. Timing windows may exist as a result. The set of Oracle files being snapped may appear to
have lost the required I/O order sequencing.

The ALTER SYSTEM SUSPEND command suspends physical I/Os to the various Oracle database files
until ALTER SYSTEM RESUME is executed. With I/O suspended to the various database files, there will
be a temporary quiescence of OS level I/O to the various Oracle files. During this window, the physical
content of all the Oracle files would be content-consistent. When all the required SNAP sessions are
successfully started within this window, everything should then be working correctly."
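The timing window described in the quoted passage can be illustrated with a small deterministic model. This is a sketch under stated assumptions, not EMC's or Oracle's actual behaviour: the two devices "A" and "B", the transaction numbers, and the function names are hypothetical. The only rule in the model is that the database always writes device A before device B, so B should never be ahead of A.

```python
def snap_devices(suspend):
    """Snap device A, allow some database activity, then snap device B.

    Returns (snapshot, devices): what the snap sessions captured and what is
    on the live devices afterwards.
    """
    devices = {"A": 0, "B": 0}   # last transaction written to each device
    pending = []                 # writes held back while I/O is suspended
    snapshot = {}

    def write(txn, io_suspended):
        if io_suspended:
            pending.append(txn)  # held until I/O is resumed
            return
        devices["A"] = txn       # ordered writes: A always before B
        devices["B"] = txn

    write(1, False)
    snapshot["A"] = devices["A"]   # snap session on device A starts
    write(2, suspend)              # activity between the two session starts
    snapshot["B"] = devices["B"]   # snap session on device B starts
    for txn in pending:            # resume: the held writes now go through
        write(txn, False)
    return snapshot, devices


def is_order_preserved(snapshot):
    # With A written before B, a valid image never shows B ahead of A.
    return snapshot["B"] <= snapshot["A"]
```

Without the suspend, the model captures device B one transaction ahead of device A, an image that never existed on the live system. With the suspend, both sessions capture the same point and the held write is applied after the resume.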

Hemant K Chitale
Oracle 9i Database Administrator Certified Professional
http://web.singnet.com.sg/~hkchital

Received on Sun Aug 22 2004 - 10:29:23 CDT
