RE: Shutdown Abort, a defense of some guessing

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Tue, 3 Jul 2007 16:01:08 -0400
Message-ID: <01b301c7bdac$e63e84f0$1100a8c0@rsiz.com>

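To be precise, and please contradict at will if you think I've got this wrong, the increase in exposure from shutdown abort is limited to one case: the online redo log is corrupt at some point, and the dirty blocks not yet written to disk whose changes lie past that point contained correct information. A clean shutdown would have written those blocks; after an abort they can only be rebuilt from the log.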

Corruption of the online redo log almost never happens, and the exposure is further reduced by the requirement that the log be broken while the dirty blocks past that point are not (if they are broken too, you have a failure either way).

Now Oracle's recovery model cannot survive giving the computer an instruction to write "a" with a checksum and having the actual contents be "b" with the correct checksum due to something going wrong between telling it to write "a" and getting the correct information saved on the spinning rust. I'm not aware of any recovery model that will survive those conditions. Aside from such putative error rates as can be teased out from how bulletproof things like ECC codes are and all the vulnerabilities inherent in physical reality, the last documented software corruption of the online redo logs I'm aware of was a race condition in high-CPU-count SMP ports around 6.0.33.x, fixed by 6.0.36.x (and a corresponding patch to 7.0.? which I think was still beta). Now maybe I missed one, but we're over half a lifetime of Oracle ago for that one.

This is my long-winded way of agreeing that it is passing strange, and ignores the numbers, to eschew "shutdown abort" as part of routine maintenance. There was a period of years when "shutdown immediate" was a good way to get in trouble on some ports because it was buggy, and before Oracle allowed opening the database before all pending rollbacks were complete, "abort" could itself result in protracted outages if you were silly enough to kneecap a large monolith. But nothing else was going to be quicker anyway, and in fact the rollback proceeded quicker and with less noise, since no one could connect to consume resources. Allowing the database to open as soon as the system tablespace was coherent and the setup of the transactions managing the pending rollbacks was complete was an outgrowth of the demand for reducing downtime after a shutdown abort.

Now as for the other thread, I have a slight quibble. I hold that there is a variable amount of economic guessing time, ranging from zero seconds to perhaps 5 or 15 minutes depending on the nature and complexity of the question, within which the total cost of the troubleshooting or optimization is less than it would be if you leapt immediately to measuring and analyzing the data that will tell you for sure what is happening. However, I also applaud Alex G.'s "rants", because the likely economic guessing time most certainly should have expired well before a consultation to the list is made. When in doubt, timing out on guessing sooner is better than guessing too long. As tools leading to the provably correct answer get easier and easier to use and faster and faster to execute, perhaps I'll some day agree that the high limit for guessing time for all situations has become too close to zero seconds to ever guess. For now I'll subscribe to the notion that those of us who guess at all probably tend to guess too long.
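
One way to put rough numbers on the guessing-time argument above is a toy cost model. This is only a sketch under assumptions that are not in the post: a problem is "guessable" with probability p_guessable, a guessable problem yields to guessing after an exponentially distributed time with mean mean_guess minutes, and measuring always finds the answer in measure_cost minutes. All names and figures are hypothetical.

```python
import math

def expected_cost(budget, p_guessable, mean_guess, measure_cost):
    """Expected minutes spent if we guess for at most `budget` minutes, then measure."""
    # Chance that a guessable problem is still unsolved when the budget runs out.
    miss = math.exp(-budget / mean_guess)
    # Guessable case: expected time spent guessing, plus measuring if the guess timed out.
    guessable = mean_guess * (1.0 - miss) + miss * measure_cost
    # Unguessable case: the whole budget is wasted, then we measure anyway.
    unguessable = budget + measure_cost
    return p_guessable * guessable + (1.0 - p_guessable) * unguessable

def best_budget(p_guessable, mean_guess, measure_cost, max_budget=60):
    """Whole-minute guessing budget (0..max_budget) with the lowest expected cost."""
    budgets = range(0, max_budget + 1)
    return min(budgets, key=lambda b: expected_cost(b, p_guessable, mean_guess, measure_cost))

if __name__ == "__main__":
    # Hypothetical inputs: 5-minute typical guess, 60-minute measurement exercise.
    for p in (0.05, 0.5, 0.8):
        print(p, best_budget(p, mean_guess=5, measure_cost=60))
```

Under these assumptions the best budget drops to zero once guessing is unlikely to pay off, or once measuring costs no more than a typical guess, which is the direction the post expects as measurement tools get faster.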

Regards,

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jeremiah Wilton
Sent: Tuesday, July 03, 2007 1:47 PM
To: jkstill_at_gmail.com
Cc: randyjo_at_sbcglobal.net; oracle-l_at_freelists.org
Subject: Re: Shutdown Abort

Thanks Jared. It has taken all of my strength to not reply to some of the most egregious postings in this thread. My more recent blurb on shutdown abort can be found on page four of my 2004 HA paper:

http://www.ora-600.net/articles/stayinalive.pdf

I will confirm that practically every site that requires very high availability uses 'abort' as SOP. I am really surprised and disappointed by the wild theoretical conjecture that accompanies the steadfast resistance to 'abort'.

Speaking of wild theoretical conjecture, thanks to Alex G. for his recent rants on DBAs and guessing. I have long been an opponent of the 'guessing method' of Oracle tuning, which goes hand in hand with the 'try a bunch of stuff' method of Oracle troubleshooting :-)

Best to all, including the guessers,

Jeremiah Wilton
ORA-600 Consulting
http://www.ora-600.net

Jared Still wrote:
>
> Please see
> http://www.speakeasy.org/~jwilton/oracle/shutdown-abort-bad.html
>
> If you are not familiar with Jeremiah Wilton, he was a DBA at Amazon.com
> from early days.
>
> Amazon has a few databases, and they were/are regularly shutdown with
> abort.

--
http://www.freelists.org/webpage/oracle-l

Received on Tue Jul 03 2007 - 15:01:08 CDT