Re: shutdown abort and database consistency

From: Howard J. Rogers <dba_at_hjrdba.com>
Date: Wed, 10 Jul 2002 11:13:50 +1000
Message-ID: <agg1nd$267$1@lust.ihug.co.nz>

"Rick Denoire" <100.17706_at_germanynet.de> wrote in message news:8komiu8ql4bpeq03vbj1paqnpv6s7e0fmd_at_4ax.com...
> "Howard J. Rogers" <dba_at_hjrdba.com> wrote:
>
>
> >There's your problem, right there... ORA-600.
>
> OK, I am not expecting to get a solution to this particular problem.
> My concern was to better understand the relationship between the
> effects of a "shutdown abort", the nologging option and a long running
> transaction.
>
> >A complete waste of time, if this job was doing regular 'delete from
table
> >where some_condition...'. Deletes *always* log, no matter whether you
tell
> >them to or not. They are regular DML statements, and regular DML pay no
heed
> >to the nologging attribute. So that's not the problem.
>
> Well, thanks. I didn't know...
>
>
> >> Now I am wondering if an Oracle DB is able to recover from uncommited
> >> transactions that are not being logged in redo logs.
> >
> >The short answer is "no", since if there's nothing in the redo logs, how
do
> >you expect Oracle to "redo" them?
>
> Now let's assume that the operation was of the kind that if set to
> nologging then indeed no logging would occur. When I say "to recover
> from uncommited transactions" I don't mean to get these redone, but
> to recover at all, that is, to be opened in a consistent state. Here
> again the question: Could Orable be able to come up in a consistent
> state after issuing a "shutdown abort" during the execution of
> operations that weren't being logged?
>

The short answer is "yes", but it all depends on what your definition of consistent is. The tests I performed below show that there is never an occasion when shutdown abort in the middle of a nologging operation can't be followed by a startup which opens the database fine. It's what you find in the database once it is opened that is a bit of a gamble.

Before considering your specific question, though (which is a good one, by the way), consider that things can get much, much worse. Suppose you do a nologging bulk insert (no redo, hence unrecoverable). Then suppose you do updates to that data (always logged, therefore in redo, and "recoverable"). Now the file goes AWOL, and you restore an old one from a backup, and roll forward by applying redo (ie, recovery). You're going to get to a point where the update commands require Oracle to update a record which isn't there. It's horrible.

Hence, Oracle always advises that after every nologging operation, you take a fresh backup of the affected datafiles. That way, you'd restore a file containing the bulk-loaded records, and be able to re-perform the subsequent updates without a problem.

In other words, the absence of things in the redo stream is always a problem for Oracle, and would prevent a consistent recovery. But the absence of things in the redo stream is an entirely separate issue from the mode in which you perform your shutdowns.

To answer your specific question first, here's the outcome of a test I just did (this is 9i R2 on Windows 2000, but that won't make a difference to the outcome) -and I hope you appreciate me stuffing up a perfectly serviceable database in the cause of science!! (Please note, my hub decided last night that a puff of blue smoke would enliven proceedings. And this morning the floppy disk drive on my server decided it didn't like floppy disks anymore. So I am paraphrasing wildly, but the test results are sound).

First test: Create a table AS SELECT (CTAS respects the nologging operation). Shutdown abort in the middle of it.

Session 1:
Create table nasty nologging as select * from dba_objects; (chug....chug....chug)

Session 2: shutdown abort. (Session1 responds with an instant 'end of file on communication channel error)
Session 2: startup..... Database Open.

Session 1: select * from nasty;
Error: table or view doesn't exist.

So the database opens fine, but the table is missing, in its entirety. Not just the records.... the table itself. Why? Because when you create a table 'nologging', the entries in the data dictionary to record its creation are not logged either. Therefore, the change to the data dictionary is lost along with the insertion of records. But the database is consistent (consistently missing table NASTY, anyway).

Second test: Create a table nologging, do some inserts which are not logged, and shutdown abort in the middle of one of the inserts:

Session 1: create table nasty nologging as select * from dba_objects; 15000 Records created
insert /* +APPEND */ into nasty select * from nasty; 15000 records added (that's 30000 records in all) Commit
insert /* +APPEND */ into nasty select * from nasty; 30000 records added (that's 60000 records in all) Commit
insert /* +APPEND */ into nasty select * from nasty; 60000 records added (that's 120000 records in all) Commit
insert /* +APPEND */ into nasty select * from nasty; End of file on communication channel (the shutdown abort's just happened).

Startup is fine: the database is clearly consistent.

Select count(*) from nasty;
120000 records

So the last batch insert was lost, but not the preceeding ones. I *was* committing after each insert, though. Had I not been, I would have lost the lot. But the database would have opened consistent.

Now for the real nasty test: Create a table nologging. Update it's records. Shutdown abort prior to a commit, and replace the datafile with a backup taken before the create table statement:

Session 1: create table nasty nologging as select * from dba_objects; Update nasty set object_id=object_id+1973; Commit;

Session 2: shutdown abort
Startup (wipe file and replace)
Error: File 9 needs recovery
Recover automatic datafile 9;
Media Recovery Complete
Alter database open;
Database Opened.

So getting the database open isn't a hassle. The fun starts when I switch back to session 1:

select count(*) from nasty;
Error: ORA-1578 Oracle Data Block Corrupted (file 9, block 556) Error: ORA-26040 Data block was loaded using the NOLOGGING option

So clearly there's a problem (and I still have it!!!!). What I shall have to do is use LOGMINER to determine the exact SCN of the first update statement, and perform an incomplete recovery upto that SCN. (Actually, I'll restore my entire cold backup, but that's just me and my test environment).

Remember Oracle's advice though? After every nologging operation, take a backup. Had I done that, I would have been fine: the bulk-loaded records would have been in the backup image, and the subsequent updates could have been re-performed, no worries.

Also: note that this block corruption problem is NOT a result of a shutdown abort. A shutdown immediate, transactional or normal would have faced the same problem. Remember that I had committed my updates to the nologged records. Therein lies the problem: it is not the mode of shutdown that is the cause of the trouble, but the fact that the redo stream contains redo for committed transactions to be applied to records which can't themselves be re-created, because their redo is missing. All shutdown modes would have faced the same issue when followed by a restore of a backup taken before the nologging operation.

I would have demonstrated that, but my database restore operation is going to take a while, and in any case, it's good for you to test something, too!

Regards
HJR
> >My opinion is that you mustn't jump at shadows. Shutdown aborts are
> >perfectly OK. Deletes always generate redo, so recoverability is not an
> >issue. But huge rollbacks might be.
>
> Now I think that the problem was caused by a kind of overflow of the
> rollback segments.
>
> What I did today: I reloaded all deleted data from a dump file. Since
> the DB would not react to a "shutdown immediate" command, I had to use
> the "shutdown abort", but before doing that, I issued an "alter system
> checkpoint" command. Should not harm. After that, the DB came up and
> no problems have arised sofar.
>
> The problem was solved, and this time the guy is issuing a commit
> every 1000 deletes, so the rollback segment won't overflow.
>
> Thanks a lot.
>
> Rick Denoire
Received on Tue Jul 09 2002 - 20:13:50 CDT