RE: My worst nightmare - ORA-8103

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Tue, 10 Jun 2014 23:08:59 +0000
Message-ID: <CE70217733273F49A8A162EE074F64D901DF9A9E_at_exmbx05.thus.corp>




I think that sort of thing can happen if the block is in a legal state but in the wrong place - hence the "lost write" analysis.  An unformatted block in the middle of an ASSM segments won't be reported as corrupt by dbverify et. al. but if it's unformatted when the space management bitmap says it has been formatted then code that is (e.g.) trying to do a tablescan will read it and find it in an unexpected state.  You could identify the extent that the block is in, then dump the bitmap(s) for the extent to find out the state that the bitmap thinks the block is in - this might tell you that it was formattted and partly used even though the block dump says there's nothing in the block.



 Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle

________________________________________
From: Maureen English [maureen.english_at_alaska.edu]
Sent: 10 June 2014 23:55
To: Kenny Payton
Cc: oracle-l_at_freelists.org; Jonathan Lewis
Subject: Re: My worst nightmare - ORA-8103

This is really an odd problem.  We ran the rman validate commands on datafiles and the whole database.
We also ran dbverify on all of the datafiles.  Nothing comes back marked as corrupt.  So, maybe the
word corrupt isn't quite right.  The data is truly inaccessible, though.  Oracle says it's lost I/O:

> The ora-8103 was raising accessing rdba: 0x30c76bb9 (195/486329), because the type=0x00= unkown
>
> scn: 0x0000.00000000 seq: 0x01 flg: 0x05 tail: 0x00000001
> frmt: 0x02 chkval: 0x5ad8 type: 0x00=unknown
>
> This block looks as a new block, I don't think had any data in.
> Means you had lost IO.

But when I copy all of the rows from the table that don't give me an ORA-8103 error, using an Oracle
provided script, I'm missing 480 rows....  Whether or not those rows have needed data in them or not
is a good question....

- Maureen

On 6/10/2014 1:23 PM, Kenny Payton wrote:
>
> You might want to run a “validate database” in RMAN and check v$database_block_corruption.  You could also start at a datafile level if you can narrow down where you think the corruption exists with a "validate datafile” command.
>
> Once you identify the blocks you you can then look to RMAN for block recovery as Jonathan suggested.  If you have had successful backups, and you haven’t set MAXCORRUPT to a non-zero value, then you should be able to use those backups to recover the blocks.  Once the blocks are recorded in v$database_block_corruption you could use “RECOVER CORRUPTION LIST” to restore/recover these blocks.
>
>
> Kenny
>
>
>
> On Jun 10, 2014, at 5:05 PM, Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk> wrote:
>
>>
>> The error number is still a puzzle - I would have expected 1578 or 1410 (corrupt block, or invalid rowid), but I guess if a block has got corrupted so that it seems to belong to an object with a higher data object id than expected then you might get an 8103.  (That's speculation - there may be other more obvious reasons why you'd get an 8103 that I've overlooked).
>>
>> Recover from last know is the "obvious" strategy - but if you've been doing rman backups then there's a "block recover" option that could tell rman to pick specific blocks from the file backup and roll forward through archived redo logs from there.  I guess this is why Oracle support have been trying to find the rowids of the lost data. Traditionally you do this by forcing a tablescan (e.g. select count(*) where non-indexed, nullable column = value you don't expect to see) and checking what gets into the dump file.  Recover the guilty block, and repeat until no more corrupt blocks. Perhaps you've already got the list of blocks since you know there are 480 rows missing, though.
>>
>>
>>
>> Regards
>> Jonathan Lewis
>> http://jonathanlewis.wordpress.com
>> _at_jloracle
>>
>> ________________________________________
>> From: Maureen English [maureen.english_at_alaska.edu]
>> Sent: 10 June 2014 21:39
>> To: Jonathan Lewis; oracle-l_at_freelists.org
>> Subject: Re: My worst nightmare - ORA-8103
>>
>> No partitioned objects.
>>
>> Per Oracle Support, it's due to lost I/O.  The blocks are apparently empty, not formatted,
>> data is probably not recoverable.  The Support Analyst suggested restore the datafiles for
>> the tablespace from the last good backup then doing a restore/recover to roll forward to
>> the current time.
>>
>> - Maureen
>>
>> On 6/10/2014 11:07 AM, Jonathan Lewis wrote:
>>>
>>>
>>> Ora-8103 is "object no longer exists" - how does this tie in with a corrupted block ?  Is this a partitioned object with a missing partition ?
>>>
>>>
>>> Regards
>>> Jonathan Lewis
>>> http://jonathanlewis.wordpress.com
>>> _at_jloracle
>>>
>>> ________________________________________
>>> From: oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on behalf of Maureen English [maureen.english_at_alaska.edu]
>>> Sent: 10 June 2014 19:28
>>> To: oracle-l_at_freelists.org
>>> Subject: My worst nightmare - ORA-8103
>>>
>>> Hi,
>>>
>>> Original post to BOracle list...apologies to those who are seeing this again.
>>>
>>> On 5/31, the refresh of a materialized view in our reporting instance failed
>>> with an ORA-8103 error, as did a gather stats job for the same table in the
>>> production database.  We had an application upgrade done on 6/1 and copied
>>> our production database to a preprod version on 6/2, so the errors weren't
>>> caught immediately.
>>>
>>> Users complained that the current data wasn't available in the reporting instance
>>> but my attempts to refresh and recreate all failed with the same ORA-8103 error.
>>>
>>> I've been working with Oracle Support since early last week and keep hitting
>>> brick walls.  At the moment, we've managed to copy 'uncorrupted' rows out of the
>>> table in our preproduction database and are 480 rows short in a table with 550M
>>> rows in it.  Oracle is currently working to identify the rowids of the corrupt
>>> blocks based on the output in a trace file generated by a failing query.
>>>
>>> I'm looking at Document 336133.1 while I wait for more info from Oracle.
>>>
>>> Anyone have any comments/suggestions/other info that might help identify and fix
>>> the problem?  We really can't afford to lose that much data.  We're working on
>>> recovering our database to a different location to try to get back any data that
>>> we lose, but since I don't have any idea what caused the corruption, I'm lost.
>>>
>>> - Maureen
>>> --
>>> http://www.freelists.org/webpage/oracle-l
>>>
>>>
>>> --
>>> http://www.freelists.org/webpage/oracle-l
>>>
>>>
>>
>> --
>> http://www.freelists.org/webpage/oracle-l
>>
>>
>

--
Maureen English
Lead Database Administrator
University of Alaska
Fairbanks, AK
(907) 450-8329
--
http://www.freelists.org/webpage/oracle-l Received on Wed Jun 11 2014 - 01:08:59 CEST

Original text of this message