Received: (qmail 5190 invoked from network); 7 Jul 2011 14:08:52 -0500
Received: from freelists-180.iquest.net (HELO turing.freelists.org) (206.53.239.180)
  by static-ip-85-25-126-90.inaddr.intergenia.de with SMTP; 7 Jul 2011 14:08:48 -0500
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id A756CE32C97;
 Thu,  7 Jul 2011 15:08:32 -0400 (EDT)
X-Virus-Scanned: Debian amavisd-new at localhost.localdomain
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id qZNK+uQexj3Y; Thu,  7 Jul 2011 15:08:32 -0400 (EDT)
Received: from turing.freelists.org (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 0484DE32C50;
 Thu,  7 Jul 2011 15:07:48 -0400 (EDT)
Received: with ECARTIS (v1.0.0; list oracle-l); Thu, 07 Jul 2011 15:07:07 -0400 (EDT)
Received: from localhost (localhost [127.0.0.1])	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 8D0B2E32B17	for <oracle-l@freelists.org>; Thu,  7 Jul 2011 15:07:06 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1])	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)	with ESMTP id 6hKoPA9XHfKT for <oracle-l@freelists.org>;	Thu,  7 Jul 2011 15:07:06 -0400 (EDT)
Received: from nm3.bt.bullet.mail.ird.yahoo.com (nm3.bt.bullet.mail.ird.yahoo.com [212.82.108.234])	by turing.freelists.org (Avenir Technologies Mail Multiplex) with SMTP id D8253E3292E	for <oracle-l@freelists.org>; Thu,  7 Jul 2011 15:07:05 -0400 (EDT)
Received: from [212.82.108.231] by nm3.bt.bullet.mail.ird.yahoo.com with NNFMP; 07 Jul 2011 19:07:04 -0000
Received: from [212.82.108.225] by tm4.bt.bullet.mail.ird.yahoo.com with NNFMP; 07 Jul 2011 19:07:04 -0000
Received: from [127.0.0.1] by omp1002.bt.mail.ird.yahoo.com with NNFMP; 07 Jul 2011 19:07:04 -0000
X-Yahoo-Newman-Id: 513514.26524.bm@omp1002.bt.mail.ird.yahoo.com
Received: (qmail 96797 invoked from network); 7 Jul 2011 19:07:04 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1310065624; bh=zYAKXUqIxZ+j0DADH//xbJNiCwcmpkC1qiim1KIqug0=; h=Received:X-Yahoo-SMTP:X-YMail-OSG:X-Yahoo-Newman-Property:Message-ID:From:To:References:Subject:Date:MIME-Version:Content-Type:Content-Transfer-Encoding:X-Priority:X-MSMail-Priority:X-Mailer:X-MimeOLE; b=sgYEYvbkccZYsrwOILRTpX6tYvKPkmy9EvR7nrrEEAHKHPVcqS/eDw+llGKUFZjhkVFdB1rdddVvxCrR3zv9kzQ2eoc6DI40sDH9NDFVW2MrfIq+1vIPxOo2W4e88wTs7GBsuvPb85UIL7fmsw21HL4aAjCmFOrAdgBrLQyyWxw=
Received: from Primary (jonathan@86.184.47.100 with login)        by smtp820.mail.ird.yahoo.com with SMTP; 07 Jul 2011 19:07:03 +0000 GMT
X-Yahoo-SMTP: 4vWPFZSswBAs9FgodTSfpaeSlgxsILPLGEbCUJdD5X2Ag3l43R3FWMc-
X-YMail-OSG: KEaMZ.UVM1nEMenfrIY4WEp5SYGWNjgPAfOmE8DakuDYzsV n4ME0rf9wG1tCS60CuQ9FHxPOYLe8pqzm8VowgB4yiTY6zCT.fbpCMj3ahJF yWgJVgHQR1kZVbI_QUUqcK6fXZaZNiIxZvfVKjkfeFkrmnD21rUhffBDDF7C hT68f3l7nDeKxwMp9Zobc8GAfD8y4_eVK_SIvUfQ1jprvjfyrGvr4K2Xdvtc jmFXLqvGoFu1CziwesqKaasNS8e6GX.bKgf2LjIiQ27qLznqnrrfwa4f91rb CSZ2ithWvT5PBdA_y61iwL7MlMYS3Ky6iYi1wAsMbHjLfOUU9YCrkmXgiCUW Eev1rLDZf6bQdZQmd6oVJRULX6T72ra5XvG5r9nFOtscsz1fgBoxwl6XlrIS Wzs1GIxaxWM0JaKT4shtFLuQK2qPjydD2nxM-
X-Yahoo-Newman-Property: ymail-3
Message-ID: <1C90243364D44525A36ECB6D66C79A31@Primary>
From: "Jonathan Lewis" <jonathan@jlcomp.demon.co.uk>
To: "free" <oracle-l@freelists.org>
References: <1309987985.44628.YahooMailRC@web65918.mail.ac4.yahoo.com> <A8EF9E974EE04D449D324F47994F9382@Primary> <1310054846.10795.YahooMailRC@web65903.mail.ac4.yahoo.com>
Subject: Re: What is the purpose of segment level checkpoint before DROP/TRUNCATE of a table?
Date: Thu, 7 Jul 2011 20:07:06 +0100
MIME-Version: 1.0
Content-Type: text/plain; format=flowed;	charset="iso-8859-1";	reply-type=original
Content-Transfer-Encoding: 7bit
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6109
X-archive-position: 37293
X-ecartis-version: Ecartis v1.0.0
Sender: oracle-l-bounce@freelists.org
Errors-to: oracle-l-bounce@freelists.org
X-original-sender: jonathan@jlcomp.demon.co.uk
Precedence: normal
Reply-To: jonathan@jlcomp.demon.co.uk
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <oracle-l-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: oracle-l <oracle-l.freelists.org>
X-List-ID: oracle-l <oracle-l.freelists.org>
List-subscribe: <oracle-l-request@freelists.org?Subject=subscribe>
List-owner: <mailto:steve.adams@ixora.com.au>
List-post: <mailto:oracle-l@freelists.org>
List-archive: <http://www.freelists.org/archives/oracle-l>
X-list: oracle-l


Notes in line

Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com


----- Original Message ----- 
From: "Saibabu Devabhaktuni" <saibabu_d@yahoo.com>
To: "Jonathan Lewis" <jonathan@jlcomp.demon.co.uk>; "free" 
<oracle-l@freelists.org>
Sent: Thursday, July 07, 2011 5:07 PM
Subject: Re: What is the purpose of segment level checkpoint before 
DROP/TRUNCATE of a table?


> Hi Jonathan,
>
> If we have a table with 100,000 dirty buffers on primary database.
>
> 1) Let's say 50,000 dirty buffers were already written to disk by DBWR on
> primary.
> 2) At time T1, truncate table command issued on primary.
> 3) If Oracle had this feature to not write buffers as part of truncate, then 
> at
> time T2 Oracle finished marking buffers not to write.
> 4) At time T3, truncate operation is fully completed on primary.
> 5) On the standby, configure very small buffer cache size (i.e. can only fit
> 10,000 buffers).
> 6) As the redo up to time T1 applied on the standby, only 10,000 dirty buffers
> can stay in the cache and rest of the blocks will be written to disk as soon 
> as
> redo is applied.
> 7) Redo as of time T2, will not really mark buffers as not to write, as most 
> of
> them are already on disk. This is when blocks on the standby is going to be
> different from the primary as of same checkpoint time.
>

So what. I've already pointed out that the standby physical files are always 
likely to be different from the primary files at any point in time. Clearly 
Oracle has to be able to deal with that problem because we can always point out 
that Oracle is supposed survive a global powercut because of the basic redo and 
recovery mechanism.

>
> Data loss scenario:
> 1) If there was a system or datafile level checkpoint finished on primary
> between time T2 and T3.
> 2) If primary instance crashes between time T2 and T3, but after the above
> checkpoint was completed.
> 3) Above checkpoint would have skipped writing buffers marked as not to write
> and hence on disk image is not current.
> 4) When instance is starting up, crash recovery starts as of redo from the 
> most
> recent checkpoint.
> 5) After the completion of crash recovery, truncate never really finished, but
> the data in the dirty blocks as of time T1 is missing.
>

Points to worry about - what does it mean to say:
  truncate operation is fully completed - what are the events, and in what 
sequence
  why do you assume that you "mark the buffers as free" (t2) before you complete 
the truncate (t3)
  if you have a checkpoint that finishes between t2 and t3 -
        what does it mean to say that the checkpoint finishes in this context ?
        when did the checkpoint start, and does that matter ? (before t1, 
between t1 and t2, between t2 and t3)
        where do the local writes come into it in your scenario
        where do the updates to the data dictionary come in your scenario


How about this for a truncate sequence.

a)    Grab an exclusive pin on the table definition - this stops any more SQL 
referencing the table from being compiled
        (may have to wait)
b)    Grab an exclusive pin on every cursor referencing the table definition - 
this stops any SQL referencing the table from being executed
        (may have to wait)
c)    Invalidate table and index entries in dictionary cache
d)    Invalidate all cursors referencing the table
e)    Generate redo for correcting segment header blocks etc.
f)    Apply, commit, and generate local writes        ***
g)    Generate redo for the data dictionary to show table (and indexes) have 
been truncated, data_object_id changed.
h)    apply and commit
i)    mark as free all buffers for table and indexes and move to REPL-AUX
        have to ignore blocks on write queue that are already pinned by dbwr
j)    release cursor pins
k)    release object pin

*** Point (f) needs further thought - Oracle must have a mechanism for avoiding 
a race condition for blocks which are subject to local writes when a checkpoint 
is running or you get the option for something similar to your inconsistency 
description:
    dirty block is not written to file during checkpoint because it's supposed 
to be a local write
    local write doesn't take place (for some reason, e.g. session crashes)
    checkpoint completes
    if a recovery is required very soon afterwards (and the local write still 
hasn't happened) then
        the block on disc is wrong
        the recovery process is going to start from the next redo log, and 
therefore not see the redo that should bring the block up to date.


--
http://www.freelists.org/webpage/oracle-l


