Received: (qmail 1666 invoked from network); 7 Jul 2011 21:41:23 -0500
Received: from freelists-180.iquest.net (HELO turing.freelists.org) (206.53.239.180)
  by static-ip-85-25-126-90.inaddr.intergenia.de with SMTP; 7 Jul 2011 21:41:12 -0500
Received: from localhost (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 97ACCE32D30;
 Thu,  7 Jul 2011 22:41:10 -0400 (EDT)
X-Virus-Scanned: Debian amavisd-new at localhost.localdomain
Received: from turing.freelists.org ([127.0.0.1])
 by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id kaRQQuxEpkXE; Thu,  7 Jul 2011 22:41:10 -0400 (EDT)
Received: from turing.freelists.org (localhost [127.0.0.1])
 by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id B6CFAE32C2C;
 Thu,  7 Jul 2011 22:40:26 -0400 (EDT)
Received: with ECARTIS (v1.0.0; list oracle-l); Thu, 07 Jul 2011 22:39:45 -0400 (EDT)
Received: from localhost (localhost [127.0.0.1])	by turing.freelists.org (Avenir Technologies Mail Multiplex) with ESMTP id 9D7A9E32B1E	for <oracle-l@freelists.org>; Thu,  7 Jul 2011 22:39:44 -0400 (EDT)
Received: from turing.freelists.org ([127.0.0.1])	by localhost (turing.freelists.org [127.0.0.1]) (amavisd-new, port 10024)	with ESMTP id XfiWVS2KrMtN for <oracle-l@freelists.org>;	Thu,  7 Jul 2011 22:39:44 -0400 (EDT)
Received: from n2b.bullet.cnb.yahoo.com (n2b.bullet.cnb.yahoo.com [202.43.216.212])	by turing.freelists.org (Avenir Technologies Mail Multiplex) with SMTP id 0D77DE311FB	for <oracle-l@freelists.org>; Thu,  7 Jul 2011 22:39:42 -0400 (EDT)
Received: from [203.209.230.73] by n2.bullet.cnb.yahoo.com with NNFMP; 08 Jul 2011 02:39:39 -0000
Received: from [202.165.102.48] by t3.bullet.cnb.yahoo.com with NNFMP; 08 Jul 2011 02:39:39 -0000
Received: from [127.0.0.1] by omp102.mail.cnb.yahoo.com with NNFMP; 08 Jul 2011 02:39:39 -0000
X-Yahoo-Newman-Property: ymail-3
X-Yahoo-Newman-Id: 504935.29534.bm@omp102.mail.cnb.yahoo.com
Received: (qmail 96562 invoked by uid 60001); 8 Jul 2011 02:39:39 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com.cn; s=s1024; t=1310092779; bh=Nha4Cr77V3vb3ySeDs+kWM4viLmk85Ax/plNiFird4Q=; h=X-YMail-OSG:Received:X-Mailer:Message-ID:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=iVx8ihx4iHK6bFBg26E+rTCHgOMK3FRcZEHwPuHVAIdwi1+cO6UrtIiFdR1MOX3MlStcr2QgfJjjvQdVsozLkHB4cap0q9ejQEho9a0Jlgshx7Aso1yWTP++CSaLdCgs3PuRni6bmWImtiH6b47fgY9auVAAc6EmBvR2WpMBvjE=
DomainKey-Signature:a=rsa-sha1; q=dns; c=nofws;  s=s1024; d=yahoo.com.cn;  h=X-YMail-OSG:Received:X-Mailer:Message-ID:Date:From:Subject:To:In-Reply-To:MIME-Version:Content-Type;  b=WS34ukJi5xwhBV0SDO2IHG5TIj4clEHLnI7EAaFGLnfltTnzpuDjKYMNHbBaCscEmbuKNQhccdNSsUpQFb9D6K2Zkn5uixIP8Mo3fSsR5MpWY4Ml6hC74N2MMOWd6A6sqZDlFzNoVZK2I9vsEJX3OuiQQshePVBIhrhc6wjyGxs=;
X-YMail-OSG: FUZc8b4VM1kiDezFrD1FLsn76lEKni9IEP3oHYg0qz1aDuu WeLzslRB4ZG6PfX4SBjfE9I6r2WvpLeBhS0n78pwC.zn2qKJK_zGUXLRXrIi CjrSVByJ5XjHMFBSAMosVOALj.1NaRPsx0I0Raxuk2CZGpXgAhkfx46ZVSAY 3rQUknezDbNaS320hUgqD3bnRi0.i18mLTYae8WtItjH_Xl0Lw_A97WJ9kL6 f09Qj0yu2QF8bBEesrySBiG3b0WNVaOc3K9Me_29OVpFzIQQT5FC3DyJ1cVx I9YysTKqYN165ZoEVk3osujZ5ZFGqv8YlTCIdkRpHZewYLxvJxJ45xsdrLIq zVrK2BPcqrXIsWPcztzJFZhcraTNw51enSw1YGHI4fe49SDKjDr3ntK0926h W2IzaW8bvOpZ8I.hQpPrRv4Hm4h5eXPKr.ZkbDaZux6kmyfZeJjoN3imDLAM .NlMMFcae1K43deaQWbsYkuFj2CK5knpDe.ukuGPLh8QczCwBEKeE..ZaD0I o6RUXqSMeY.jp5m3MMyVOgHlSi.6rmxL9jqZ3Ge9NQhkM14KjTFkItmymqa0 ZO6UY9obKltA_jTvmXJS4dWPHY.JBy.5kv3HsP.kcKPNBiGkdCrUnig--
Received: from [124.42.13.230] by web15808.mail.cnb.yahoo.com via HTTP; Fri, 08 Jul 2011 10:39:39 CST
Message-ID: <1310092779.71099.YahooMailClassic@web15808.mail.cnb.yahoo.com>
Date: Fri, 8 Jul 2011 10:39:39 +0800 (CST)
From: Hua Cui <allantreycn@yahoo.com.cn>
Subject: Re: What is the purpose of segment level checkpoint before DROP/TRUNCATE of a table?
To: free <oracle-l@freelists.org>, jonathan@jlcomp.demon.co.uk
In-Reply-To: <1C90243364D44525A36ECB6D66C79A31@Primary>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0-1382313819-1310092779=:71099"
X-archive-position: 37294
X-ecartis-version: Ecartis v1.0.0
Sender: oracle-l-bounce@freelists.org
Errors-to: oracle-l-bounce@freelists.org
X-original-sender: allantreycn@yahoo.com.cn
Precedence: normal
Reply-To: allantreycn@yahoo.com.cn
List-help: <mailto:ecartis@freelists.org?Subject=help>
List-unsubscribe: <oracle-l-request@freelists.org?Subject=unsubscribe>
List-software: Ecartis version 1.0.0
List-Id: oracle-l <oracle-l.freelists.org>
X-List-ID: oracle-l <oracle-l.freelists.org>
List-subscribe: <oracle-l-request@freelists.org?Subject=subscribe>
List-owner: <mailto:steve.adams@ixora.com.au>
List-post: <mailto:oracle-l@freelists.org>
List-archive: <http://www.freelists.org/archives/oracle-l>
X-list: oracle-l
--0-1382313819-1310092779=:71099
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

I think there are two reasons:
1. Oracle don't want to touch the data=C2=A0blocks of the truncated table, =
because it is a truncate operation, not delete =E2=80=94 that's the main re=
ason why truncate is much faster than delete.
2. Oracle don't want to reconstruct the dirty blocks of the truncated table=
 if the database is crash=C2=A0after the truncate operation =E2=80=94 it is=
 not necessary.

Best Regards=20
Hua Cui (dbsnake)
Mobile: +86-13910122046
Oracle Recovery Last Chance: http://www.oracleodu.com
=C2=A0

--- 11=E5=B9=B47=E6=9C=888=E6=97=A5=EF=BC=8C=E5=91=A8=E4=BA=94, Jonathan Le=
wis <jonathan@jlcomp.demon.co.uk> =E5=86=99=E9=81=93=EF=BC=9A


=E5=8F=91=E4=BB=B6=E4=BA=BA: Jonathan Lewis <jonathan@jlcomp.demon.co.uk>
=E4=B8=BB=E9=A2=98: Re: What is the purpose of segment level checkpoint bef=
ore DROP/TRUNCATE of a table?
=E6=94=B6=E4=BB=B6=E4=BA=BA: "free" <oracle-l@freelists.org>
=E6=97=A5=E6=9C=9F: 2011=E5=B9=B47=E6=9C=888=E6=97=A5,=E5=91=A8=E4=BA=94,=
=E4=B8=8A=E5=8D=883:07



Notes in line

Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com


----- Original Message ----- From: "Saibabu Devabhaktuni" <saibabu_d@yahoo.=
com>
To: "Jonathan Lewis" <jonathan@jlcomp.demon.co.uk>; "free" <oracle-l@freeli=
sts.org>
Sent: Thursday, July 07, 2011 5:07 PM
Subject: Re: What is the purpose of segment level checkpoint before DROP/TR=
UNCATE of a table?


> Hi Jonathan,
>=20
> If we have a table with 100,000 dirty buffers on primary database.
>=20
> 1) Let's say 50,000 dirty buffers were already written to disk by DBWR on
> primary.
> 2) At time T1, truncate table command issued on primary.
> 3) If Oracle had this feature to not write buffers as part of truncate, t=
hen at
> time T2 Oracle finished marking buffers not to write.
> 4) At time T3, truncate operation is fully completed on primary.
> 5) On the standby, configure very small buffer cache size (i.e. can only =
fit
> 10,000 buffers).
> 6) As the redo up to time T1 applied on the standby, only 10,000 dirty bu=
ffers
> can stay in the cache and rest of the blocks will be written to disk as s=
oon as
> redo is applied.
> 7) Redo as of time T2, will not really mark buffers as not to write, as m=
ost of
> them are already on disk. This is when blocks on the standby is going to =
be
> different from the primary as of same checkpoint time.
>=20

So what. I've already pointed out that the standby physical files are alway=
s likely to be different from the primary files at any point in time. Clear=
ly Oracle has to be able to deal with that problem because we can always po=
int out that Oracle is supposed survive a global powercut because of the ba=
sic redo and recovery mechanism.

>=20
> Data loss scenario:
> 1) If there was a system or datafile level checkpoint finished on primary
> between time T2 and T3.
> 2) If primary instance crashes between time T2 and T3, but after the abov=
e
> checkpoint was completed.
> 3) Above checkpoint would have skipped writing buffers marked as not to w=
rite
> and hence on disk image is not current.
> 4) When instance is starting up, crash recovery starts as of redo from th=
e most
> recent checkpoint.
> 5) After the completion of crash recovery, truncate never really finished=
, but
> the data in the dirty blocks as of time T1 is missing.
>=20

Points to worry about - what does it mean to say:
truncate operation is fully completed - what are the events, and in what se=
quence
why do you assume that you "mark the buffers as free" (t2) before you compl=
ete the truncate (t3)
if you have a checkpoint that finishes between t2 and t3 -
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0what does it mean to say that the checkpoin=
t finishes in this context ?
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0when did the checkpoint start, and does tha=
t matter ? (before t1, between t1 and t2, between t2 and t3)
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0where do the local writes come into it in y=
our scenario
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0where do the updates to the data dictionary=
 come in your scenario


How about this for a truncate sequence.

a)=C2=A0 =C2=A0 Grab an exclusive pin on the table definition - this stops =
any more SQL referencing the table from being compiled
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0(may have to wait)
b)=C2=A0 =C2=A0 Grab an exclusive pin on every cursor referencing the table=
 definition - this stops any SQL referencing the table from being executed
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0(may have to wait)
c)=C2=A0 =C2=A0 Invalidate table and index entries in dictionary cache
d)=C2=A0 =C2=A0 Invalidate all cursors referencing the table
e)=C2=A0 =C2=A0 Generate redo for correcting segment header blocks etc.
f)=C2=A0 =C2=A0 Apply, commit, and generate local writes=C2=A0 =C2=A0 =C2=
=A0 =C2=A0 ***
g)=C2=A0 =C2=A0 Generate redo for the data dictionary to show table (and in=
dexes) have been truncated, data_object_id changed.
h)=C2=A0 =C2=A0 apply and commit
i)=C2=A0 =C2=A0 mark as free all buffers for table and indexes and move to =
REPL-AUX
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0have to ignore blocks on write queue that a=
re already pinned by dbwr
j)=C2=A0 =C2=A0 release cursor pins
k)=C2=A0 =C2=A0 release object pin

*** Point (f) needs further thought - Oracle must have a mechanism for avoi=
ding a race condition for blocks which are subject to local writes when a c=
heckpoint is running or you get the option for something similar to your in=
consistency description:
=C2=A0=C2=A0=C2=A0dirty block is not written to file during checkpoint beca=
use it's supposed to be a local write
=C2=A0=C2=A0=C2=A0local write doesn't take place (for some reason, e.g. ses=
sion crashes)
=C2=A0=C2=A0=C2=A0checkpoint completes
=C2=A0=C2=A0=C2=A0if a recovery is required very soon afterwards (and the l=
ocal write still hasn't happened) then
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0the block on disc is wrong
=C2=A0 =C2=A0 =C2=A0=C2=A0=C2=A0the recovery process is going to start from=
 the next redo log, and therefore not see the redo that should bring the bl=
ock up to date.


--
http://www.freelists.org/webpage/oracle-l



--0-1382313819-1310092779=:71099
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<table cellspacing=3D"0" cellpadding=3D"0" border=3D"0" ><tr><td valign=3D"=
top" style=3D"font: inherit;"><DIV>I think there are two reasons:</DIV>
<DIV>1. Oracle don't want to touch the data&nbsp;blocks of the truncated ta=
ble, because it is a truncate operation, not delete =E2=80=94 that's the ma=
in reason why truncate is much faster than delete.</DIV>
<DIV>2. Oracle don't want to reconstruct the dirty blocks of the truncated =
table if the database is crash&nbsp;after the truncate operation =E2=80=94 =
it is not necessary.<BR></DIV>
<DIV>Best Regards <BR>Hua Cui (dbsnake)<BR>Mobile: +86-13910122046<BR>Oracl=
e Recovery Last Chance: <A href=3D"http://www.oracleodu.com/" rel=3Dnofollo=
w target=3D_blank>http://www.oracleodu.com</A></DIV>
<DIV>&nbsp;</DIV><BR><BR>--- <B>11=E5=B9=B47=E6=9C=888=E6=97=A5=EF=BC=8C=E5=
=91=A8=E4=BA=94, Jonathan Lewis <I>&lt;jonathan@jlcomp.demon.co.uk&gt;</I><=
/B> =E5=86=99=E9=81=93=EF=BC=9A<BR>
<BLOCKQUOTE style=3D"BORDER-LEFT: rgb(16,16,255) 2px solid; PADDING-LEFT: 5=
px; MARGIN-LEFT: 5px"><BR>=E5=8F=91=E4=BB=B6=E4=BA=BA: Jonathan Lewis &lt;j=
onathan@jlcomp.demon.co.uk&gt;<BR>=E4=B8=BB=E9=A2=98: Re: What is the purpo=
se of segment level checkpoint before DROP/TRUNCATE of a table?<BR>=E6=94=
=B6=E4=BB=B6=E4=BA=BA: "free" &lt;oracle-l@freelists.org&gt;<BR>=E6=97=A5=
=E6=9C=9F: 2011=E5=B9=B47=E6=9C=888=E6=97=A5,=E5=91=A8=E4=BA=94,=E4=B8=8A=
=E5=8D=883:07<BR><BR>
<DIV class=3DplainMail><BR>Notes in line<BR><BR>Regards<BR><BR>Jonathan Lew=
is<BR><A href=3D"http://jonathanlewis.wordpress.com/" target=3D_blank>http:=
//jonathanlewis.wordpress.com</A><BR><BR><BR>----- Original Message ----- F=
rom: "Saibabu Devabhaktuni" &lt;<A href=3D"http://cn.mc158.mail.yahoo.com/m=
c/compose?to=3Dsaibabu_d@yahoo.com" ymailto=3D"mailto:saibabu_d@yahoo.com">=
saibabu_d@yahoo.com</A>&gt;<BR>To: "Jonathan Lewis" &lt;<A href=3D"http://c=
n.mc158.mail.yahoo.com/mc/compose?to=3Djonathan@jlcomp.demon.co.uk" ymailto=
=3D"mailto:jonathan@jlcomp.demon.co.uk">jonathan@jlcomp.demon.co.uk</A>&gt;=
; "free" &lt;<A href=3D"http://cn.mc158.mail.yahoo.com/mc/compose?to=3Dorac=
le-l@freelists.org" ymailto=3D"mailto:oracle-l@freelists.org">oracle-l@free=
lists.org</A>&gt;<BR>Sent: Thursday, July 07, 2011 5:07 PM<BR>Subject: Re: =
What is the purpose of segment level checkpoint before DROP/TRUNCATE of a t=
able?<BR><BR><BR>&gt; Hi Jonathan,<BR>&gt; <BR>&gt; If we have a table with=
 100,000 dirty
 buffers on primary database.<BR>&gt; <BR>&gt; 1) Let's say 50,000 dirty bu=
ffers were already written to disk by DBWR on<BR>&gt; primary.<BR>&gt; 2) A=
t time T1, truncate table command issued on primary.<BR>&gt; 3) If Oracle h=
ad this feature to not write buffers as part of truncate, then at<BR>&gt; t=
ime T2 Oracle finished marking buffers not to write.<BR>&gt; 4) At time T3,=
 truncate operation is fully completed on primary.<BR>&gt; 5) On the standb=
y, configure very small buffer cache size (i.e. can only fit<BR>&gt; 10,000=
 buffers).<BR>&gt; 6) As the redo up to time T1 applied on the standby, onl=
y 10,000 dirty buffers<BR>&gt; can stay in the cache and rest of the blocks=
 will be written to disk as soon as<BR>&gt; redo is applied.<BR>&gt; 7) Red=
o as of time T2, will not really mark buffers as not to write, as most of<B=
R>&gt; them are already on disk. This is when blocks on the standby is goin=
g to be<BR>&gt; different from the primary as of same checkpoint
 time.<BR>&gt; <BR><BR>So what. I've already pointed out that the standby p=
hysical files are always likely to be different from the primary files at a=
ny point in time. Clearly Oracle has to be able to deal with that problem b=
ecause we can always point out that Oracle is supposed survive a global pow=
ercut because of the basic redo and recovery mechanism.<BR><BR>&gt; <BR>&gt=
; Data loss scenario:<BR>&gt; 1) If there was a system or datafile level ch=
eckpoint finished on primary<BR>&gt; between time T2 and T3.<BR>&gt; 2) If =
primary instance crashes between time T2 and T3, but after the above<BR>&gt=
; checkpoint was completed.<BR>&gt; 3) Above checkpoint would have skipped =
writing buffers marked as not to write<BR>&gt; and hence on disk image is n=
ot current.<BR>&gt; 4) When instance is starting up, crash recovery starts =
as of redo from the most<BR>&gt; recent checkpoint.<BR>&gt; 5) After the co=
mpletion of crash recovery, truncate never really finished,
 but<BR>&gt; the data in the dirty blocks as of time T1 is missing.<BR>&gt;=
 <BR><BR>Points to worry about - what does it mean to say:<BR>truncate oper=
ation is fully completed - what are the events, and in what sequence<BR>why=
 do you assume that you "mark the buffers as free" (t2) before you complete=
 the truncate (t3)<BR>if you have a checkpoint that finishes between t2 and=
 t3 -<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;what does it mean to say that the =
checkpoint finishes in this context ?<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;wh=
en did the checkpoint start, and does that matter ? (before t1, between t1 =
and t2, between t2 and t3)<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;where do the =
local writes come into it in your scenario<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nb=
sp;where do the updates to the data dictionary come in your scenario<BR><BR=
><BR>How about this for a truncate sequence.<BR><BR>a)&nbsp; &nbsp; Grab an=
 exclusive pin on the table definition - this stops any more SQL
 referencing the table from being compiled<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nb=
sp;(may have to wait)<BR>b)&nbsp; &nbsp; Grab an exclusive pin on every cur=
sor referencing the table definition - this stops any SQL referencing the t=
able from being executed<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;(may have to wa=
it)<BR>c)&nbsp; &nbsp; Invalidate table and index entries in dictionary cac=
he<BR>d)&nbsp; &nbsp; Invalidate all cursors referencing the table<BR>e)&nb=
sp; &nbsp; Generate redo for correcting segment header blocks etc.<BR>f)&nb=
sp; &nbsp; Apply, commit, and generate local writes&nbsp; &nbsp; &nbsp; &nb=
sp; ***<BR>g)&nbsp; &nbsp; Generate redo for the data dictionary to show ta=
ble (and indexes) have been truncated, data_object_id changed.<BR>h)&nbsp; =
&nbsp; apply and commit<BR>i)&nbsp; &nbsp; mark as free all buffers for tab=
le and indexes and move to REPL-AUX<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;have=
 to ignore blocks on write queue that are already pinned by
 dbwr<BR>j)&nbsp; &nbsp; release cursor pins<BR>k)&nbsp; &nbsp; release obj=
ect pin<BR><BR>*** Point (f) needs further thought - Oracle must have a mec=
hanism for avoiding a race condition for blocks which are subject to local =
writes when a checkpoint is running or you get the option for something sim=
ilar to your inconsistency description:<BR>&nbsp;&nbsp;&nbsp;dirty block is=
 not written to file during checkpoint because it's supposed to be a local =
write<BR>&nbsp;&nbsp;&nbsp;local write doesn't take place (for some reason,=
 e.g. session crashes)<BR>&nbsp;&nbsp;&nbsp;checkpoint completes<BR>&nbsp;&=
nbsp;&nbsp;if a recovery is required very soon afterwards (and the local wr=
ite still hasn't happened) then<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;the bloc=
k on disc is wrong<BR>&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;the recovery process =
is going to start from the next redo log, and therefore not see the redo th=
at should bring the block up to date.<BR><BR><BR>--<BR><A
 href=3D"http://www.freelists.org/webpage/oracle-l" target=3D_blank>http://=
www.freelists.org/webpage/oracle-l</A><BR><BR><BR></DIV></BLOCKQUOTE></td><=
/tr></table>
--0-1382313819-1310092779=:71099--

--
http://www.freelists.org/webpage/oracle-l


