Re: What is lock type 'KO' in v$lock table?

From: Mladen Gogala <mgogala.spam-me-not_at_verizon.net>
Date: Sun, 24 Sep 2006 05:38:59 GMT
Message-Id: <pan.2006.09.24.05.38.42.467739@verizon.net>

On Sat, 23 Sep 2006 19:45:18 -0700, yong321 wrote:

> Jonathan Lewis briefly talked about segment checkpoint on p.30 of his
> "Cost Based Oracle". I was also looking for more information about it.
> The white paper you cited tells us that before 10gR2, it's actually
> implemented as a tablespace checkpoint. I was wondering how the
> checkpoint queue could have a more granular "sub-queue" or something.
>
> Object or segment checkpoint sounds like a good idea not just for direct
> path read, but I think also for truncate or drop table / drop index
> (they shouldn't happen on production databases though). If the feature
> is turned off, every direct path read probably incurs a whole tablespace
> checkpoint.
>
> Yong Huang

Why would that be? Other then that, I lived without object checkpoints from the version 4 to version 10.1 and things were fine. I don't see why would turning those new checkpoints off be such a catastrophe. Checkpoint occurs when DB buffers are written to the data files. Let's see what the documentation says:

Administrators Guide:
Checkpoint (CKPT) At specific times, all modified database buffers in the system global area are written to the datafiles by DBWn. This event is called a checkpoint. The checkpoint process is responsible for signalling DBWn at checkpoints and updating all the datafiles and control files of the database to indicate the most recent checkpoint.

and:

Concepts:
The DBWn process writes dirty buffers to disk under the following conditions:

      When a server process cannot find a clean reusable buffer after
      scanning a threshold number of buffers, it signals DBWn to write.
      DBWn writes dirty buffers to disk asynchronously while performing
      other processing.

      DBWn periodically writes buffers to advance the checkpoint, which is
      the position in the redo thread (log) from which instance recovery
      begins. This log position is determined by the oldest dirty buffer
      in the buffer cache.

In all cases, DBWn performs batched (multiblock) writes to improve efficiency. The number of blocks written in a multiblock write varies by operating system.

Also, from the Note:265831.1 one would conclude that checkpoints are global events which occur:

During the log switch
At predefined intervals (archive_lag_target)
When explicitly invoked by "alter system" command.

During the checkpoint, DBWR is instructed to write DB blocks back to the database files and, during that time, CKPT process updates the data file headers and control file.

Why would a direct read trigger a checkpoint? My understanding is that checkpoint is a global event that is triggered when certain critical conditions are met to provide the system with the new baseline and to free some DB buffers. So called "object checkpoint" is an antithesis of a global event because it's localized to a segment. So, instead of a single, globally controlled event, we can now have an army of checkpoints, occurring for every object being modified. Objects that qualify for triggering a checkpoint would be tables, materialized views, partitions, indexes and clusters. There are several questions here: 1) when checkpoints for multiple objects gang up, which process writes

the blocks down? Can the writers keep up with the new "local checkpoints"? What about piggybacking? Is there "checkpoint escalation" mechanism which would re-request a global checkpoint, if the local one is big? How about RAC? Are object checkpoints global in nature? How do different instances react? Global lock? Has anybody thought of the artificial deadlocks that can occur in RAC configurations if multiple CKPT processes start waiting for each other to finish, so that they can enqueue their own local checkpoints?
2) Apparently, there is a locking problem. CKPT locks the segment in the

"KO" mode and processes that are trying to update the database are unable to proceed. What happens if I have a large SGA that takes a long time to write? Will my transactions be blocked every now and then and will users be made to wait for the "local checkpoint" to finish? That is the behavior that prompted me to start looking into the problem. As a company which experiences sudden bursts of update activity, I am worried that this will prevent other users from updating the segment for a long time. What locks is "KO" mode compatible with? Does it block readers? 3) What triggers the "object checkpoint"? With all due respect, I don't

believe that checkpoints are triggered by a direct read. Checkpoints used to be necessary, high intensity events which needed to be performed as infrequently as possible to preserve performance and as frequently as necessary to ensure timely recovery. Checkpoint as term was introduced with Oracle6. Until Oracle10.2 they were also well documented. Unfortunately, some @#$% #### at Oracle Corp. did an artificial benchmark, found the new mechanism to be faster and made a fundamental change, without telling anybody.

Now, checkpoints were an essential ingredient of the database, something that every DBA had to think about every now and then. Tinkering with that mechanism can produce a disaster and should not be done without first documenting it and then providing an off switch which would enable users to return to the previous behavior if this one was not well understood or creates any problems. Yet, none of those two very common sense actions were taken by Oracle Corp. It was just skipped and shoved down our throats in a very arrogant manner that makes me very, very angry. If PostgresSQL ever becomes a serious alternative, I will definitely have another reason for endorsing them.
PS:

---
I really hope that somebody from the Oracle Corp. is reading this group.
This is the support that they charge dearly for:

13-SEP-06 17:51:02 GMT

New info : mgogala_at_yahoo.com : I knew that I have to rebuild the database. My
question was more to
find out what can I do to prevent this from happening the next time.


14-SEP-06 05:21:22 GMT

.
UPDATE
=======

o Haven't seen any case/bug report like this issue
o It is not able to determine what is the cause of this issue, but it looks that someone co
py the 3_6304.dbf on top of 3_6305.dbf

o As we cannot determine what is the cause of the problem, cannot recommend anything to prevent the issue.


SR hard closed. I lost a standby database because the archiver suddenly
produced two equal archives with different sequences. I asked for help
and was told something like: sorry, we have no clue. Please, let us know
if this happens again. Not only "it" was not able to give me any useful
information, "it" was dragging things on for so long that my database was
already rebuilt when I requested escalation. Of course, nothing was done.
Some similar kind of cousin it is probably also responsible for such a
brilliant way of informing us about the new checkpoint mechanism. 

-- 
http://www.mladen-gogala.com

Received on Sun Sep 24 2006 - 00:38:59 CDT