Re: Data Block Cleanup?? plus Row Locking and stuff (long)

From: Graeme Sargent <graeme_at_pyra.co.uk>
Date: Wed, 13 Oct 1993 18:41:47 GMT
Message-ID: <1993Oct13.184147.10642_at_pyra.co.uk>


In <CEL3u3.J5x_at_world.std.com> edwards_at_world.std.com (Jonathan Edwards) writes:

>Let me get this straight.

I don't think you have, but I'm not sure if I have either, so I'm going to punt what I think happens and see what reaction I get.

>I create records in a table,

	  ^^^^^^^
	  rows in a table, please.


> one transaction per record.

I don't understand this concept. If you mean one row per transaction then I would not expect to see block cleanup occurring.

>The older records get purged to disk eventually, long after they were commited.

Not necessarily in the general case, but presumably true in a one row per transaction scenario (although the out-to-lunch problem could provide the odd exception of course).

>Later, I do a read-only access to these old records.

Yesss...

>These blocks get written out to disk again to update internal bookeeping
>info.

As I understand it, only those that were flushed to disk before they were committed.

>Does this still happen if I declare sufficient "transaction entries" in each
>block?

Yes. Although it is (presumably) the transaction table that needs "cleaning up", the size of the table is not relevant to that requirement. I believe that the row headers also require "cleaning up". I have always found it difficult to get Oracle to be categoric and specific in this area. In my experience, even the Architecture & Internals instructors get confused when pressed. I've never found the time (or reason) to try to take it further or to build a test harness.

> It appears that I need to allocate one entry per record in order to
>get full row-level concurrency in Oracle anyway.

Yes and no and not exactly.

Are you talking about INITRANS or MAXTRANS?

I don't think I understand this perfectly, so I look forward to corrections:

INITRANS of 1 *can* give you full row-level concurrency. This parameter defines the initial number of transaction entries that are created when a new data block is allocated. The default value (for a table) is 1 and there is a new "suggestion" in the O7 doc that you don't change it.

However, the only way to *absolutely* guarantee that every row in a block could *always* be concurrently updated by different users would be to set this to the maximum number of rows that a block might hold (if you could predict how many that might be, which is pretty difficult as the value of INITRANS affects how many might be able to fit).
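If my description is right, there is a circular sizing problem here: the rows fill whatever space the ITL entries don't take, and the "safe" INITRANS depends on how many rows fit. A toy Python sketch of finding the smallest INITRANS that covers every row the block could then hold (all byte counts below are invented for illustration, not Oracle's real figures):

```python
# Hypothetical illustration of the circularity described above: the number
# of rows a block can hold depends on INITRANS, yet the "safe" INITRANS
# depends on the number of rows. Sizes are assumptions, not Oracle's.

BLOCK_SPACE = 1800   # assumed usable bytes per data block after fixed overhead
ITL_ENTRY = 24       # assumed bytes per transaction entry (ITL slot)
ROW_SIZE = 60        # assumed average row size, including row header

def max_rows(initrans):
    """Rows that fit once `initrans` transaction entries are pre-allocated."""
    return (BLOCK_SPACE - initrans * ITL_ENTRY) // ROW_SIZE

# Find a fixed point: an INITRANS large enough for every row the block
# could then hold. Each extra ITL entry shrinks the space left for rows.
initrans = 1
while max_rows(initrans) > initrans:
    initrans += 1

print(initrans, max_rows(initrans))   # -> 21 21 (with these made-up sizes)
```

With these made-up numbers the fixed point is 21: pre-allocating 21 entries leaves room for exactly 21 rows. The point is only that the answer moves as INITRANS moves, which is why predicting it in advance is so awkward.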

MAXTRANS defines the maximum size to which the Interested Transaction List (ITL) can grow, space permitting. In O7 there is an unspecified blocksize-dependent default (rather than the V6 default of 255) and a new "recommendation" that you don't change it.

I do not know whether the ITL, having grown larger than INITRANS, ever shrinks again, or whether such compaction might be part of the data block cleanup process. It has been implied/stated to me in the past that it does not shrink back. If that is the case, though, I do not understand the concern that was expressed at V6 regarding the effect of INITRANS on the overhead of dynamically growing the ITL: growing it would then be a one-off event per block, and thus much rarer, unless it also dynamically shrinks back again.

On the other hand, if the ITL does shrink back, then I do not understand why the INITRANS concern has apparently gone away in O7.

The other issues in this domain which I do not understand can be illustrated as follows:

Let us postulate an SGA with 8000 DB_BLOCK_BUFFERS, DB_BLOCK_WRITE_BATCH and DB_FILE_MULTIBLOCK_READ_COUNT values of 32 and a table foo with 256,000 rows evenly distributed across 16,000 data blocks.
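For anyone checking along at home, the arithmetic behind this scenario is (trivially, in Python):

```python
# Back-of-envelope arithmetic for the postulated scenario; all figures
# are taken straight from the text above.

rows = 256_000
blocks = 16_000
read_batch = 32    # DB_FILE_MULTIBLOCK_READ_COUNT (and DB_BLOCK_WRITE_BATCH)

rows_per_block = rows // blocks      # evenly distributed: 16 rows per block
io_cycles = blocks // read_batch     # 500 full 32-block read cycles in all

print(rows_per_block, io_cycles)     # -> 16 500
```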

Let us further postulate a user1 process that issues:

    BEGIN;
    SELECT * FROM foo WHERE keycol IS NOT NULL FOR UPDATE OF banana;

    NB: although it is not likely that a real application would use a
    where_condition which selects all rows, as this one should, it is
    quite feasible to select many thousands, be it deliberate (bad
    practice - use a table lock) or accidental.

At this point Oracle needs to "immediately" place locks on all 256,000 rows, therefore it will start reading 32 blocks at a time into 32 data buffers as specified by DB_FILE_MULTIBLOCK_READ_COUNT.

It is not 100% clear to me where these buffers are. I was once told that they were in the PGA, but it must surely use 32 of the 8,000 db_block_buffers so as to be recorded in the hash table and thereby prevent the possibility of duplicate buffers in different areas of memory. On the other hand there is surely a performance imperative to acquire 32 *contiguous* buffers for the read, and I know of no mechanism to allow that for SGA buffers.

What I suspect must *really* happen is that it will set up 32 contiguous buffers in the PGA as a secondary cache, and then scan the table into the SGA one block at a time using the normal buffer acquisition algorithm(*), initiating 32 block reads into the PGA whenever it suffers a double cache miss.

    (*) An analysis/exposition of the buffer acquisition algorithm might
        well double the length of this paper, and must, I fear, be left
        for a later publication.
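The two-level read path I *suspect* above can be modelled crudely. This is pure conjecture on my part - a private 32-slot secondary cache in the PGA, refilled by one multiblock read whenever a block is found in neither cache - sketched here only to show the i/o arithmetic:

```python
# A sketch of the *speculated* two-level read path: a 32-slot secondary
# cache in the PGA, refilled with one 32-block read on a double cache
# miss. This is conjecture, not documented Oracle behaviour.

READ_BATCH = 32   # DB_FILE_MULTIBLOCK_READ_COUNT

def scan_table(n_blocks, sga_cache):
    """Return the number of multiblock reads a full scan would issue."""
    pga = set()               # block numbers sitting in the PGA secondary cache
    multiblock_reads = 0
    for blk in range(n_blocks):
        if blk not in sga_cache and blk not in pga:
            # double cache miss: initiate a 32-block read into the PGA
            pga = set(range(blk, min(blk + READ_BATCH, n_blocks)))
            multiblock_reads += 1
    return multiblock_reads

print(scan_table(16_000, sga_cache=set()))           # cold caches: 500 reads
print(scan_table(16_000, sga_cache=set(range(32))))  # first 32 warm: 499 reads
```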

Having read a block into the buffer pool it needs to lock all 16 rows, which it presumably does by creating an ITL entry and writing the slot number (1) to the second byte of each row header.

Why do I say it must use slot number 1? Because for this particular query (all rows from table) no other slot will do:

    Let us suppose there is already an entry in slot 1 of the ITL.
    User1 must first establish whether it belongs to an active
    transaction or not. I do not know exactly how it does this, but I
    assume there must be a transaction identifier in the ITL entry, and
    a list of active transactions somewhere in the SGA against which it
    can be compared.

    If this logic is correct, then there is an interesting side effect
    for the Parallel Server Option here. Whenever a block is
    transferred from node to node around the cluster ("pinged"), the
    active transaction information must travel with it, and thus the
    Distributed Lock Manager must support that capability. This would
    appear to be true for disk to SGA migration, as well as SGA to SGA
    migration.

    Anyway, if a match is found, then somewhere in the block there will
    be a locked row and user1's transaction is doomed to failure on
    encountering the conflict (assuming that there are no NULLs in
    keycol and that user1 therefore needs to lock every row). If not,
    then user1 can assume that the transaction, and therefore the lock,
    is no longer active. It is safe to usurp slot 1 in the ITL and scan
    all the row_headers setting the ITL slot to 0.

    It is not clear to me whether Oracle have chosen to merge the
    "release_lock" scan with the "set_lock" scan or not. There is a
    basic choice in latching strategy here. They could opt to latch the
    buffer for the duration of the block activity, which I suspect could
    potentially become an undesirably long latch duration; or they could
    elect to latch at the level of manipulating an individual row (or
    other sub-block-level object). At this stage, I do not know what
    latching strategy Oracle uses.
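The slot-claiming logic I have conjectured can be put in a toy Python model. The structures here (a set of active transaction ids, the ITL as a list of transaction ids, one lock byte per row holding a 1-based slot number) are this sketch's assumptions, not Oracle's actual layout:

```python
# Toy model of the conjectured slot-1 logic: check whether slot 1 belongs
# to a live transaction; if dead, usurp it; then stamp our slot number
# into every row header. Data layout is assumed, not Oracle's.

ROWS_PER_BLOCK = 16

class Block:
    def __init__(self):
        self.itl = []                          # slot i-1 holds a transaction id
        self.row_locks = [0] * ROWS_PER_BLOCK  # 0 = unlocked, else ITL slot no.

def lock_all_rows(block, txid, active_txns):
    """Try to lock every row in the block for transaction `txid`.

    Returns False if slot 1 belongs to another live transaction
    (conflict), True once every row header carries our slot number."""
    if block.itl and block.itl[0] in active_txns and block.itl[0] != txid:
        # Slot 1 belongs to a live transaction, so some row in the block
        # is locked, and a lock-all-rows request is doomed to conflict.
        return False
    if block.itl:
        # Dead transaction: usurp slot 1. The "release_lock" scan is
        # kept separate from the "set_lock" scan here, since the text
        # leaves open whether Oracle merges the two.
        block.row_locks = [0] * ROWS_PER_BLOCK
        block.itl[0] = txid
    else:
        block.itl.append(txid)
    block.row_locks = [1] * ROWS_PER_BLOCK     # write slot number 1 everywhere
    return True

blk = Block()
assert lock_all_rows(blk, txid=100, active_txns={100})       # empty ITL: fine
stale = Block(); stale.itl = [55]
assert lock_all_rows(stale, txid=100, active_txns={100})     # usurp dead slot 1
busy = Block(); busy.itl = [55]
assert not lock_all_rows(busy, txid=100, active_txns={55, 100})  # live: conflict
```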

I believe it is a common misconception that whilst doing this kind of full table scan Oracle would read into all/many of the 8,000 db_block_buffers available to it. This is, of course, not true, as that would flood the buffer cache, telescope the LRU queues and knock the buffer cache hit ratio right out of the ballpark.

To avoid this I believe that instead of adding the buffers to the MRU end of the LRU list, they are added to the LRU end instead, which results in a tendency for the scan of foo to reuse the same buffers, thereby limiting its impact on the buffer cache efficiency. In addition, with DB_BLOCK_WRITE_BATCH set to 32, DBWR will have been signalled by the time the secondary cache is half exhausted, and the locked rows will be shuffled straight back out to disk again.
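The buffer-reuse tendency can be demonstrated with a crude simulation. The list discipline below is an assumption for illustration (a real implementation would presumably recycle a small window of buffers rather than literally one), but it shows why LRU-end insertion protects the cache:

```python
# Crude model of the claim above: buffers filled by a full table scan are
# linked at the LRU end of the replacement list instead of the MRU end,
# so the scan keeps recycling the same buffer(s) and leaves the rest of
# the cache alone. The exact discipline is assumed, not Oracle's.

from collections import deque

def distinct_buffers_used(n_scan_blocks, cache_size, insert_at_lru_end):
    # Cache starts full of other users' hot buffers, numbered 0..cache_size-1.
    lru = deque(range(cache_size))   # left = LRU end (next victim)
    used = set()                     # distinct buffers the scan ever occupied
    for _ in range(n_scan_blocks):
        buf = lru.popleft()          # steal the least recently used buffer
        used.add(buf)
        if insert_at_lru_end:
            lru.appendleft(buf)      # scan blocks: straight back to the LRU end
        else:
            lru.append(buf)          # normal blocks: promoted to the MRU end
    return len(used)

print(distinct_buffers_used(16_000, 8_000, True))    # -> 1 (cache untouched)
print(distinct_buffers_used(16_000, 8_000, False))   # -> 8000 (cache flooded)
```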

As you can see this results in the locked (but as yet unmodified) blocks being written out to disk long before COMMIT time. I believe it is another common misconception that this can't/doesn't happen.

Let us suppose that this continues for another 498 32-block cycles (loosely speaking, that is - buffer cache hits can be expected to make this non-exact).

Let us *now* postulate, that sometime early on during this process there was a user2 process which issued:

    BEGIN;
    UPDATE foo SET keycol = NULL WHERE keycol = '<last_in_table>';

    NB: Naughty, naughty! But let's assume that someone forgot to
    specify NOT NULL on keycol.

and then went out-to-lunch!

Unless I'm very much mistaken, that's three (at least) database architect nightmares in a row!

Presumably, by the time user1 gets round to his last i/o cycle, user2 is already sitting down to dine, having left his muddy pawprint in slot 1 of the ITL in the 16,000th block, a before-image in a rollback segment block buffer in the SGA and a "1" in the second byte of the sixteenth row.

Block 16000 will (I believe) still be in the SGA, as DBWR has been far too busy with user1 to have got round to it yet (after all, not many seconds should have passed yet, ahem :-) ). So user1 will continue merrily through the last few blocks until, on finding Block 16000 in the cache it discovers for the first time (by coincidence) that slot 1 of the ITL is occupied.

And so, for the first time (by definition), it occupies slot 2 (expanding the ITL if necessary) and writes a "2" to the second byte of the first 15 rows in Block 16000. It looks at the sixteenth and final row, and ...

My questions are (at last):

    Am I right so far?

and

    What happens next?

plus some subsidiary questions for the *real* experts:

    What would happen if Block 16000 was full and there wasn't room to
    create a second ITL entry?

    What happens if keycol in row 256000 is already NULL?

graeme

--
Disclaimer:	The author's opinions are his own, and not necessarily
		those of Pyramid Technology Ltd. or Pyramid Technology Inc.
---------------------------------------------------------------------------
      -m------- Graeme Sargent                 Voice: +44 (0)252 373035
    ---mmm----- Senior Database Consultant     Fax  : +44 (0)252 373135
  -----mmmmm--- Pyramid Technology Ltd.        Telex: Tell who???
-------mmmmmmm- Farnborough, Hants  GU14 7PL   Email: graeme_at_pyra.co.uk
---------------------------------------------------------------------------
    We have the technology.  The tricky bit is learning how to use it.
