GCS Log Flush Sync - don't understand (11,12, Linux)
GCS Log Flush Sync - don't understand [message #645216] Mon, 30 November 2015 04:57
piotrtal
Messages: 168
Registered: June 2011
Location: Poland/Czestochowa
Senior Member

Hello Gurus,

Could you explain this to me? I don't understand why LMS requests an LGWR flush when transferring a CR block between instances.

----------
In RAC, since multiple buffer caches are fused, a mechanism similar to log file sync is employed by the LMS
process to maintain durability. LMS will request a log flush before sending the block if undo records were applied to
create a CR copy of the block or if the block has uncommitted recent transactions in the block. LMS process will wait
for LGWR to complete the log flush and post LMS process to continue.
---------

I suppose this is connected with the possibility that one instance crashes, but I can't find an explanation of that. Instance recovery (which is done by another instance) can be guaranteed without performing an LGWR flush when a CR block is transferred between instances.
Could you give me an example, please? Thanks in advance.

Piotr
Re: GCS Log Flush Sync - don't understand [message #645218 is a reply to message #645216] Mon, 30 November 2015 05:05
John Watson
Messages: 7148
Registered: January 2010
Location: Global Village
Senior Member
Quote:
Instance recovery (which is done by another instance) can be guaranteed without performing an LGWR flush when a CR block is transferred between instances.
No! Instance recovery is impossible if the log buffer has not been written. Think about it: instance recovery means reading blocks into cache and applying change vectors. Where would those vectors come from?

There is a very nice description recorded here,
http://www.skillbuilders.com/webinars/webinar.cfm?id=124&w=How-Oracle-RAC-Cache-Fusion-Works
(you might want to buy his book, too)
Re: GCS Log Flush Sync - don't understand [message #645219 is a reply to message #645218] Mon, 30 November 2015 05:19
piotrtal

John,

But I'm thinking about a specific case (when the transaction is not committed).
Consider this situation:
There are two nodes, Node1 and Node2. Node2 performs an update (no commit). Node1 requests the same block for update (an update of a different row in that block). When Node2 sends the block to Node1, it performs an LGWR flush. Why does LMS request an LGWR flush on Node2 at this step? The transaction is not committed, and even if the dirty buffers were flushed to the datafiles (still uncommitted) and Node2 then fails, the transaction will be rolled back by Node1.

... and this rollback is possible even without an LGWR flush on Node2 before the block is sent to Node1.
It is only this particular case that I don't get.

I don't know if I have made my concerns clear.

Piotr
Re: GCS Log Flush Sync - don't understand [message #645220 is a reply to message #645219] Mon, 30 November 2015 05:21
John Watson
What about recovering changes made to the undo block? Please do not reply until you have watched Brian's video. At least twice.
Re: GCS Log Flush Sync - don't understand [message #645227 is a reply to message #645216] Mon, 30 November 2015 14:24
bpeasland
Messages: 51
Registered: February 2015
Location: United States
Member

This is deep into the weeds of RAC internals, which is always good, but it goes deeper than most people tend to go. I say this because the actions you describe are part of background processing, not foreground (session) processing.

The snippet you posted about LMS and LGWR's behavior is cut and paste from Expert Oracle RAC 12c by Shamsudeen, Hussain, Yu, and Farooq on Apress. Next time you cut and paste from some copyrighted material, it would be best to cite it as such. For those interested, this section appears in Chapter 10 of that book.

Before I get to the answer, you might want to go watch the Cache Fusion video that John posted a link to. That video is more of a high-level view of what happens in Cache Fusion transfers, told from the perspective of an end-user session. There is (obviously) a lot more going on behind the scenes, as the video series does not touch on GES, LMS, etc.

Now on to answering your question, or at least I hope to answer it, with a simple example...

Let's say that UserA is connected to ORCL1 (instance 1 of db named ORCL). UserA makes a change to a row in a table's block. The change is not yet committed. Next, UserB is connected to ORCL2, a different instance for the RAC database, and is requesting the same block. So Cache Fusion kicks in and a Consistent Read image of the block is sent across the Cluster Interconnect from ORCL1 to ORCL2.

Before Oracle can send that CR image of the block from ORCL1 to ORCL2, Log Writer must ensure that the redo records are written to online redo logs belonging to thread 1. LGWR will send a signal (post) to LMS that the redo has been written and that the CR block can be sent. Note that a majority of the time in a majority of the RAC deployments, the redo has already been written by the time the CR block is requested, so this action may have already taken place.
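
The ordering described above (flush the redo first, then ship the block) can be sketched as a small toy model. This is only an illustration and not Oracle code: the `Instance` class, its methods, and the list-based "log buffer" and "online redo log" are all invented for this sketch.

```python
"""Toy model of 'flush redo before shipping a block'. Not Oracle code."""
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    log_buffer: list = field(default_factory=list)   # redo still only in memory
    online_redo: list = field(default_factory=list)  # redo on shared storage
    flushes: int = 0

    def change_block(self, block_id, change):
        """A session changes a block: redo is generated in memory only."""
        self.log_buffer.append((block_id, change))

    def lgwr_flush(self):
        """Stand-in for LGWR: write the log buffer to the online redo log."""
        self.online_redo.extend(self.log_buffer)
        self.log_buffer.clear()
        self.flushes += 1

    def lms_ship_block(self, block_id):
        """Stand-in for LMS: flush first if the block has unwritten redo."""
        waited = any(b == block_id for b, _ in self.log_buffer)
        if waited:
            self.lgwr_flush()  # in the toy model, this is the 'gcs log flush sync' wait
        return {"block": block_id, "from": self.name, "waited_for_flush": waited}


orcl1 = Instance("ORCL1")
orcl1.change_block(42, "UserA updates a row")  # uncommitted change
first = orcl1.lms_ship_block(42)               # redo not yet on disk: must flush
second = orcl1.lms_ship_block(42)              # redo already on disk: no wait
print(first["waited_for_flush"], second["waited_for_flush"])
```

The second request does not wait, which mirrors the point that the redo has usually been written already by the time the block is requested.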

So why does this need to occur? As stated in the book you are citing, this is for the Durability part of the ACID properties. Let's assume that UserA has modified the block but not yet committed. UserB gets a CR version of the block. That CR version is now in the Buffer Cache of ORCL2. Before UserA commits, and just after UserB starts using the CR version of the block, ORCL1 terminates abnormally. The change occurred in ORCL1, and ORCL2 has a CR block based on the fact that the block was changed. Yet ORCL1 crashed before the commit was complete. How does Oracle reconcile a CR block in ORCL2 when all of the redo (and undo) that created that CR block is no longer available? To make it easy, and to have the complete information for thread recovery of the failed instance, Oracle insists that the redo is written to the online redo logs (ORLs) before the CR block can be transferred to another instance. With the redo in the ORL, and with every instance having access to every other thread's ORLs, Oracle can now know exactly what was needed to create that CR image of the block.

You can see occurrences of LMS waiting for LGWR to do its work by the presence of 'gcs log flush sync' wait events. But those events are only seen from the background process LMSx. From the end user's perspective, these will be seen as 'gc buffer busy' wait events. There are other reasons for 'gc buffer busy' wait events so not all 'gc buffer busy' wait events imply 'gcs log flush sync' wait events.

I hope that helps answer your questions.



Cheers,
Brian
Re: GCS Log Flush Sync - don't understand [message #645246 is a reply to message #645227] Tue, 01 December 2015 06:13
piotrtal

Thank you, Brian. I really appreciate your valuable input.

Yes, I quoted lines from the book you mentioned. Sorry for not saying so earlier.


But I still don't get why LMS initiates an LGWR flush when transferring a CR block between instances, and why this needs to be done to guarantee consistency.

Let me try to argue my (unproven) theory that the LGWR flush is not necessary in that case. Let's imagine that there is no LGWR flush when a block is transferred between nodes.

We are at the moment when:

- the transaction in ORCL2 has not been committed
- the CR block has been transferred to ORCL1 (ORCL1 holds the block with its content from before the update, because undo was applied to the block on ORCL2 before it was transferred to ORCL1)
- ORCL2 crashed



First scenario (ORCL2's dirty blocks were not flushed and no DML is recorded in the redo):
################################################

ORCL1 performs recovery.
There is no need to recover this block in the datafile because it was not changed in the datafile. The uncommitted transaction is gone and not relevant. ORCL1 holds the block with its content from before the uncommitted update on ORCL2, so it is also consistent.


Second scenario (ORCL2's dirty blocks were flushed and the DML is recorded in the redo):
#################################################

This situation is clear: the block on disk will be recovered during instance recovery (we have all the deltas in the redo).
ORCL1 still has a good version of the block (from before the update).


Third scenario (ORCL2's dirty blocks were not flushed and the DML is recorded in the redo):
#################################################

ORCL1 will roll this block forward during instance recovery and then roll back the uncommitted transaction, so the datafiles are consistent.
ORCL1 still has a good version of the block from before the uncommitted update on ORCL2.
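
The "roll forward, then roll back" sequence in the third scenario can be sketched as a toy model. The redo record layout (`op`, `txn`, `row`, `old`, `new`) and the `recover` function are made up for illustration and are not Oracle's actual redo format or recovery code; the sketch only assumes that the redo of the failed thread is readable on shared storage.

```python
"""Toy instance recovery: roll forward from redo, then roll back uncommitted work."""


def recover(datafile, redo):
    """datafile: {row: value}; redo: records read from the failed thread's log."""
    blocks = dict(datafile)
    undo = {}          # txn -> list of (row, old_value), rebuilt from the redo
    committed = set()
    # Roll forward: reapply every change found in the redo, committed or not.
    for rec in redo:
        if rec["op"] == "update":
            undo.setdefault(rec["txn"], []).append((rec["row"], rec["old"]))
            blocks[rec["row"]] = rec["new"]
        elif rec["op"] == "commit":
            committed.add(rec["txn"])
    # Roll back: reverse the changes of transactions that never committed.
    for txn, changes in undo.items():
        if txn not in committed:
            for row, old in reversed(changes):
                blocks[row] = old
    return blocks


datafile = {"row1": "A", "row2": "B"}
redo = [
    {"op": "update", "txn": "T1", "row": "row1", "old": "A", "new": "A1"},
    {"op": "commit", "txn": "T1"},
    {"op": "update", "txn": "T2", "row": "row2", "old": "B", "new": "B1"},
]
print(recover(datafile, redo))  # T1 survives, T2 is rolled back
```

Note that the rollback step only works here because the undo information is itself carried in the redo that reached shared storage.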





So where is the place for the LGWR flush (initiated by LMS), since as far as I can see everything will (probably) work without it?

Piotr

[Updated on: Tue, 01 December 2015 06:14]

Re: GCS Log Flush Sync - don't understand [message #645252 is a reply to message #645246] Tue, 01 December 2015 10:04
bpeasland

You offered three different scenarios, all worthy of consideration, and all very good discussion points. But let me relate a piece of information that one of my old college professors taught me about database systems, and which has served me well in my career. Database management systems like to do as little work as possible. This keeps them efficient. This is why a mass delete on a table doesn't rearrange rows or move the high-water mark (HWM). This is why that mass delete doesn't rebuild the index. Those take time and are inefficient. Now sometimes, the database system doing as little work as possible leads to issues down the line. A table may need reorganization from time to time, etc.

So back to your three scenarios. Each of those takes time to figure out. Why spend that time? Wouldn't it be more efficient to just write the redo to the ORL no matter what is going on? Just write the redo and get on with satisfying the global cache transfer.

Also remember that preserving the transaction is of utmost importance! It's better to be safe than sorry when it comes to safeguarding the transaction. I'm guessing that someone at Oracle who wrote the code decided at some point that it was more prudent to dump the redo to the ORL than to risk losing the redo.

Another thing to consider is that an instance is holding a CR version of a block. It has no idea how this CR version was created without that redo and its associated undo. Now, Oracle could simply invalidate that CR block should the other instance terminate mid-transaction. But then it would have to go through and figure out which CR blocks in the Buffer Cache are in need of instance recovery and thus need to be invalidated. Or would it be much simpler if it had all of the information it needed on shared storage, just in case?




HTH,
Brian
Re: GCS Log Flush Sync - don't understand [message #645332 is a reply to message #645252] Thu, 03 December 2015 07:15
piotrtal

It's still confusing for me, but...
Thank you Brian for your help.

[Updated on: Thu, 03 December 2015 07:22]
