CPU hogged with endless "FLUSH SLAVE FAILED, AWR ENQUEUE TIMEOUT" messages in alert log

From: Thomas Kellerer <thomas.kellerer_at_mgm-tp.com>
Date: Mon, 26 Nov 2012 08:56:47 +0100
Message-ID: <50B320BF.7080700_at_mgm-tp.com>


An Oracle 2-node RAC system (Linux) went haywire last week. The CPUs of both nodes went to 100%, and the message

  KEWRAFC: FLUSH SLAVE FAILED, AWR ENQUEUE TIMEOUT

was written to the alert log about once a minute.
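To confirm the roughly one-minute cadence from the alert log itself, a small script can pair each occurrence of the message with the timestamp line that precedes it. This is a minimal sketch. It assumes the plain-text alert log layout in which a line such as `Mon Nov 19 10:15:32 2012` precedes the messages it stamps. The function names `message_times` and `gaps_in_seconds` are hypothetical, and the alert log path is passed on the command line.

```python
import re
import sys
from datetime import datetime

# Assumed layout: a bare timestamp line ("Mon Nov 19 10:15:32 2012")
# precedes the alert log messages it applies to.
TIMESTAMP_RE = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
MESSAGE = "KEWRAFC: FLUSH SLAVE FAILED, AWR ENQUEUE TIMEOUT"


def message_times(lines):
    """Return the timestamp in effect for each occurrence of MESSAGE."""
    current = None
    times = []
    for line in lines:
        line = line.strip()
        if TIMESTAMP_RE.match(line):
            # strptime treats the format's spaces as "one or more", so
            # space-padded single-digit days also parse.
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
        elif MESSAGE in line and current is not None:
            times.append(current)
    return times


def gaps_in_seconds(times):
    """Seconds elapsed between consecutive occurrences."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        gaps = gaps_in_seconds(message_times(f))
    print(f"{len(gaps) + 1 if gaps else 0} occurrences, gaps (s): {gaps[:20]}")
```

Pairing each message with the most recent timestamp line, rather than searching for a timestamp on the same line, follows the assumed layout, where messages themselves carry no time.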

On Metalink we found two documents describing this problem (Bug 6851176 and Document ID 555124.1), but none of the suggested actions made a difference. Apart from the CPU load, the system is idle: I/O is normal and no swapping is going on.

One major change we made on this system was to increase the SGA from 2GB to 12GB (other Oracle instances had recently been moved from this system to a different one, which freed up the memory). Both nodes have 16GB of physical memory.

The client has opened a ticket with Oracle Support, but given the version number we don't expect a reaction soon.

We also tried deleting the saved AWR snapshots and increasing the snapshot interval from 1 hour to 4 hours to reduce the amount of information written to the AWR tables. Neither made a difference.

Can anybody think of other possible workarounds or even a solution to this?

Thanks in advance.

Thomas Kellerer

