Re: High response time with top 'Other' and 'application' wait class

From: Tanel Poder <tanel_at_tanelpoder.com>
Date: Tue, 29 Nov 2022 00:17:57 -0500
Message-ID: <CAMHX9JJ0Es+1puSKuteF-kS-=MVKfJsqVp_ENM0bLead6fQLLA_at_mail.gmail.com>



If this query is executed so frequently, why not have an index for it too, so there are no frequent full table scans at all? (The E-Rows column in the execution plan shows that you're "looking for a needle in a haystack", just 1-2 rows per visit, so an index would help.)

How big is that table? Since the "serial direct path read" decision is dynamic, depending on the scanned table/partition size and your buffer cache size, it is possible that previously the table segment was small enough that all full table scanning was done via the buffer cache. Once it grew beyond a threshold, Oracle automatically switched to direct path reads and is now overloading the CKPT process (and DBWR/IO) with all the (frequent and possibly concurrent) segment checkpoint requests...
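
A quick way to sanity-check this theory is to compare the segment size against the small-table threshold. A rough sketch (table name CB assumed from your post; the hidden parameter query needs SYS access and the exact direct-read cutoff varies by version):

  -- segment size of the scanned table
  SELECT segment_name, blocks, ROUND(bytes/1024/1024, 1) AS mb
  FROM   dba_segments
  WHERE  segment_name = 'CB';

  -- _small_table_threshold in blocks (run as SYS); serial direct path
  -- reads kick in once the segment grows well beyond this value
  SELECT n.ksppinm AS parameter, v.ksppstvl AS value
  FROM   x$ksppi n, x$ksppcv v
  WHERE  n.indx = v.indx
  AND    n.ksppinm = '_small_table_threshold';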

Options:

  1. Use an index to avoid the full table scan - no FTS means no segment-level checkpoints & enq: KO waits
  2. Use "_serial_direct_read"=never for these sessions to avoid direct path reads for this query
  3. Use other tricks like marking the table CACHE (and/or setting up the KEEP pool correctly) or the In-Memory option (options 1-3 are sketched below)
  4. Even if you're hitting some bug that's causing CKPT to become less efficient when doing hundreds of full table scans (and segment checkpoints) per second, this "FTS in a loop for finding 1-2 rows" approach isn't optimal.
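
For illustration, minimal sketches of options 1-3 (the index name and column are just placeholders - adjust them to whatever your query's predicate actually filters on):

  -- option 1: index the filter column(s) so the query no longer needs a FTS
  CREATE INDEX cb_lookup_idx ON cb (lookup_col);

  -- option 2: switch off serial direct path reads for the affected sessions
  ALTER SESSION SET "_serial_direct_read" = never;

  -- option 3: mark the small table as a CACHE table so its blocks stay warmer in the buffer cache
  ALTER TABLE cb CACHE;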

The sudden slowdown without a plan change is more likely explained by Oracle switching FTS from cached mode to direct mode at some point. You can verify this with ASH or asqlmon: check which wait events you saw earlier when things were fast (cell multiblock physical read without enq: KO & reliable message waits?)
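
A rough ASH sketch for that check (the SQL_ID is a placeholder and the time range is just an example - widen it so it covers both the fast and the slow periods):

  SELECT TRUNC(sample_time, 'MI') AS minute,
         NVL(event, 'ON CPU')     AS event,
         COUNT(*)                 AS samples
  FROM   gv$active_session_history
  WHERE  sql_id = '&sql_id'
  AND    sample_time > SYSDATE - 1/24
  GROUP  BY TRUNC(sample_time, 'MI'), NVL(event, 'ON CPU')
  ORDER  BY 1, 3 DESC;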

--
Tanel Poder
https://learn.tanelpoder.com

On Mon, Nov 28, 2022 at 9:55 PM Lok P <loknath.73_at_gmail.com> wrote:


> Hello ,
> It's version 19c of the Oracle database. We got complaints from the
> application team about sudden slowness, and thus app traffic was
> automatically redirected to another active DB replica. While looking into
> the DB side, we do observe high wait events from the wait classes 'Other'
> and 'Application' in OEM. It lasted for exactly the 2-3 minutes the
> application team is complaining about. Then, fetching the "ash wait chain"
> and "ash top", they point to significantly high wait events like "reliable
> message" (~75%), "enq: KO - fast object checkpoint" and "gcs drm freeze in
> enter server mode" from the same wait classes 'Other' and 'Application'.
>
> Below are the ASH top and ASH wait chain from the issue duration (a
> ~5 minute interval), along with the SELECT query that comes out on top.
> Table CB holds just ~41 rows.
> https://gist.github.com/oraclelearner/44394ab8206fc7bd51041eb3d45bdf9f
>
> Also, the top query showing up in "ash top" is a SELECT query and it does
> have a FTS in it (which suggests why the checkpoint waits occur); however,
> this SELECT query shows no observed plan change and no unusually high
> execution count. We are also not seeing any specific blocking session for
> this. The "wait chain" shows the LMON/lock monitor process in one of its
> lines, pointing to an event like "Wait for msg sends to complete". Not yet
> able to figure out how these are related to this issue. Still trying to
> understand what could have caused such a scenario.
>
> This SELECT query executes 100's of thousands of times in a ~15 minute
> window. So is it possible that this query (or rather its underlying table
> CB) might have been thrashed out of the buffer cache, so data was fetched
> from disk rather than from cache for that time, and this caused all these
> sorts of issues? And considering this table is very small (~1MB in size,
> holding just ~41 rows), should we consider putting it in-memory by changing
> "inmemory_clause_default" to 'BASE_LEVEL' and "INMEMORY_SIZE" to 16GB?
>
>
-- http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 29 2022 - 06:17:57 CET
