RE: Select statement runs slow until flush of shared pool

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Tue, 27 Nov 2012 16:31:43 -0500
Message-ID: <020501cdcce6$99643cc0$cc2cb640$_at_rsiz.com>



Do you perchance have whole database server side result caching turned on?

Last I checked it was still failing to purge obsoleted results. That does not produce a new hash, but it can produce and increasingly lengthy list of result sets to paw through seeking a valid one and hilarity may result.

As neat as sliced bread for really quiescent databases, this feature is danger with a hair trigger if you're frequently running something that is repeatedly invalidated as regards previous results. Now this matches you symptoms as far as I can see, but it could be wildly off target. Just thought I'd mention it.

mwf

-----Original Message-----

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Jorgensen, Finn
Sent: Tuesday, November 27, 2012 2:48 PM To: Tanel Poder
Cc: oracle Freelists (Oracle-L_at_freelists.org) Subject: RE: Select statement runs slow until flush of shared pool

As far as I can tell there is only one plan hash value: SQL> select plan_hash_value, count(*) from DBA_HIST_SQL_PLAN where SQL> sql_id = 'dy6p54n6ymuhx' group by plan_hash_value;

PLAN_HASH_VALUE COUNT(*)
--------------- ----------

     1543055307 25

I've had a couple of people suggest maybe the statement was running with a bad plan and thus it was slow. However, the slow running statements are fixed more or less immediately by a flush of the shared pool. What I'm trying to explain is that "currently running" SQL is fixed by the flush. The execution plan doesn't change once the statement is running. That's why I went down the path of latches etc.

I've also sent my SQLT XTRACT output to Carlos who kindly offered to review it. I will keep the list updated on that.

Any other ideas are welcome.

Thanks,
Finn

From: tanel_at_poderc.com [mailto:tanel_at_poderc.com] On Behalf Of Tanel Poder Sent: Tuesday, November 27, 2012 1:37 PM To: Jorgensen, Finn
Cc: oracle Freelists (Oracle-L_at_freelists.org) Subject: Re: Select statement runs slow until flush of shared pool

Hi Finn,

I'd start from higher level details:

Look through the ASH history for this SQL ID to see whether this SQL has different plan hash values for it - and whether a different plan hash value shows up during the problem time ... this could "just" be an execution plan flip, where an execution plan goes bad every now and then (many different reasons, including invalidation+bind peeking or adaptive cursor sharing creating a new bad plan etc).

If your SQL gets very slow, burns lots of CPU and seems to be holding cache buffers chains latches, it's doing logical IOs ... this is often a symptom of join order or some join method changing so that a NESTED LOOP join or FILTER loop lookup revisits the inner row source of the join way too many times. Looking into the execution plan (if a new plan hash value shows up during the problem time) would help.

And then it's matter of finding out why this new cursor with a new plan was created... the V$SQL_SHARED_CURSOR (_at_nonshared.sql) would be where I'd look into first (when the issue is ongoing).

--

Tanel Poder
Enkitec
Blog - http://blog.tanelpoder.com
App - http://voic.ee

--

Tanel Poder
Enkitec Europe
+372 56 956 181
http://www.enkitec.com/
Expert Oracle Exadata book:
http://www.apress.com/9781430233923

On Tue, Nov 27, 2012 at 7:34 PM, Jorgensen, Finn <Finn.Jorgensen_at_constellation.com<mailto:Finn.Jorgensen_at_constellation.com>> wrote:
List:
10.2.0.3 EE on Solaris. No RAC.

I have a client who run a report which consist of just a select statement. Nothing really special except there's a couple of stored function calls in the select.
From time to time while running this report it gets hung up and takes hours to complete. When that happens somebody on the DBA team realized that flushing the shared pool fixes the problem and the report end almost immediately after the flushing.

I was oncall this morning when they needed the shared pool flush. I took a little time to see if I could quickly determine what was locking up the SQL statement.
In OEM there was a sea of green meaning it was all registered as CPU time. I could see 9 sessions running the select statement at the same time. I checked the dynamic SGA and got this:

COMPONENT                                                        CURRENT_MB
MIN_MB     MAX_MB

---------------------------------------------------------------- ----------
---------- ----------
DEFAULT buffer cache 848 768 0 java pool 48 48 0 large pool 48 16 0 shared pool 1024 848 0 streams pool 16 0 0

I checked how much free memory there was in the shared pool and got this:   1 select * from v$sgastat
  2 where pool = 'shared pool'
  3* and name = 'free memory'
SQL> /

POOL         NAME                            BYTES

------------ -------------------------- ----------
shared pool free memory 728943624 ---------- sum 728943624

I used Tanel Poders sgastatx script to check for subpools and got this (I only have 1 subpool):
SQL> _at_sgastatx total

  • All allocations:
SUBPOOL                             BYTES         MB

------------------------------ ---------- ----------
shared pool (0 - Unused): 33554432 32 shared pool (1): 1177308608 1122.77 shared pool (Total): 1210863040 1154.77 ---------- sum 2421726080

I then ran latchprofx and got this:
SQL > _at_latchprofx name,sid,sqlid,sqlchild % % 100000

NAME                                       SID SQLID           SQLCHILD
Held       Gets  Held %     Held ms Avg hold ms

----------------------------------- ---------- ------------- ----------
---------- ---------- ------- ----------- -----------
cache buffers chains 220 dy6p54n6ymuhx 3 1938 1936 1.94 159.497 .082 cache buffers chains 102 dy6p54n6ymuhx 3 1650 1648 1.65 135.795 .082 cache buffers chains 104 dy6p54n6ymuhx 3 1605 1603 1.61 132.092 .082 cache buffers chains 356 dy6p54n6ymuhx 3 1401 1392 1.40 115.302 .083 simulator lru latch 220 dy6p54n6ymuhx 3 401 397 .40 33.002 .083 simulator lru latch 102 dy6p54n6ymuhx 3 375 375 .38 30.863 .082 simulator lru latch 104 dy6p54n6ymuhx 3 353 353 .35 29.052 .082 simulator lru latch 356 dy6p54n6ymuhx 3 336 332 .34 27.653 .083 cache buffers lru chain 220 dy6p54n6ymuhx 3 80 80 .08 6.584 .082 cache buffers chains 251 184w65v7z6cpr 1 61 61 .06 5.020 .082 cache buffers lru chain 102 dy6p54n6ymuhx 3 20 20 .02 1.646 .082 archive process latch 386 0 20 3 .02 1.646 .549 cache buffers lru chain 356 dy6p54n6ymuhx 3 17 16 .02 1.399 .087 multiblock read objects 220 dy6p54n6ymuhx 3 15 15 .02 1.235 .082 SQL memory manager latch 386 0 12 3 .01 .988 .329 cache buffers lru chain 104 dy6p54n6ymuhx 3 9 8 .01 .741 .093 object queue header operation 220 dy6p54n6ymuhx 3 6 6 .01 .494 .082 cache buffers chains 247 8dz24ywwkrgn3 0 6 6 .01 .494 .082 redo copy 212 6szcgz387bc84 3 6 6 .01 .494 .082 cache buffers chains 367 6jn4skq8jmbwj 3 6 6 .01 .494 .082

The SQL id dy6p54n6ymuhx is the SQL statement that hangs until the shared pool is flushed.

At this point I'm not sure what to look for. There are no locks and no wait event reported.

Any ideas are welcome.

Thanks,
Finn

>>> This e-mail and any attachments are confidential, may contain legal, professional or other privileged information, and are intended solely for the addressee. If you are not the intended recipient, do not use the information in this e-mail in any way, delete this e-mail and notify the sender. -IP2

--

http://www.freelists.org/webpage/oracle-l

--

http://www.freelists.org/webpage/oracle-l

--

http://www.freelists.org/webpage/oracle-l Received on Tue Nov 27 2012 - 22:31:43 CET

Original text of this message