Re: Interesting problem

From: joel garry <joel-garry_at_home.com>
Date: Wed, 25 May 2011 09:23:04 -0700 (PDT)
Message-ID: <84ca3918-33b1-43b8-8ae3-5d2d3d2d007d_at_r27g2000prr.googlegroups.com>



On May 25, 6:04 am, Mladen Gogala <gogala.mla..._at_gmail.com> wrote:
> Yesterday, on one of my databases, DBMS_SCHEDULER job started burning CPU
> like crazy. I contacted the developer and he told me that the culprit job
> was supposed to regularly purge one fairly large table. The job was
> scheduled ad 03:00 AM and should have finished in half an hour, yet it
> was still running like crazy at 08:45 AM. Not only was it burning CPU,
> the most prevalent wait event was the wait for cache buffer chain latch,
> which is the real surprise here. Here is the source code:
>
> CREATE OR REPLACE PROCEDURE REPORT_CLEANUP_PURGE
>   (
>     p_rowlimit NUMBER DEFAULT 1000)
> AS
> TYPE report_tab
> IS
>   TABLE OF reports.report#%TYPE;
>   rep# report_tab;
>   CURSOR del_cur
>   IS
>      SELECT report#
>        FROM reports
>       WHERE status='RECENT'
>     AND created   < TRUNC(sysdate);
> BEGIN
>   LOOP
>     OPEN del_cur ;
>     LOOP
>       FETCH del_cur BULK COLLECT INTO rep# LIMIT p_rowlimit;
>       EXIT
>     WHEN rep#.count=0;
>       FORALL i    IN rep#.FIRST..rep#.LAST
>        DELETE FROM reports WHERE report# = rep#(i);
>
>       FORALL i IN rep#.FIRST..rep#.LAST
>        DELETE FROM report_agency_Sessions WHERE report# = rep#(i);
>       COMMIT;
>     END LOOP;
>     CLOSE del_cur;
>   END LOOP;
> END;
> /
>
> The problem is really simple. The "exit" statement terminates the
> innermost loop. The outer loop wasn't terminated at all. When the job was
> done, there was nothing more to mop up. The procedure would open the
> cursor, do bulk collect, figure out that the count is zero, exit loop,
> close cursor, re-open it and do the whole thing again. And again. And
> again.
> That was easy to fix. After reloading my Glock and exchanging some kind
> words with the QA people, I am still confused about the buffer cache
> chains. Why was a simple infinite loop causing "latch: buffer chains"
> wait events? It was using much, much more CPU than I expected it to and
> affecting the system in a really adverse way. When I come to think of it,
> I would expect it to use library cache latch, because of the opening and
> closing the cursor literally thousands of times per second, but not buffer
> chains latch. Does anyone have an explanation?
>
> --http://mgogala.byethost5.com

Wild speculation here. Does this have something to do with partitioned tables? There may be some recursive sql that accesses hist_head$ the 10g cursor is:

select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null_cnt, timestamp#, sample_size, minimum, maximum, distcnt, lowval, hival, density, col#, spare1, spare2, avgcln from hist_head$ where obj#=:1 and intcol#=:2

It caught my attention on my system because of the rule hint, and all the research I could find said the select was for partitioned tables, which I'm not using. So I'm curious if you see this hint on 11g, and speculating each time your outer loop executes, it has to propagate the views of various internal tables blocks, hitting buffer chain latches simply because the lack of data makes the tight loop do it so fast.

It came to my notice as I was using dbconsole to look at the hundreds of cursors the erp code uses in a particular session that was I/O intensive, and I sorted by executions. Recursive sql had the most executions, and I find the continued use of the rule hint curious.

jg

--
_at_home.com is bogus.
"Due to numerous and erroneous 911 miscalls, we have changed the
number used to obtain an outside line to 8."
Received on Wed May 25 2011 - 11:23:04 CDT

Original text of this message