Re: Killer SQL and PGA

From: Robert Laverty <Robert.Laverty_at_MolinaHealthCare.Com>
Date: Thu, 2 Feb 2012 12:18:55 -0800
Message-ID: <2500EFA954A6424E9B188BFDCD37EBF101035CE6304A_at_CORPMAIL.molina.mhc>


Greg - I am pulling the logs and other details to submit as an SR to Oracle.
Pete - I think the issue may be different, but the recommended workaround (turning off automatic PGA management) may end up being the same.
Mark - I agree that the external memory limits may be worse.  I did mention that the full query (400+ ICNs) finished in 5 minutes on the 9i standby machine that's normally used for ad hoc reporting purposes and that the execution plans look the same on both platforms, at least for an 'IN' list that only contained 5 ICNs.  They both look up all entries of the list against each table, one at a time, using unique index each time, and concatenate the results.  The two machines have the same PGA settings.

I think the user understands the importance of running these queries on the standby box, but we cannot afford to have this box crashing either. It's being upgraded in the next week or so. Thus the need for a solution that works for 11g on both systems.


Date: Tue, 31 Jan 2012 09:37:07 +0000

Subject: Re: Killer SQL and PGA

From: Peter Hitchman <pjhoraclel_at_gmail.com>

Hi,

There is a not on MOS - 460424.1, about Solaris 10 issues with PGA

memory allocation. Not sure it fits your problem or not.

Have you refreshed statistics on the 11.2 database? If not I would try

that assuming that the issue is not down to contention between Solaris

10 and Oracle memory management.

Regards

Pete


From: "Mark W. Farnham" <mwf_at_rsiz.com>

Subject: RE: Killer SQL and PGA

Date: Tue, 31 Jan 2012 09:17:18 -0500

ulimit?

The tradeoff here is whether to utilize virtual memory to allow gargantuan

jobs to theoretically be processed at all. Seymour Cray nailed that in one

when he was asked why his systems did not support virtual memory.

It will always be possible to concoct queries and problems that cannot be

solved within the limits of any arbitrary amount of real memory.

And of course if you put in ulimit limitations and exceed them, then within

some timeslice or interrupt that allows the kernel to figure that out the

violating process gets killed. The side effects of the operating system

killing Oracle processes is too complex to analyze in the general case.

You mentioned that you had recently moved from 9i to 11gR2, but it is not

clear to me whether your "analyst" had previously run the same query in 9i

successfully. If so, *probably* it used an entirely different plan. Then

again, I've often seen folks test limits on an upgraded system, for example

in this context, by supplying a longer "in" list until something breaks.

Those are the folks you hope to identify to play games on a "conference room

pilot" or quality assurance test upgrade before you go live.

Likewise, proper education to help them avoid unleashing experimental loads

on the production system is usually part of the solution.

Regards,

mwf


Date: Tue, 31 Jan 2012 11:18:45 -0800

Subject: Re: Killer SQL and PGA

From: Greg Rahn <greg_at_structureddata.org>

If you have a reproducible test case, open an SR and file a bug. Even

though PGA memory management is controlled via a "target" not a "limit",

overshooting it by a such a significant is clearly not the expected

behavior.

Be sure to include a test case builder archive for the problem statement

and a SQL Monitor report. See:

http://blogs.oracle.com/optimizer/entry/oracle_keeps_closing_my_tar_because_i_cannot_provide_a_testcase_can_you_help

https://raw.github.com/grahn/oracle_scripts/master/tcb.sh

http://structureddata.org/2008/01/06/oracle-11g-real-time-sql-monitoring-using-dbms_sqltunereport_sql_monitor/


-----Original Message-----

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]

On Behalf Of Robert Laverty

Sent: Monday, January 30, 2012 6:13 PM

To: oracle-l_at_freelists.org

Subject: Killer SQL and PGA

We had a problem with our PGA growing larger than physical memory, dragging

the system down until we were forced to restart the database. We recently

upgraded from 9i to 11gR2 on Solaris with 16Gb physical memory hosting an

OLTP application. 4Gb is used for SGA and 400Mb for PGA_aggregate_target.

AMM and ASMM have not been enabled. Workarea_size_policy is set to AUTO.

This is a simple database. No RAC, no shared servers, no parallel

processing.

One of our analysts launched what appeared to be a simple ad hoc query:

select * from deniedhist where icn in ('1', '2', '3', ... , 'x').

Unfortunately, the analyst had over 400 ICNs listed. DENIEDHIST is a UNION

ALL view representing a faux-partitioned array of 129 tables, each with its

own unique index on ICN. The execution plan (my comments included) shows

that it is searching each underlying table 'x' number of times, using the

unique index each time. This is the plan from a test version of the SQL

with only 5 ICNs. The rest of the plan shows the iteration through the

other 128 underlying tables (other views of the plan show the unique table

and index names).

1422 SELECT STATEMENT REMOTE CHOOSE      1421 VIEW POS.DENIEDHIST           1420 UNION-ALL                11 CONCATENATION -- This is table 1 of 129 in the view

                  2 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY

                   1 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN

                  4 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY

                   3 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN

                  6 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY

                   5 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN

                  8 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY

                   7 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN

                 10 TABLE ACCESS BY INDEX ROWID POS.DENIED_HISTORY

                   9 INDEX UNIQUE SCAN UNIQUE POS.DENIED_HISTORY_ICN

               22 CONCATENATION  -- This is table 2 of 129 in the view

            ...



I estimated that the original version probably had over 120,000 steps in its

execution plan. It ran for nearly an hour until memory reached a critical

point, almost 20 Gb and paging like mad, that required us to abort the

database and restart. I can run the full original query on a standby

machine which is physically identical but is still running 9i. It finishes

in 5 minutes and the PGA only grows to 500 Mb. The execution plans, at

least for the small test version of the query, are the same on both

machines.

I understand the problems with large 'IN' lists. There was a conversation

on that topic here at the end of October . Educating the analyst and

redesigning the query are secondary concerns. I also realize, after reading

a bunch of Tom Kyte posts, that pga_aggregate_target will not constrain the

growth of the PGA.

My real question is why the 11g memory management, without AMM or ASMM,

would allow the PGA to grow so large. In 15 years of operations, there must

have been similar bad queries against the database. This happened a day

after the 11g upgrade. Any suggestions?

Bob Laverty

Molina Medicaid Solutions

IMPORTANT NOTICE TO RECIPIENT: This email is meant only for the intended recipient of the transmission. In addition, this email may be a communication that is privileged by law. If you received this email in error, any review, use, disclosure, distribution, or copying of this email is strictly prohibited. Please notify us immediately of the error by return email, and please delete this email from your system. Thank you for your cooperation.

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Feb 02 2012 - 14:18:55 CST

Original text of this message