RE: Manual execution vs v$session discrepancy

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Thu, 10 Sep 2015 06:24:32 +0000
Message-ID: <CE70217733273F49A8A162EE074F64D9282A3366_at_EXMBX01.thus.corp>


The 0.04 seconds for 1,671 executions looks like a figure you can't trust because of possible granularity error (in this case a larger number of very small times being seen as zero).

Do the two child cursors have the same plan_hash_value. How many disk_reads and buffer_gets (total, and per execution) What about the other times reported in v$sql - any clues there.

There's also v$sql_optimizer_env which you can address by sql_id and child_number - this will list the optimizer environments for the two child cursors: check if they differ, check if your optimizer environment for testing differs from the production one. Bear in mind, also, that there's a "secure_view_merging" effect that means different users may get plans that differ in view merging.

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: Ram Raman [veeeraman_at_gmail.com] Sent: 10 September 2015 01:37
To: Jonathan Lewis
Cc: ORACLE-L
Subject: Re: Manual execution vs v$session discrepancy

ok not a great stat, but this seems to indicate that it is waiting on something since the time taken is more about 5x.

SQL_ID EXECUTIONS FIRST_LOAD_TIME CPUSECS ELAPSEDSECS
------------- ---------- ------------------- ---------- -----------

4abcc3v6tdfqj      14081 2015-09-09/10:49:54   12381.73  58403.7323
4abcc3v6tdfqj      1673 2015-09-09/10:49:54        .04    1.450262


On Wed, Sep 9, 2015 at 7:30 PM, Ram Raman <veeeraman_at_gmail.com<mailto:veeeraman_at_gmail.com>> wrote: Jonathan,

We have run into a very similar situation with another SQL in our prod environment again. The awr reports also show average of 4 seconds/exec for this query in consideration.

I decided to trace the query anyway, but I am not lucky there. I know the sessionid, but when I set the trace I am not able to locate the trace file. I cannot modify the query in the app server since that is a different group; I set the trace on at level 8 with dbms_system.set_ev. v$session shows trace enabled for the session in qn, but I do not know which one is the trace file. I tried trcsess by giving it sid.serial#, that seems to produce a file with just 4 lines and no useful info.

I can sample the sql bind capture but as before, the values from there return superfast when i run the query for those values. I used the values used while peeking (thanks to Jonathans blog), still the results were fast with those values. fyi, I test these values in prod itself. child_cursors - maybe one or two. I should have checked for variation in the child cursors, but was too busy. this is a psft fin app. as before several 1000 executions per day.

At this point, I would really like to see the bind var values for which the queries are taking several seconds. Any help is greatly appreciated.

Thanks.

On Thu, Sep 3, 2015 at 1:51 AM, Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk<mailto:jonathan_at_jlcomp.demon.co.uk>> wrote:

It's perfectly feasible for a session to report a much longer run-time for a query than v$sql shows. The session time you're looking at may include all the network roundtrip time and any client processing time that goes on between round-trips.

As far as dealing with 7 seconds vs. 2/10th seconds - some of that may be down to the way the client and server communicate, some of it may be down to the server being busy while the production query was being run, some may be down very specifically to the concurrency of usage (competing for latches and mutexes, driving up CPU time), some may be down to the production data being subject to update and read-consistency overheads while the queries were being run. However, the difference does look a little extreme so ...

Apart from acquiring sets of bind values, did you also acquire the bind values used when the cursors were optimised ? Did you check the plan(s) actually used matched the plan(s) your tests produced.

When you tested did you use the bind variable sets as literals to the query, or did you have a framework that injected them as bind variables of the correct type? When you tested was it against the same data set with the same set of object statistics and similar concurrent rate of change of data ? When you queried v$sql how many child cursors were in existence (and what was the highest child number) ? If there were multiple child cursors was there much variation in the stats between children. Were the figures in v$sqlstat consistent with those in v$sql. vary Did the details from v$sql show you similar buffer_gets in the production and test cases. There are several time-related columns in v$sql beyond elapsed_time and cpu_time (e.g. plsql_exec_time, java_exec_time, user_io_wait_time, cluster_wait_time) are there any clues where the time was spent?

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org> [oracle-l-bounce_at_freelists.org<mailto:oracle-l-bounce_at_freelists.org>] on behalf of Ram Raman [veeeraman_at_gmail.com<mailto:veeeraman_at_gmail.com>] Sent: 03 September 2015 03:50
To: ORACLE-L
Subject: Manual execution vs v$session discrepancy

List,

We had a SQL that was being run repeatedly over 10K times in our prod system today. Turned out to be something related to fiscal year. It spiked up our CPU big time. I got several value sets for the SQL from the sql bind capture and executed them myself for about a dozen value sets; each and everyone of my manual execution returned in about 2/10ths of a second. However the v$session view would show the sessions run that SQL (sql_id, last_call_et, status) for a few minutes. That was the case for several users the whole day. Per v$sql, the average execution time for that SQL was 7 seconds since the first load this morning. AWR reports showed that the averages are anywhere from 4 seconds and up depending on the hour. I was confused by what the v$session showed me vs my execution times. Am I missing something here? bind capture interval is left at default 900 seconds.

Could it be that the values I checked (about 15) just happened to be 'good values' or the environment could have been different for the executed SQLs? I was hesitant to turn on the trace with the CPU usage through the roof. v10.2

Thanks.

--

--

--

--

http://www.freelists.org/webpage/oracle-l Received on Thu Sep 10 2015 - 08:24:32 CEST

Original text of this message