Re: Query runs fine on InstanceA but slow on InstanceB...

From: Cary Millsap <cary.millsap_at_method-r.com>
Date: Fri, 6 Jan 2012 17:13:56 -0600
Message-ID: <CAJOkrQYXzjKj6-geZJpS22QUQmWuLqM6VcEpGbAO-8dzn2irJA_at_mail.gmail.com>



+1 on Kellyn's advice to use 10046 data. If it were my problem, here's specifically what I would do:
  1. Isolate a business task—call it T—to use as a control. Sounds like you've already done that.
  2. Trace (dbms_monitor.session_trace_enable(null,null,true,true)) a run of T on instance A.
  3. Trace a run of T on instance B.
  4. Run mrskew (a utility in the Method R Tools<http://method-r.com/software/mrtools>product) on each trace file. The output will resemble this:

$ mrskew file.trc

CALL-NAME                       DURATION       %   CALLS      MEAN
MIN       MAX
-----------------------------  ---------  ------  ------  --------
  • -------- SQL*Net message from client 20.742551 64.8% 2,773 0.007480 0.000316 0.451740 db file sequential read 8.052732 25.2% 508 0.015852 0.000238 0.254242 EXEC 1.717730 5.4% 1,612 0.001066 0.000000 0.243963 log file sync 0.453328 1.4% 156 0.002906 0.000429 0.101154 PARSE 0.281959 0.9% 1,446 0.000195 0.000000 0.044992 cursor: pin S wait on X 0.261647 0.8% 24 0.010902 0.010309 0.011166 SQL*Net more data from client 0.168530 0.5% 67 0.002515 0.000801 0.106081 FETCH 0.118982 0.4% 2,236 0.000053 0.000000 0.008999 row cache lock 0.085146 0.3% 1,030 0.000083 0.000028 0.000383 latch: shared pool 0.049684 0.2% 11 0.004517 0.000745 0.007201 6 others 0.055326 0.2% 3,015 0.000018 0.000000 0.018224 ----------------------------- --------- ------ ------ --------
  • -------- TOTAL (16) 31.987615 100.0% 12,878 0.002484 0.000000 0.451740 5. Compare the two profiles side-by-side. The comparison will tell you which type of database call or OS call is consuming more of your time. 6. If there's no one or two call types that really show the difference, then group by something other than call name. For example, here's what grouping by SQL hash value will look like:

$ mrskew *.trc --group='$hv'
'$hv' DURATION % CALLS MEAN MIN MAX ----------- --------- ------ ------ -------- -------- --------

#0            5.109908   16.0%     688  0.007427  0.000000  0.443144
364923679     3.090265    9.7%     447  0.006913  0.000000  0.412162
3150898423    1.392148    4.4%     140  0.009944  0.000238  0.097124
3129187332    0.990504    3.1%     140  0.007075  0.000000  0.238923
864012087     0.988646    3.1%      49  0.020176  0.000312  0.126767
2888850659    0.949462    3.0%      73  0.013006  0.000044  0.133055
2680677334    0.912911    2.9%     106  0.008612  0.000000  0.249404
3902328072    0.883581    2.8%     423  0.002089  0.000000  0.451740
4013634959    0.833338    2.6%     102  0.008170  0.000000  0.354018
1471759920    0.757588    2.4%     164  0.004619  0.000000  0.218535
117 others   16.079264   50.3%  10,546  0.001525  0.000000  0.310893
----------- --------- ------ ------ -------- -------- -------- TOTAL (127) 31.987615 100.0% 12,878 0.002484 0.000000 0.451740 7. Repeat the process of selecting a dimension to group by (e.g., call name, hash value) and comparing side-by-side until you find the categories where your time consumption is different. Once you learn that, then think about how the code path works that's beneath what you've found. For example, if all the difference is attributable to just one SQL statement, then have your plan hash values (plh) changed? Are your read call latencies on one instance much worse than the other?

Rather than trying to check everything under the sun that *could* be your difference, this method will show you in just a step or two what it *is* that's causing the difference.

Cary Millsap
Method R Corporation
http://method-r.com

On Fri, Jan 6, 2012 at 11:20 AM, Kellyn Pot'vin <kellyn.potvin_at_ymail.com>wrote:

> Hi Dave,
> I think you may have what you need already at your fingertips with trace
> files if you are sure this process is the clearly the culprit and not the
> whole environment. As the query creates a more involved trace when you
> produce a 10053 trace file, why not start with a 10046 trace file, level
> 12, (bind variables and waits, so if there is any bind peeking issues,
> etc., they will show) which can then be run through TKPROF to give a clear
> output of what wait differences there are? Method R's tools, such as Mr
> Trace also an excellent tool for deciphering waits that may not be caught
> by TKPROF and some of the tools are quite cost effective, such as the
> plug-in for Oracle Developer.

>

> This might give you the higher level information you need and if it does
> not answer your question, you can then run the 10053 trace and drill down
> to the specific waits listed in the 10046 to help limit what data you
> inspect in the larger trace file.
>

> Hope this helps,
>
>

> Kellyn Pot'Vin
> Sr. Database Administrator and Developer
> DBAKevlar.com
>
>

> ________________________________
> From: "Taylor, Chris David" <ChrisDavid.Taylor_at_ingrambarge.com>
> To: "'dmann99_at_gmail.com'" <dmann99_at_gmail.com>
> Cc: "'oracle-l_at_freelists.org'" <oracle-l_at_freelists.org>
> Sent: Friday, January 6, 2012 8:41 AM
> Subject: RE: Query runs fine on InstanceA but slow on InstanceB...
>

> David,
>

> I was thinking several different options as I read your email - however,
> the more I read, each option quickly disappeared as you covered the
> strategies I would have used (minus SQLT which I haven't used).
>

> So, I'd be interested in an update on this if you find a solution that
> works well.
>

> One thing you didn't cover, is are these instances on the same server? Is
> it a vm or physical? Is the query running the same way in both instances -
> same app, same workstation (or whatever)? Have you exported stats from one
> instance to the other to see if that resolves it?
>

> Often it doesn't take a *significant* difference in stats (or parameters)
> between instances to really change performance.
>

> I'd probably start by exporting stats for the affected tables (schema
> stats if possible) from Instance A to Instance B. If that resolves it,
> then you know it's stats related. (I *think*)
>

> Chris Taylor
> Sr. Oracle DBA
> Ingram Barge Company
> Nashville, TN 37205
>

> "Quality is never an accident; it is always the result of intelligent
> effort."
> -- John Ruskin (English Writer 1819-1900)
>

> CONFIDENTIALITY NOTICE: This e-mail and any attachments are confidential
> and may also be privileged. If you are not the named recipient, please
> notify the sender immediately and delete the contents of this message
> without disclosing the contents to anyone, using them for any purpose, or
> storing or copying the information on any medium.
>
>

> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> On Behalf Of David Mann
> Sent: Friday, January 06, 2012 8:42 AM
> To: oracle-l_at_freelists.org
> Subject: Query runs fine on InstanceA but slow on InstanceB...
>

> I need some advice. I seem to be asked more and more the dreaded "why does
> my query run faster on InstanceA and slower on InstanceB?"
>

> I provide all kinds of query tuning help to our developers, testers, and
> product support. Mostly this covers a surgical strike of 1 SQL statement. I
> investigate the issue, find a couple of ways to improve the query, document
> my improvement and provide my recommendation ...
> and then move on.
>

> This A vs B seems to take me about 4X the time of a regular tuning
> request. I am now investigating the performance of a query on InstanceA, a
> query on InstanceB, and using my ever evolving checklist of why execution
> plans can differ to drill down into each reason (instance params, data
> volume, system stats, object stats, etc) and look for significant
> differences. I can usually find a way to improve the Query_at_InstanceBperformance but I am not always successful finding a root cause for the
> difference. Plus I usually want to know this as much as my customer so I
> know what to look out for when dealing with future tuning issues with their
> apps.
>

> I have my own checklist to guide my investigation and a bunch of custom
> queries to check data volume, system statistics, object statistics (basic
> table statistics right now, would like to expand to cover histograms, etc).
> I don't think I am covering all the bases since my checklist still gets
> changed/expanded after every tuning engagement.
>

> SQLT probably covers everything I look for (and more) has been a good tool
> to get an idea of the environment surrounding the execution of a query on a
> system. but it is a little cumbersome at times (I can be called on to help
> with any of 600 DB instances - I end up installing it pretty much every
> time I need it somewhere to make sure I have the latest/greatest version).
> I had high hopes but limited success with SQLTCompare. I have only gotten
> it to compare successfully maybe 2 out of the 6 times I used it.
>
>

> I have also had some success with 10053 traces for smaller queries.
> But at least for me these quickly get out of hand so I usually only use
> them as a last resort.
>

> So my question is how do you approach something like this? Is there a
> 'best' method or tool for investigating and drilling down to a root cause
> of the same statement against 2 difference DB instances? Are there any book
> chapters, blog posts, or other resources you have seen covering this type
> of investigation? Is it time for me to get busy writing my own tool? :)
>

> --
> Dave Mann
> www.brainio.us
> www.ba6.us - Database Stuff - http://www.ba6.us/rss.xml
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
>

> --
> http://www.freelists.org/webpage/oracle-l
>

> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Jan 06 2012 - 17:13:56 CST

Original text of this message