Re: "'Resultcache" effect in 10.2.0.4 ??

From: joel garry <joel-garry_at_home.com>
Date: Wed, 29 Jul 2009 09:49:44 -0700 (PDT)
Message-ID: <3a2a7729-4cf5-4c2a-9c79-5de7824bcd8e_at_l5g2000pra.googlegroups.com>



On Jul 29, 8:01 am, jlchiappa <jlchia..._at_gmail.com> wrote:
> > cached in the SGA on the first execution and retrieved from memory on the
> > second execution?
> > Matthias
> >> He did mention in the OP that the number of I/O's (PIO's, I assume...)
> >> and the plans are the same, as well as ALTER SESSION FLUSH
> >> BUFFER_CACHE and FLUSH SHARED POOL  don't make it happen.
>
> YES, this is  exactly the current suspect...
>
> >> Perhaps it is possible that two separate things are happening with the
> >> same apparent effect:  When he shuts the db, it's slow because it has
> >> to load the SGA, when he does it later, a SAN cache or something helps
> >> things along.
>
> This is the only possible explanation until now, so Iīm buying it,
> too.
>
> >> I would think that there would be some way to flood the
> >> OS cache he is talking about to show it is responsible.
>
> yeah, in the test machine thiscolud be done, maybe writing a C program
> mallocīíng more and more RAM until the OS begins to remove pages from
> the OS cache, yes...

I had something simpler in mind, like catting big files around with another machine on the san.

But again, the trace file should point you in the right direction as to what to look at.

>
> >> It's just so
> >> odd that the 9i didn't show the effect, and 10g does (if I'm rereading
> >> it correctly).
>
> Yes you read correctly, this was exactly what happened. Strange,
> agreed, but some possible explanations :
>
>  a) as I said, the files are in cooked filesystems, maybe in Solaris
> 9i was doing direct I/Os (bypassing FS caches) and 10g not, due to
> different code in the db or even due to different settings

Yeah, mbrc, async, all sorts of possibilities. How was it upgraded, in place? exp/imp?

>
>  b) this db attends an Oracle EBS 11i app, and when in 9i it was using
> RULE optimization for sure (I know it was true in 9i epoch), so maybe
> the RBO was generating always a lot of small I/Os and the OS caching
> effect for it was small too, hardly detected, only now with CBO and
> big hashes the effect is more present

Have you compared the 9i and 10g plans?

>
> but all of this is pure speculation, only...
>
> >>That's why I suggested bind peeking or something to do
> >> with histograms.
>
> Nope, as I said even if we test with the exact SAME values 2x the
> effect is the same, so no way for bvp, I think...
>
> >> Hopefully the 10046 will show more exactly where the
> >> slow is, over all the combinations of situations.
>
> In truth, my expectations about the 10046 are : if I could get 2 10046
> trace files about 2 executions of the same SQL with the same values
> but flushing buffer pool and hard parsing the SQLs each time , imho
> the plans and the I/Os will be more or less the same, with this if I
> see in the 10046 the same I/Os being faster in the 2nd exec, I prooved
> that some "external force" is acting, something outside the db, being
> OS cache, hardware cache, something like that...

Sounds like it will be informative.

>
> I've been trying to
>
> >> figure out if MBRC could somehow have this effect (different between
> >> versions), but I don't think so - but see
> >>http://www.freelists.org/post/oracle-l/db-file-multiblock-read-count-...
> >> (and read the whole thread).  Then of course, it could be some solaris
> >> specific thing, I haven't been on solaris in a long time.
>
> Hmmm, different MBRCs ??? I havenīt thought about this, maybe this
> could be a possibility...
>
> >> Is there any possibility we might find out the storage hardware
> >> architecture?
>
> Sure : the storage is an IBM Shark (unsure about the exact model and
> cache sizings), disks in RAID-5 groups, and the space is formatted in
> native (cooked) Solaris filesystems, thatīs it...
>
> >>IMHO, you are just guessing until you can actually get a trace.  Also,
> >> I wouldn't change anything.  Run it once with a trace, log out, log
> >> in, and run it again with a trace.
>
> Sure.
>
> >> What if you do the following:
> >> - restart the instance
> >> - execute and time the query
> >> - reboot the Solaris server and start the instance
> >> - execute and time the query
> >> If the second query is faster than the first one, the data is probably
> >> cached by the SAN. If not, the OS cache is probably playing a role.
>
> very good idea, an interesting test, will try it if possible...
>
> >> Out of curiosity, it sounds like your colleague isn't very interested
> >> in getting this fixed if he/she can't provide you with a trace?
>
> How true : itīs the same old story, the behavior occurs only in the
> big beast SQLs, that ones naturally slow and complex, so the end-users
> are "used" to get bad response times, and anyway this kind of monsters
> are "batch" things, they are not directly impacting day to day work,
> so no much interest here, yes... In truth, my intention, more than
> only helping my colleague, is to learn, receive different inputs, and
> to be prepared if something like this occurs with me, only... In some
> measure I already received this, received some hints and thoughts, and
> the important, eliminated the chance for some obscure point (maybe a
> 'hidden' result cache) inside the db being present without my
> knowledge, this was a main objective here, too, and this possibility
> was eliminated , I wanted it...

Yeah, this is where I diverge with method-r as I understand it - the business doesn't always know how to rate performance issues. But I think method-r would do you well here, aside from that.

jg

--
_at_home.com is bogus.
The start of the real computer overlords:
http://www3.signonsandiego.com/stories/2009/jul/29/verizon-qualcomm-join-forces-venture/?uniontrib
Received on Wed Jul 29 2009 - 11:49:44 CDT

Original text of this message