RE: Performance Problem.

From: Zanen van, J (Jacob) <Jacob.van.Zanen_at_rabobank.com>
Date: Tue, 6 Nov 2007 14:28:28 +1100
Message-ID: <49AD3377867B8A46A6B58C1551D4B55C024C6CFE@rabobank.com>

Hi Alex,

I have installed statspack on this machine and next time this problem occurs I hope to have more information spaced in 15 minute intervals :-) For now unfortunately I have really no more information on the matter untill it happens again.

To be continued......

Jack

-----Original Message-----
From: oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Alex Gorbachev Sent: Tuesday, 6 November 2007 2:09 PM
To: Crisler, Jon
Cc: oracle-l_at_freelists.org
Subject: Re: Performance Problem.

Jon,

Unfortunately, I'm by no mean a memory expert but here are some details.

The CPU overhead is more noticeable when there are many sessions connecting often, do lots of LIO on different blocks and disconnect. When a process accesses a shared memory page *first* time, a new entry about this page in process' page table is created for this process but only on the *first* access. It lives as long as Oracle process is alive. On every access to virtual memory (all memory is virtual actually), CPU has to lookup through page table. To speed it up, those entries are cached in CPU in translation lookaside buffers (TLB). TLB miss is expensive indeed.

If you have lots of persistent connections idle most of the time but accessing buffer cache a lot when called then memory overhead due to large page tables can be huge and it can easilly grow higher than SGA itself.

I don't think there is any visibility from Oracle on this issue besides some increase in CPU that's *usually* not very significant unless your shared memory page descriptors occupy too much memory and machine starts swapping. /proc/meminfo (PageTables) figure will show memory overhead and can give an idea of possible CPU overhead but I don't know detailed Linux stats to use it.

Oh... I almost forgot to mention that huge pages are locked in memory and SGA is not affected by swapping. This could be a saver in certain circumstances but it's rather a workaround for poor memory configuration.

Furthermore, the original post was about Solaris 64 bit and it has support for large pages (someone with more intimate knowledge of Solaris please feel free to provide details) out of the box. Solaris can support up to 256 MB page size (Giant pages).
Here is an example of "pmap -xs <oracle_process_pid> | grep shm": 0000000380000000 1830912 1830912 - 1830912 4M rwxsR [ ism shmid=0x5dd ]

There is nothing to do in Solaris to enable usage is larger page size with Oracle. As far as I know it's automatic.

Now lets go to the cool sides of real Unixes. :) In Solaris 9, you can see what is TLB misses overhead using "trapstat -T". It will show % of time lost due to TLB miss. Works on SPARC but I'm not sure it will work on x86 as well.

As I already mentioned, I'm not particularly knowledgeable about memory management internals so if someone noticed something wrong - please pitch in.

Regarding "reading too much"... It's not reading too much but *guessing* too much. ;-) Of course, every theory has the right to exist but more often than not there are more productive ways. Please don't take it personal. This list is often especially great in dead-end situations when experience and knowledge of participants helps a lot as well as in many other situations. We can only speculate now because OP doesn't provide any more details. Maybe because there was no magical solution to the problem?

Cheers,
Alex

On Nov 5, 2007 10:53 AM, Crisler, Jon <Jon.Crisler_at_usi.com> wrote:
> I think you are reading too much into my note- I was just trying to
> point out that on Linux with very large SGA's, this is a frequent
> problem we have seen on a number of RAC systems. I have not been able

> to figure out a way inside of Oracle to single out this metric (not
> that I tried that hard either), but I believe on a small-pages system
> it also reports larger than normal session cpu compared to a similar
> test on systems with hugepages. Most likely there is a wait statistic

> but I don't know which one, and if it can be reliably traced back to
> hugepages.
>
> Another gotcha on Peoplesoft systems is that the cursor_sharing option

> is frequently turned off on 10.2.0.3 systems, due to a bug that throws
> ORA-01008 errors. This is easily addressed with patch 5863277 - I
> believe this bug was introduced with 10.2.0.3 and only affects some 64

> bit platforms like Linux 64 and Solaris.
>
> -----Original Message-----
> From: gorbyx_at_gmail.com [mailto:gorbyx_at_gmail.com] On Behalf Of Alex
> Gorbachev
>
> Sent: Saturday, November 03, 2007 6:22 PM
> To: Crisler, Jon
> Cc: Jacob.van.Zanen_at_rabobank.com; oracle-l_at_freelists.org;
> pythianbrinsmead_at_gmail.com
> Subject: Re: Performance Problem.
>
> Jon,
>
> *Unless I'm missing something*, chances that Jacob has memory
> management issues are small. Looking at semaphores statistics is a
> very indirect way of troubleshooting to say the least. Buffer cache
> hit ratio might be just as useful.
>
> A good starting point would be a quick 10046 trace on the most
> important sessions (a script to enable it can be prepared in advance)
> and/or at least several snaps of v$session and maybe
> v$sesstat/v$sysstat/v$system_event/v$session_event (mini statspack on
> small intervals if you wish).
>
> Btw, Mark's suggestion on bind variables peaking falls into the same
> category of "sounds like". ;-) So even though it could be the case -
> it's just one of many possibilities and poking around while there are
> better troubleshooting methods available is probably a sub-optimal and

> counter-productive as I would call it.
>
> There are set of "rules" that are often followed for Peoplesoft in 9i
> and 10g like setting OPTIMIZER_FEATURES_ENABLE=8.1.7. We had recently
> migrated on customer from HR 8.8 on 9i to HR 8.9 on 10g RAC. We were
> working on optimization of set of queries that went mad and were
> mostly done but customer decided to finally follow the advice from
> Peoplesoft to set some of init.ora parameters including
> OPTIMIZER_FEATURES_ENABLE.
>
> I don't have much Peoplesoft experience and I know that administrators

> of such applications are typically conservative. However, sticking to
> OPTIMIZER_FEATURES_ENABLE=8.1.7 without attempt to make use of all new

> CBO features of latest Oracle releases is a short sighted approach. Is

> anyone running Peoplesoft on 10g without
> OPTIMIZER_FEATURES_ENABLE=8.1.7?
>
> By the way, if OPTIMIZER_FEATURES_ENABLE=8.1.7 is set in Jacob's
> environment, then bind variable peaking issue that Mark referenced
> won't even be relevant. It first appeared in 9iR1. Didn't it?
>
> Cheers,
> Alex
>
>
> On Oct 31, 2007 8:46 AM, Crisler, Jon <Jon.Crisler_at_usi.com> wrote:
> > You don't say what platform you are on, but on Linux (specifically
> > Red
> Hat
> > 4) you could implement hugepages, which tends to alleviate problems
> with
> > very large SGA's and memory management.
> >
> > For a good starting point, see Metalink technote 361323.1
> >
> >
> >
> > ________________________________
> >
> >
> > From: oracle-l-bounce_at_freelists.org
> [mailto:oracle-l-bounce_at_freelists.org]
> > On Behalf Of Zanen van, J (Jacob)
> > Sent: Tuesday, October 30, 2007 11:01 PM
> > To: oracle-l_at_freelists.org
> > Subject: Performance Problem.
> >
> >
> >
> >
> >
> > Need some help on this.
> >
> >
> >
> > We have a peoplesoft environment here that seems to have performance
> issues
> > every so often.
> >
> > When this happens performance comes to s standstill for everyone and

> > a SQL*Plus query that takes about 6-7 seconds normally takes about
> > 13
> minutes.
> > During this time it will pretty much wait for sequential reads only.
> >
> > I have not yet had the chance to install statspack yet and we are
> > not licensed for AWR.
> > We are on oracle 10.2.0.3 on solaris 64bit
> >
> > When I check the database quickly (needs to be done quickly as
> business
> > dictates a reboot at the moment) I do not see any obvious reasons
> > why
> this
> > would be happening.
> >
> > On unix the admins see quite a large increase in use of semaphores.
> >
> > After the reboot the problems seem to disappear for several
> weeks/months
> >
> > Has anyone seen something similar before and found out why this is
> > happening.
> >
> >
> >
> > Brgds
> >
> >
> >
> > Jack
>
>
> --
> Alex Gorbachev, Oracle DBA Brewer, The Pythian Group
> http://www.pythian.com/blogs/author/alex http://www.oracloid.com BAAG
> party - www.BattleAgainstAnyGuess.com
>

--
Alex Gorbachev, Oracle DBA Brewer, The Pythian Group
http://www.pythian.com/blogs/author/alex http://www.oracloid.com BAAG
party - www.BattleAgainstAnyGuess.com
--
http://www.freelists.org/webpage/oracle-l


______________________________________________________________

This email, including any attachments, may be confidential or privileged, and is sent for the personal attention of the intended recipient.  If you have received this email in error, please delete it immediately.  The views expressed are not necessarily those of the Rabobank Group.  The Group is not liable for the effects of any virus which may be contained in this email.

If this email contains marketing material and you do not wish to receive such material by email in future, please reply to this email and place the words "Remove My Details - Electronic Messages" in the Subject Header.

The Rabobank Group

Australia: 1800 025 484
New Zealand: 0800 500 933
______________________________________________________________
--
http://www.freelists.org/webpage/oracle-l

Received on Mon Nov 05 2007 - 21:28:28 CST