Solaris T5220 server problem

From: Wolfson Larry - lwolfs <lawrence.wolfson_at_acxiom.com>
Date: Thu, 28 Apr 2011 00:15:15 +0000
Message-ID: <EDA437CAA8612C418E013CDA4B4A755137F21F3B_at_CWYIGMBCRP01.Corp.Acxiom.net>



Hello!

Finally convinced the client that the long-running code wasn't a database, application, or network problem.

I noticed that one of my queries, which usually runs in a tenth of a second elapsed time, was taking about 8 seconds on the production server (8G, 32 CPUs), which hosts both 10.2.0.4 prod and test (separate ORACLE_HOMEs).

I wanted the Unix admin to run some type of DTrace; I had already run truss a number of times. Didn't get that, but the SA found that echo was running about 30-60 times longer on this server than on dozens of others we manage (most not T5220s). They ran GUDS, which didn't help, and then the support person came up with this from a buddy he reached out to.
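For anyone wanting to repeat the echo comparison on their own hosts, here is a minimal sketch of how such a timing could be taken. It is not what our SA actually ran; the function name `time_command`, the run count, and the use of Python at all are my assumptions. It just spawns the command repeatedly and records the wall-clock time of each run, so the medians can be compared across servers.

```python
import shutil
import statistics
import subprocess
import time


def time_command(argv, runs=20):
    """Return wall-clock seconds for each of `runs` spawns of argv."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Spawn the process and wait for it to exit; discard its output.
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    # Use the external echo binary, not a shell builtin (path is an assumption).
    echo = shutil.which("echo") or "/bin/echo"
    samples = time_command([echo, "hello"])
    print(f"median {statistics.median(samples) * 1000:.2f} ms, "
          f"max {max(samples) * 1000:.2f} ms over {len(samples)} runs")
```

Running the same script on the slow server and on a known-good one gives two medians to compare; the absolute numbers matter less than the ratio between hosts.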

He suggested turning page coalescing off, which we have found to be beneficial in many performance escalations. This is something you can do on the fly, and if it's found to have a desirable effect, it can be permanently set in /etc/system. There are no known downsides to doing this in the real world.

Once this change is in place, could your DBAs run some test jobs which can be compared against timings for the same jobs when the test DB is down?
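One way the requested comparison could be summarized is sketched below. The function name `speedup` and the sample numbers are purely illustrative placeholders, not measurements from this server. It compares median elapsed times from two sets of runs of the same job, since medians are less sensitive to a single outlier than means.

```python
import statistics


def speedup(before_secs, after_secs):
    """Ratio of median elapsed time before the change to median after it.

    A value above 1.0 means the job ran faster after the change.
    """
    return statistics.median(before_secs) / statistics.median(after_secs)


if __name__ == "__main__":
    # Placeholder timings in seconds, for illustration only.
    before = [4.0, 5.0, 6.0]
    after = [1.0, 2.0, 3.0]
    print(f"speedup: {speedup(before, after):.1f}x")
```

The same function works for comparing timings taken with the test DB up against timings taken with it down.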

Here are the dirty details from previous communications on the topic:

quote --->

Large pages are not a problem. It is finding or coalescing them when none is available that needs improvement. The LPOOB feature is designed to improve application out-of-box performance. A number of LPOOB fixes have already been integrated in Sol10 U4, and more are planned for U5 and U6.

It is wiser to disable coalescing than to disable LPOOB. If you don't want page coalescing, then set the following tunables dynamically or in the /etc/system file.

And
What I didn't mention before is that the page coalescing issue is specifically mentioned with the Niagara family of CPUs, which is what this T5220 runs, on systems running Java applications and Oracle databases (the Oracle part being pertinent here). Still not saying that it's definitively going to resolve the problems, but it's worth trying based on the system type, Oracle, and symptoms.

This is a dynamic change. The support person says we can easily toggle this back with no service interruption. The client is not buying that, and I was just wondering what experience anyone else has had with T5220s?

Support said they did this mostly for SAP, and while we run a number of SAPs, none are on this server, which I would categorize as relatively lightly loaded. Prod is far busier during the nightly batch window. Scheduled stats run well prior to that, for 3-13 minutes.

The server and database have been up close to 2 years, and they just noticed these processes running longer about 6 weeks ago. They put a new release in TEST but claim the problem started just prior to that. Not refuting that.

Thanks for any ideas, suggestions, experiences.

  Larry





--
http://www.freelists.org/webpage/oracle-l
Received on Wed Apr 27 2011 - 19:15:15 CDT
