Re: Global Cache and Enqueue Services statistics

From: Gaja Krishna Vaidyanatha <gajav_at_yahoo.com>
Date: Tue, 31 Jul 2012 09:14:34 -0700 (PDT)
Message-ID: <1343751274.71113.YahooMailNeo_at_web83606.mail.sp1.yahoo.com>



Hi Amir,
From the performance data you have shared, it is evident that RAC accounts for approximately 30% of Elapsed Time for your workload. For your application, that is the overhead that RAC imposes in a 3-node cluster configuration. The row-level locking issue can (and will) exacerbate the elapsed times of everything that RAC has to do in the form of "inter-instance communication". I would definitely focus on determining why 14% of DB Time is spent on locking. The application is incurring "lock waits" that average 1.3 seconds per occurrence. For a normal transactional application, that is an eternity.
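As a rough check of those figures, the following Python sketch recomputes them from the Top 5 Timed Foreground Events table quoted further down in this thread. The numbers are copied from that table; the variable names, and the choice to count only the two "Cluster" class events as RAC time, are assumptions made for illustration.

```python
# Top 5 Timed Foreground Events, copied from the AWR excerpt quoted in this thread.
# Each entry: event name -> (waits, total wait time in seconds, % DB time, wait class)
# DB CPU is left out because it has no wait count.
top5 = {
    "db file sequential read":       (54_365_842, 34_374, 32.3, "User I/O"),
    "gc buffer busy acquire":        (461_651,    18_904, 17.8, "Cluster"),
    # The report shows this class truncated as "Applicatio".
    "enq: TX - row lock contention": (11_506,     15_269, 14.4, "Application"),
    "gc current block busy":         (255_945,    10_747, 10.1, "Cluster"),
}

# Assumption: "RAC time" here means the events in the Cluster wait class.
cluster_pct = sum(pct for _, _, pct, cls in top5.values() if cls == "Cluster")

# Average wait per occurrence = total wait time / number of waits.
waits, seconds, _, _ = top5["enq: TX - row lock contention"]
avg_lock_wait_s = seconds / waits

print(f"Cluster wait class share of DB time: {cluster_pct:.1f}%")
print(f"Average TX row lock wait: {avg_lock_wait_s:.2f} s")
```

This gives about 27.9% for the two cluster events, in line with the "approximately 30%" above, and about 1.33 seconds per row lock wait, matching the 1327 ms average in the report.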

And to Tim Gorman's point, the data basically shows you what life is going to be like in a multi-instance, shared-database architecture. This could be one of those cases where an application that performs reasonably well in a single-instance configuration falls apart in a multi-instance configuration WITHOUT additional work to deal with the "intricacies of RAC". I would definitely give Tim's suggestion of setting CLUSTER_DATABASE from one of the instances and re-running your tests a try. Also, I am not convinced at this time that "Freelist Groups" set at 1 is the cause of your performance bane. I personally would rather focus on solving the locking issue first, re-run your tests, and only then go down the path of Freelist Groups manipulation (if the performance data deems it necessary). It may be the case that you will have to deal with "the perception of HA" in a completely different fashion. Do keep us posted.

Cheers,

Gaja
 
Gaja Krishna Vaidyanatha,
CEO & Founder, DBPerfMan LLC
http://www.dbperfman.com
http://www.dbcloudman.com

Phone - +1-650-743-6060
LinkedIn - http://www.linkedin.com/in/gajakrishnavaidyanatha

Co-author: Oracle Insights:Tales of the Oak Table - http://www.apress.com/9781590593875
Primary Author: Oracle Performance Tuning 101 - http://www.amzn.com/0072131454
Enabling Cloud Deployment & Management for Oracle Databases



From: "Hameed, Amir" <Amir.Hameed_at_xerox.com>
To: gajav_at_yahoo.com; oracle-l_at_freelists.org
Sent: Monday, July 30, 2012 1:47 PM
Subject: RE: Global Cache and Enqueue Services statistics

Hi Gaja,
Below are the top-5 wait events which are the same on all nodes:

Top 5 Timed Foreground Events

                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file sequential read          54,365,842      34,374      1   32.3 User I/O
gc buffer busy acquire              461,651      18,904     41   17.8 Cluster
enq: TX - row lock contention        11,506      15,269   1327   14.4 Applicatio
DB CPU                                           11,476          10.8
gc current block busy               255,945      10,747     42   10.1 Cluster

Because of the 'enq: TX - row lock contention' event, I am also investigating whether the test was run the way it should have been. I have also identified the statements that were suffering from the 'gc' waits shown above. The underlying segments of those statements have 'freelist groups' set to 1. This is an EBS system that has been around for a long time. It was upgraded from 11.0.3 to 11i several years ago, which is most likely why 'freelist groups' is 1 for most of the standard segments.

Thanks,
Amir
-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Gaja Krishna Vaidyanatha
Sent: Monday, July 30, 2012 4:34 PM
To: oracle-l_at_freelists.org
Subject: Re: Global Cache and Enqueue Services statistics

Hi Amir,
What are your "Top 5 Wait Events" in the AWR during your load tests? The GC and ES statistics themselves don't mean much (except that you have a very high Avg. Message Sent Queue Time) unless we know the source of your DB's pain point. Let's start our discussion with the "Top 5 Waits" and then the Top SQL that relates to them.

Cheers,

Gaja
 



From: "Hameed, Amir" <Amir.Hameed_at_xerox.com>
To: oracle-l_at_freelists.org
Sent: Monday, July 30, 2012 1:15 PM
Subject: Global Cache and Enqueue Services statistics

Folks,
I have a three-node Oracle RAC environment running on Solaris 10. The Grid and DB versions are 11.2.0.3 and 11.1.0.7 respectively. We ran a load test against the environment to simulate our load in production. The transaction timings were off when compared to timings from the single instance of the same environment. When I look at AWR from all instances, the following workload statistics seem a bit high:

Avg message sent queue time

Avg global cache current block receive time (ms)

Avg global cache cr block flush time

The CPU utilization was over 90% idle on each RAC node during the test. The interconnect is an aggregated link of two 10GbE NICs. Database files are on RAID-5 SSDs, whereas redo logs are on dedicated RAID-10 SAS drives. Is there anything that I should look at closely that could help identify the reason for the higher timings for these statistics? Also, what is considered a good timing for these statistics?

Thank you,

Amir

Instance #1

Global Cache and Enqueue Services - Workload Characteristics

                     Avg global enqueue get time (ms):      1.4
          Avg global cache cr block receive time (ms):      3.7
     Avg global cache current block receive time (ms):     13.0
            Avg global cache cr block build time (ms):      0.0
             Avg global cache cr block send time (ms):      0.0
      Global cache log flushes for cr blocks served %:      8.0
            Avg global cache cr block flush time (ms):      7.5
         Avg global cache current block pin time (ms):      6.2
        Avg global cache current block send time (ms):      0.3
 Global cache log flushes for current blocks served %:     13.4
       Avg global cache current block flush time (ms):      5.1

Global Cache and Enqueue Services - Messaging Statistics

                     Avg message sent queue time (ms):   7736.8

Instance #2

Global Cache and Enqueue Services - Workload Characteristics

                     Avg global enqueue get time (ms):      0.8
          Avg global cache cr block receive time (ms):      2.2
     Avg global cache current block receive time (ms):     11.2
            Avg global cache cr block build time (ms):      0.0
             Avg global cache cr block send time (ms):      0.0
      Global cache log flushes for cr blocks served %:      6.8
            Avg global cache cr block flush time (ms):     12.0
         Avg global cache current block pin time (ms):     10.5
        Avg global cache current block send time (ms):      0.3
 Global cache log flushes for current blocks served %:     15.0
       Avg global cache current block flush time (ms):      6.4

Global Cache and Enqueue Services - Messaging Statistics

                     Avg message sent queue time (ms):   9120.8

Instance #3

Global Cache and Enqueue Services - Workload Characteristics

                     Avg global enqueue get time (ms):      0.6
          Avg global cache cr block receive time (ms):      2.9
     Avg global cache current block receive time (ms):     10.4
            Avg global cache cr block build time (ms):      0.0
             Avg global cache cr block send time (ms):      0.0
      Global cache log flushes for cr blocks served %:      7.1
            Avg global cache cr block flush time (ms):      9.6
         Avg global cache current block pin time (ms):     14.3
        Avg global cache current block send time (ms):      0.3
 Global cache log flushes for current blocks served %:     14.5
       Avg global cache current block flush time (ms):      6.5

Global Cache and Enqueue Services - Messaging Statistics

                     Avg message sent queue time (ms):   8390.3
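
To compare the three instances at a glance, here is a small Python sketch that lays a few of the statistics above side by side. The values are copied from the listings; the dictionary layout, the selection of statistics, and the use of an unweighted mean of per-instance averages are illustrative assumptions, and no threshold for a "good" value is implied.

```python
# Selected AWR statistics in ms, copied from the per-instance listings above.
# Column order: instance 1, instance 2, instance 3.
stats_ms = {
    "Avg global cache cr block receive time":      [3.7, 2.2, 2.9],
    "Avg global cache current block receive time": [13.0, 11.2, 10.4],
    "Avg global cache cr block flush time":        [7.5, 12.0, 9.6],
    "Avg message sent queue time":                 [7736.8, 9120.8, 8390.3],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Unweighted mean across instances (an assumption; AWR does not report this).
means = {name: mean(values) for name, values in stats_ms.items()}

for name, values in stats_ms.items():
    per_instance = "  ".join(f"{v:8.1f}" for v in values)
    print(f"{name:<45}{per_instance}  mean {means[name]:8.1f}")
```

The message sent queue time averages roughly 8.4 seconds across the instances, hundreds of times larger than any of the block receive or flush times, which is the statistic singled out earlier in the thread as very high.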

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jul 31 2012 - 11:14:34 CDT
