RE: Oracle & pagefaults (solaris)

From: Patterson, Joel <Joel.Patterson_at_crowley.com>
Date: Mon, 13 Aug 2012 10:44:58 -0400
Message-ID: <C95D75DD2E01DD4D81124D104D317ACA1C7517B8DB_at_JAXMSG01.crowley.com>



To throw some suggestions at the issue while you are waiting for some responses: Are you using ZFS? If so, have you looked at any of the following? ZFS for Databases:
http://www.solarisinternals.com/wiki/index.php/ZFS_for_Databases Limitations -- Full table scans, etc.
Variable performance of sequential reads - Since ZFS does block-level copy-on-write for all writes, the physical layout of a file will tend to become random over time for files that are updated randomly. Normal RDBMS operations such as the updating of tables and indexes characteristically result in such random writes. As a consequence, the performance of database queries that do sustained large sequential reads (including "full table scans", "full index scans", or CREATE INDEX operations) can deteriorate over time as the database is updated. Simple ad-hoc or utility queries such as SELECT COUNT (*) from a non-indexed table commonly cause such sustained sequential I/O demand. The worst case impact varies inversely with the ZFS record size, thus posing a tradeoff between the performance of random reads and the performance of sequential reads in relation to the degree of potential fragmentation.

http://developers.sun.com/solaris/docs/wp-oraclezfsconfig-0510_ds_ac2.pdf Backups:
Copy operations, like backups, are similar to a full scan, long logical sequential reads and are also subject to IOPS inflation as compared to traditional file system storage. The elapsed time for a database backup can be due to the copy-on-write evolution of the on-disk format for data files. Copying the data files, although cumbersome, might help to relocate the blocks into a more continuous physical layout, particularly if the pool is left with a minimum of free disk blocks.

This Guy, Constantin Gonzalez like ZFS and supports the defaults for most situations: http://constantin.glez.de/blog/2010/04/ten-ways-easily-improve-oracle-solaris-zfs-filesystem-performance

Evil Tuning Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Have you changed any of your SAN hardware or software; any other changes?

Joel Patterson
Database Administrator
904 727-2546

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of przemolicc_at_poczta.fm Sent: Monday, August 13, 2012 10:27 AM
To: oracle-l_at_freelists.org
Subject: Oracle & pagefaults (solaris)

Hi all,

we have been facing performance problems on our Solaris 10 Sparc based server with several Oracle databases running on it. It started 2 months ago and gradually is increasing. There is always high level of syscalls when we face these problems:

09:20:03      25       8       0      68
09:30:02      22       7       0      70
09:40:03      23       8       0      70
09:50:03      23       7       0      69
10:00:03      22       9       0      69
10:10:03      25      34       0      41
10:20:03      28      28       0      44
10:30:03      27      24       0      49
10:40:02      24       8       0      68
10:50:03      27      19       0      54
11:00:02      31      27       0      41
11:10:21      29      48       0      23
11:20:03      36      56       0       9
11:30:05      32      51       0      18
11:40:03      37      43       0      20
11:50:06      28      59       0      14
12:00:06      32      61       0       7
12:10:03      38      26       0      36
12:20:02      34       9       0      57
12:30:02      35       8       0      58
12:40:03      35       9       0      56

Normal level of syscalls is about 7-10%. But when something is happening it increases to over 50%. Using DTrace I can find that kernel is working in 'pagefault' function:

......

              unix`page_freelist_coalesce+0x894 [***]
              0x1fe0000
              unix`page_get_mnode_freelist+0x398
              unix`page_get_freelist+0x428
              unix`page_alloc_pages+0x110
              genunix`anon_map_getpages+0x348
              genunix`segvn_fault_anonpages+0x32c
              genunix`segvn_fault+0x530
              genunix`as_fault+0x4c8
              unix`pagefault+0x68
              unix`trap+0xd50
              unix`utl0+0x4c
            29062

              unix`cpu_halt+0x10c
              unix`cpu_halt+0x104
              unix`idle+0x128
              unix`thread_start+0x4
           719133

(this is just part of the image - there are much more pagefaults then listed above). So natural is to check on behalf of which application kernel is working in pagefaults:

[ High syscalls ]

CPU     ID                    FUNCTION:NAME
 96  85656                       :tick-5sec
  nscd                                                              1
  top                                                               1
  dtrace                                                            5
  perl                                                            122
  emagent                                                         124
  tnslsnr                                                        1569
  oracle                                                         5538

[ Low (normal) syscalls ]

CPU     ID                    FUNCTION:NAME
 20  85656                       :tick-5sec
  nscd                                                              1
  dtrace                                                            5
  emagent                                                          41
  sh                                                               91
  perl                                                             93
  emdctl                                                          243
  tnslsnr                                                         320
  oracle                                                         2466


When I watch which Oracle's PID are generating these high syscalls using 'ps' most of them don't exists (!). Are they short lived processes ? Our DBA claims that this is not a problem of Oracle but DTrace shows that system is working on behalf of Oracle. Can you shed some light where our DBAs should look for any solution ? When I analyzed number of connections in listener log they don't differ much between low and high system calls.

Can you help me please ? I don't feel comfortable because my feeling is that I am doing DBAs homework but they claim that AWR report is the same as before so there is nothing they could do.

Best regards
Przemek

--
http://www.freelists.org/webpage/oracle-l


--
http://www.freelists.org/webpage/oracle-l
Received on Mon Aug 13 2012 - 09:44:58 CDT

Original text of this message