
Re: ipcs problems - possibly - in oracle 8?

From: David Fitzjarrell <oratune_at_aol.com>
Date: Tue, 16 Jan 2001 18:39:35 GMT
Message-ID: <9424h6$3ms$1@nnrp1.deja.com>

In article <979668449.24111.0.nnrp-10.c30bdde2_at_news.demon.co.uk>,   "andrew_webby at hotmail" <spam_at_no.thanks.com> wrote:
> Hi
>
> Can anyone give me a clue here?
>
> I've a couple of test databases recently upgraded to Oracle 8.1.6r2 on
> Solaris 2.7.
>
> The test users are complaining that sometimes they can get in, sometimes
> they can't (we've checked the usual suspects like listener, alert log etc,
> but no dice). As nothing was changing in between (neither client nor
> database), my suspicions fell on unix itself.
>
> About the only thing that looks out of the ordinary is shared memory ISM
> attaches which worryingly looks like this:
>
> (ipcs -i)
> T  ID  KEY         MODE         OWNER   GROUP  ISMATTCH NATTCH
> Shared Memory:
> m   0  0x500e0dd9  --rw-r--r--  root    root   1
> m   1  0x790       --rw-rw-rw-  root    root   0
> m   2  0x9ae7520   --rw-r-----  oracle  dba    10 11 - RBTEST
> m   3  0xbf67de8   --rw-r-----  oracle  dba    25 25 - HOULIVE
> m   4  0x98a5f10   --rw-r-----  oracle  dba    37 36 - RBLIVE
> m   5  0xb4e6bf8   --rw-r-----  oracle  dba    8 8 - HOUTEST
> m   6  0x1503facc  --rw-r-----  oracle  dba    4,294,966,760 10 - RBDEV
> m   7  0xb45ab788  --rw-r-----  oracle  dba    4,294,964,292 7 - HOUDEV
> m   8  0x280267    --rw-r--r--  root    root   0
>
> (sorry, here's hoping you use Courier font... :-)
>
> As you'll see, the ism attch column is mental. Also, it's going down. I
> appreciate that this may be more of a solaris problem, so I've posted there
> as well. The man pages don't give too much detail on why this might occur
> (though the large number leads me to believe an overflow of an unsigned
> integer or similar has occurred).
>
> Any ideas if this is something for concern and if anyone can point me
> towards something useful on ISMATTCH (I've tried Sun search, general
> web/usenet search etc), that would be top (no unix-pun intended). I'm
> certainly giving it a good go here, but our top unix man is off on
> compassionate leave and there's danger of a crowd forming around my desk...
>
> Andrew
>
>
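
A quick check of the overflow theory in the quoted post: if ISMATTCH is a 32-bit counter that has been decremented below zero, reinterpreting the printed numbers as signed 32-bit integers should give small negative values. The following is a minimal Python sketch; the 32-bit width is an assumption, not something the ipcs man page confirms, and the helper name is hypothetical.

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

# ISMATTCH values printed for RBDEV and HOUDEV in the quoted ipcs listing.
reported = {"RBDEV": 4_294_966_760, "HOUDEV": 4_294_964_292}
signed = {name: as_signed32(v) for name, v in reported.items()}
for name, v in signed.items():
    print(f"{name}: {v}")
```

Both values come out as small negative numbers (-536 and -3004), which is consistent with a counter that has been decremented more often than it was incremented, though it does not prove that this is what happened.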

From Metalink:

Center of Expertise Research Articles
Solaris ISM and Oracle, Frequently asked Questions

What is ISM?

ISM, or Intimate Shared Memory, is a way of handling the page table entries by the Sun Solaris operating system.

There is one memory structure in Sun Solaris that is used for keeping track of all the process page table information. This structure is essentially a series of hash chains, similar to Oracle's concept of LRU latch chains or cache buffers chains. During memory operations, this structure is traversed by the operating system as it makes decisions about how to handle memory. Each process on the system has information in this memory structure. There are only 256 entry points into this structure and this number cannot be increased.

Consequently, as more and more memory mappings and operations occur, the information stored behind each entry point grows (the "chains" lengthen). As the operating system responds to higher activity requiring memory mappings, it spends more and more time in kernel mode (shown as %sys in sar output) just walking this structure and deciding what to do. Having a large number of users and a large Oracle SGA further aggravates this situation.

The use of ISM reduces this problem significantly because it allows the processes to share the page table entries. Essentially, complex operations on these memory chains reduce to pointer operations involving small amounts of data.
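
To put rough numbers on this, the sketch below estimates how many page mappings have to be tracked for an SGA with and without shared page table entries. The 8 KB page size, the one-entry-per-page-per-process model, and the even spread over the 256 entry points are simplifying assumptions for illustration only, not a description of the actual Solaris data structures.

```python
PAGE_SIZE = 8 * 1024      # assumed base page size in bytes
ENTRY_POINTS = 256        # number of entry points quoted in the article

def mapping_entries(sga_bytes: int, processes: int, shared: bool) -> int:
    """Estimate the mapping entries needed for the SGA.

    Without sharing, every attached process carries its own entries;
    with sharing (the ISM case), one set serves all processes.
    """
    pages = sga_bytes // PAGE_SIZE
    return pages if shared else pages * processes

def average_chain_length(entries: int) -> float:
    """Average entries per entry point, assuming an even spread."""
    return entries / ENTRY_POINTS

sga = 1024 ** 3           # 1 GB SGA (illustrative)
users = 200               # attached Oracle processes (illustrative)
without_ism = mapping_entries(sga, users, shared=False)
with_ism = mapping_entries(sga, users, shared=True)
print(without_ism, average_chain_length(without_ism))
print(with_ism, average_chain_length(with_ism))
```

Under these assumptions the unshared case needs 200 times as many entries as the shared case, and the average chain grows by the same factor.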

What are the issues related to ISM?

ISM reduces the number of system calls, which improves overall performance when there are large memory allocations and a large number of users on the system. In theory, this should cause no adverse effects and should increase the overall performance of the system. Unfortunately, due to various Solaris operating system bugs, enabling ISM can cause Oracle data corruption or crashes. The relevant Sun bug numbers are:

4244523: Data corruption in ISM shared memory segs with heavy load/multi-threaded apps
4255955: With enable_grp_ism=1 on E10000, 5.6 -15 KJP, oracle 7.3.4 crashes

The Sun base bug number is 4228856, which is not published.

In addition to the above Sun bugs there are many Oracle bugs and duplicate Sun bugs pointing to the same problem:

If ISM is enabled on Sun E10000 systems, Oracle data corruption may occur. This is especially true when the Domain Reconfiguration (DR) feature is also enabled.

There are three very important points one should be aware of:

  1. The problem is specific to Sun E10000 models.
  2. The problem is more prominent when DR is enabled.
  3. These problems are fixed in Sun Kernel patch level 16, a.k.a. Sun OS 5.6 Patch ID# 105181-16.

What causes the corruption?

Sun E10000 models have a new processor model featuring a new type of CPU register. When ISM is turned on, the shared memory image coherency across processes becomes inconsistent under some circumstances. This is caused by the CPU cache getting out-of-sync with the on-disk data and the register not being flushed. The effect of this problem is reflected as corrupt Oracle blocks, where the block header does not match the tail. In almost all cases, the block header has a pattern that exists in all of the ORA-1578 trace files.

It is very important to remember that data block corruption can be caused by many other factors, such as hardware failure, logical volume manager bugs or Oracle bugs. Encountering a corruption on a Sun E10000 does not automatically imply it is caused by ISM, and should be reported to the appropriate channels for detailed analysis.

How is ISM enabled?

To make full use of ISM, it must be enabled at both the Solaris and the Oracle level. The default behavior of Solaris 5.6 and Oracle is to enable the use of ISM. However, due to the issues discussed above, support analysts used to recommend turning off ISM by setting the following parameters:

In /etc/system:

shmsys:ism_off=1
shmsys:share_page_table=1

In init.ora:

use_ism=false

Note that the above parameters turn ISM OFF. To enable ISM, simply remove those lines from the configuration files. However, Oracle will not use ISM if the entire SGA cannot fit in a contiguous shared memory area, and it will not report any error message when this happens. If the system memory is fragmented and a contiguous shared area cannot be allocated for the Oracle SGA, the system needs to be rebooted.
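
A small script can check whether these ISM-disabling lines are still present. The following is a minimal Python sketch that scans the text of a configuration file for the three settings named above. The function name and the sample input are hypothetical, and accepting an optional leading `set` keyword in /etc/system is an assumption about how the lines may be written on a given system.

```python
import re

# Settings the article says turn ISM off.
ISM_OFF_PATTERNS = [
    re.compile(r"^\s*(set\s+)?shmsys:ism_off\s*=\s*1\b", re.IGNORECASE),
    re.compile(r"^\s*(set\s+)?shmsys:share_page_table\s*=\s*1\b", re.IGNORECASE),
    re.compile(r"^\s*use_ism\s*=\s*false\b", re.IGNORECASE),
]

def find_ism_disabling_lines(text):
    """Return the lines in `text` that match an ISM-disabling setting.

    Lines that begin with a comment marker do not match, because each
    pattern is anchored at the start of the line.
    """
    hits = []
    for line in text.splitlines():
        if any(p.search(line) for p in ISM_OFF_PATTERNS):
            hits.append(line.strip())
    return hits

# Hypothetical /etc/system contents, for illustration only.
sample = """\
* comment line in /etc/system
set shmsys:shminfo_shmmax=4294967295
shmsys:ism_off=1
shmsys:share_page_table=1
"""
print(find_ism_disabling_lines(sample))
```

In practice the text would be read from /etc/system and from the instance's init.ora; an empty result for both files suggests that nothing in them is explicitly turning ISM off.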

How much performance gain does ISM give?

The performance gain one can expect after enabling ISM depends largely on the utilization of the system, especially the number of users and the amount of shared memory used. On busy Oracle systems with many concurrent users (>200) and a large SGA (>1G), we have observed performance improvements of as much as 30%. There are also cases where the system appears to be hung because kernel CPU usage is so high that no other activity can take place. In those cases, simply enabling ISM reduced the kernel CPU usage and eliminated the hangs.

In general, the less memory the system allocates for process page tables, the less the overhead. Unfortunately, there is no direct interface to see the memory allocated for this purpose, other than the "crash" utility. The cache name for process page tables is "sfmmu8_cache" and can be checked by running the "crash" utility as the "root" user as follows:

# crash
dumpfile = /dev/mem, namelist = /dev/ksyms, outfile = stdout
> kmastat

                            buf   buf   buf   memory    #allocations
cache name                 size avail total   in use    succeed fail
----------                ----- ----- ----- --------    ------- ----
...
sfmmu8_cache                232  8297 43925 10280960     124873    0
...
----------                ----- ----- ----- --------    ------- ----
permanent                     -     -     -   114688       1155    0
oversize                      -     -     - 16433152     381109    0
----------                ----- ----- ----- --------    ------- ----
Total                         -     -     - 122658816 1545645581    0

> quit

#

If ISM is enabled, this cache should stop growing as more users connect and should stay at the same level once it reaches a stable value. We have observed less than 100M of memory usage on very busy systems, with more than 300 users and an SGA size of 2G, as opposed to 1G of memory usage without ISM.
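
To track this value over time, the kmastat output can be captured to a file and parsed. The following is a minimal Python sketch that pulls the "memory in use" figure for sfmmu8_cache out of text laid out like the listing above. The function name is hypothetical and the column order is assumed to match that listing.

```python
def cache_memory_in_use(kmastat_text, cache_name="sfmmu8_cache"):
    """Return the 'memory in use' value in bytes for cache_name, or None.

    Assumes the columns are: cache name, buf size, buf avail, buf total,
    memory in use, allocations succeeded, allocations failed.
    """
    for line in kmastat_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] == cache_name:
            return int(fields[4])
    return None

# The sfmmu8_cache line from the listing above.
sample = "sfmmu8_cache                232  8297 43925 10280960     124873    0"
in_use = cache_memory_in_use(sample)
print(f"{in_use} bytes = {in_use / 2**20:.1f} MB")
```

Sampling this figure as users connect shows whether the cache levels off, as expected with ISM, or keeps growing.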

What is the bottom line?

With the introduction of Sun Kernel patch 16 (Sun OS 5.6 Patch ID# 105181-16), all known problems related to ISM have been fixed. We should encourage Oracle customers to make use of this facility, as it has a significant performance impact and is definitely worthwhile. Customers should make sure they have applied the latest Sun Kernel patch and should remove the parameters disabling the use of ISM, if they were set to prevent corruption problems in the past.

Sun has already published a note informing customers how to make this change:

Infodoc ID: 20823
Synopsis: Information about hangs on E10k systems due to disabling of Intimate Shared Memory (ISM) in /etc/system or Oracle's init.ora file
Date: 7 Oct 1999

There was a kernel bug, 4244523, that required a temporary workaround which was to turn off ISM in /etc/system and in the database application, for example, Oracle's init.ora file. This was fixed in the Solaris 5.6 kernel update patch 105181-15.

Unfortunately some customers may forget to remove the modifications to /etc/system and init.ora after upgrading their kernel. ISM is enabled by default. The following should NOT be in /etc/system:

    shmsys:ism_off=1
    shmsys:share_page_table=1

In addition Oracle's init.ora should NOT have:

    use_ism=false

Turning off ISM can cause severe performance degradation and what appears to be a hung state.

Acknowledgements

Most of the information in this document is compiled from internal mailing lists, Sun Microsystems' Support Web Page (sunsolve.sun.com) and the author's field experience at various customer sites.

I would also like to thank Vern E. Wagman for his case study on Solaris 2.6, Veritas 3.3.1 and ISM.

--
David Fitzjarrell
Oracle Certified DBA


Sent via Deja.com
http://www.deja.com/
Received on Tue Jan 16 2001 - 12:39:35 CST
