anon page allocation on solaris for shared servers went to 300MB out of nothing [long]

From: GG <grzegorzof_at_interia.pl>
Date: Wed, 11 Jun 2014 21:16:21 +0200
Message-ID: <5398AB05.9080009_at_interia.pl>



Hi,
  we've experienced a strange server hang caused by out-of-memory errors on a 128 GB machine with a 90 GB SGA.
It's Solaris 11 with Oracle EE 11.2.0.3 plus a recent PSU. During post-mortem vmcore analysis I found this:

CAT(vmcore.0/11U)> mem

                               pages        bytes
physinstalled              16777216 137438953472 (128G)
physmem                    16489356 135080804352 (125G)
total_pages                16489085 135078584320 (125G)

freemem                       64872    531431424 (506M)
avefree                       64872    531431424 (506M)
avefree30                     65057    532946944 (508M)
needfree                      75063    614916096 (586M)
availrmem (nonswapable)     2935072  24044109824 (22.3G)
availrmem_initial          16489085 135078584320 (125G)
swapfs_minfree              2061169  16885096448 (15.7G)
sw_pending_size                             8192 (8K)

lotsfree                     257641   2110595072 (1.96G)
desfree                      128820   1055293440 (1006M)
minfree                       64410    527646720 (503M)
throttlefree                  64410    527646720 (503M)

pp_kernel(calculated)       2039836  16710336512 (15.5G)
pages_locked                   2721     22290432 (21.2M)

shared memory (SM)                       2870632 (2.73M)
intimate SM (ISM)                    96636780544 (90G)
dynamic ISM (DISM)                             0 (0)
locked DISM                       0            0 (0)
total locked SM                      96636780544 (90G) (70.31% of memory)
spt_used (ISM)             11796482  96636780544 (90G)
segspt_minfree               809107   6628204544 (6.17G)

WARNING: soft swapping (avefree < desfree && freemem <= desfree)

k_anoninfo: (physical == disk-backed)

   ani_phys_max        - disk swap                        17039359 pages (129G)
   ani_phys_avail      - available disk                    8443024 pages (64.4G)
   ani_asleep_mem_resv - reserved asleep memory                  0 pages (0)
   ani_mem_resv        - reserved memory                         0 pages (0)
   ani_mem_locked      - locked memory                    11796482 pages (90G)
   ani_free            - unallocated physical and memory   8541727 pages (65.1G)

initial virtual swap available for reservation            31467275 pages (240G)
   ani_max + MAX(availrmem_initial - swapfs_minfree, 0)

current virtual swap available for reservation             9316927 pages (71G)
   ani_phys_avail + Asleep_availrmem + MAX(availrmem - swapfs_minfree, 0)
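
As a cross-check, the two reservation formulas printed by CAT can be recomputed from the page counts in the dump. This is only a sketch: the 8 KB page size is derived from physinstalled (137438953472 bytes / 16777216 pages), and Asleep_availrmem is assumed to be 0 because the dump reports no asleep reservations and the totals only reconcile with that value.

```python
# Page counts copied from the "mem" output above.
PAGE_SIZE = 137438953472 // 16777216  # bytes per page, derived from physinstalled

ani_phys_max = 17039359        # disk swap
ani_phys_avail = 8443024       # available disk swap
availrmem_initial = 16489085
availrmem = 2935072
swapfs_minfree = 2061169
asleep_availrmem = 0           # assumption, see the note above

# initial = ani_max + MAX(availrmem_initial - swapfs_minfree, 0)
initial_pages = ani_phys_max + max(availrmem_initial - swapfs_minfree, 0)

# current = ani_phys_avail + Asleep_availrmem + MAX(availrmem - swapfs_minfree, 0)
memory_part = max(availrmem - swapfs_minfree, 0)
current_pages = ani_phys_avail + asleep_availrmem + memory_part


def to_gib(pages):
    """Convert a page count to GiB using the derived page size."""
    return pages * PAGE_SIZE / 2**30


print(PAGE_SIZE)
print(initial_pages, current_pages)
print(f"memory-backed part of current reservation: {to_gib(memory_part):.2f} GiB")
```

Both results match the page counts CAT prints (31467275 and 9316927). The memory-backed term is what remains of availrmem above swapfs_minfree; the rest of the current figure is disk swap.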

CAT(vmcore.0/11U)> proc -r -s size

          addr    PID   PPID   RUID/UID        size       RSS    swresv lwpcnt command
============== ====== ====== ========== =========== ========= ========= ====== =====
0x6401242c9000    568      1        100 96886915072   4456448   6840320      1 ora_s083_sid
0x640208970ff8   8314      1        100 96886923264   4538368   6840320      1 ora_diag_sid
0x640143146050    657      1        100 96887463936   4431872   7127040      1 ora_s115_sid

--------- The above looks OK in terms of RSS, but check this out:

0x6401b0bf9000   8471      1        100 97329143808 297426944 368320512    258 ora_s034_sid
0x64015b6e8050    534      1        100 97439129600 301449216 552239104      1 ora_s070_sid
0x640119acb018    369      1        100 97455792128 301031424 565493760      1 ora_s060_sid
0x6402742f4040  27109      1        100 97455898624 295297024 574578688      1 ora_s039_sid
0x64027e1cc020    659      1        100 97455923200 299892736 568311808      1 ora_s116_sid
0x640297349000    212      1        100 97457045504 298565632 564764672      7 ora_s051_sid
0x640164be8000   8407      1        100 97463615488 300081152 538255360    258 ora_s002_sid
0x6402045a9028    610      1        100 97472503808 299917312 589152256      1 ora_s102_sid
0x640133fb0048    552      1        100 97472544768 299761664 589258752      1 ora_s075_sid
0x6401ef3be020    384      1        100 97472675840 298442752 584015872      1 ora_s066_sid
0x640206514008    226      1        100 97472684032 296189952 588218368      1 ora_s058_sid
0x64028bb3d000    578      1        100 97472684032 301342720 587759616      1 ora_s088_sid
0x6401632e8010    378      1        100 97472692224 299614208 584294400      1 ora_s063_sid
0x64023294cfe0    574      1        100 97472692224 297820160 586129408      1 ora_s086_sid

The RSS is about 280-300 MB, which looks strange to me for an Oracle server process.

Going further:

CAT(vmcore.0/11U)> mem -l user

   PID  size   RSS swrsv   anon  swap  file command
   665 91.9G  284M 1.65G   280M 1.37G  196M ora_s119_sid
   663 90.7G  308M  579M   303M  273M  196M ora_s118_sid
   659 90.7G  286M  541M   280M  259M  196M ora_s116_sid
   657 90.2G 4.22M 6.79M    24K 3.95M  196M ora_s115_sid
   629 91.8G  282M 1.56G   277M 1.29G  196M ora_s111_sid
   627 90.7G  286M  568M   281M  284M  196M ora_s110_sid
   625 90.2G 6.64M 7.56M   736K 4.09M  196M ora_s109_sid
   623 91.7G  279M 1.51G   275M 1.24G  196M ora_s108_sid
   618 91.9G  286M 1.72G   282M 1.44G  196M ora_s106_sid
   616 90.7G  285M  574M   280M  290M  196M ora_s105_sid
   612 90.7G  291M  574M   285M  286M  196M ora_s103_sid
   610 90.7G  286M  561M   280M  278M  196M ora_s102_sid
   608 91.9G  283M 1.67G   279M 1.39G  196M ora_s101_sid
   606 90.7G  290M  575M   285M  288M  196M ora_s100_sid
   602 90.2G 4.22M 6.90M    24K 3.83M  196M ora_s098_sid
   598 90.7G  285M  567M   280M  285M  196M ora_s096_sid
   596 90.7G  286M  566M   280M  283M  196M ora_s095_sid
   594 90.7G  287M  553M   282M  268M  196M ora_s094_sid
   584 90.2G 6.39M 7.66M   464K 4.50M  196M ora_s091_sid
   582 90.2G 4.23M 7.16M    24K 4.37M  196M ora_s090_sid
   580 90.7G  286M  557M   281M  272M  196M ora_s089_sid
   578 90.7G  287M  560M   281M  276M  196M ora_s088_sid
   574 90.7G  284M  558M   278M  277M  196M ora_s086_sid
   572 91.8G  284M 1.63G   279M 1.36G  196M ora_s085_sid
   570 90.7G  285M  567M   279M  285M  196M ora_s084_sid
   568 90.2G 4.25M 6.52M    24K 3.63M  196M ora_s083_sid
   566 90.7G  286M  569M   280M  286M  196M ora_s082_sid
   562 90.8G  287M  614M   281M  329M  196M ora_s080_sid
   560 91.8G  285M 1.63G   281M 1.35G  196M ora_s079_sid
   554 91.8G  284M 1.63G   280M 1.35G  196M ora_s076_sid
   552 90.7G  285M  561M   280M  279M  196M ora_s075_sid
   542 90.7G  291M  576M   286M  287M  196M ora_s074_sid
   538 90.2G 6.25M 7.91M   496K 4.67M  196M ora_s072_sid
   534 90.7G  287M  526M   281M  242M  196M ora_s070_sid
   530 91.9G  287M 1.72G   282M 1.44G  196M ora_s068_sid
   386 90.7G  285M  567M   279M  285M  196M ora_s067_sid
   384 90.7G  284M  556M   278M  275M  196M ora_s066_sid
   382 90.7G  288M  566M   283M  280M  196M ora_s065_sid
   380 91.8G  284M 1.63G   280M 1.35G  196M ora_s064_sid
   378 90.7G  285M  557M   280M  273M  196M ora_s063_sid
   373 90.2G 4.25M 6.91M    24K 4.02M  196M ora_s062_sid
   369 90.7G  287M  539M   281M  255M  196M ora_s060_sid
   367 90.7G  287M  569M   281M  285M  196M ora_s059_sid
   312 91.9G  288M 1.65G   284M 1.37G  196M ora_s048_sid
   304 91.8G  284M 1.56G   279M 1.29G  196M ora_s047_sid
   302 90.2G 4.54M 7.91M    24K 5.10M  196M ora_s046_sid
   292 91.9G  288M 1.67G   284M 1.39G  196M ora_s045_sid
   226 90.7G  282M  560M   276M  281M  196M ora_s058_sid
   224 90.7G  286M  571M   280M  288M  196M ora_s057_sid
   220 90.8G  286M  640M   281M  356M  196M ora_s055_sid
   216 90.2G 4.64M 7.91M    24K 5.14M  196M ora_s053_sid
   214 90.2G 4.25M 7.34M    24K 4.59M  196M ora_s052_sid
   212 90.7G  284M  538M   278M  257M  196M ora_s051_sid


I did some math: there were about 79 shared servers, each with roughly 280 MB of anonymous memory.
Questions:
Does anyone have an idea what could cause such anon/private memory utilization in shared servers? Is it normal at all?
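
The math can be sketched like this. The helper names (parse_size, anon_by_pid) are made up for illustration, the four sample rows are copied from the "mem -l user" listing above, and the 100 MB threshold for a "large" process is an arbitrary choice. The 79 x 280 MB figure is a rough estimate, not a sum over the whole listing.

```python
import re

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_size(text):
    """Convert a CAT-style size such as '280M' or '24K' to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMG])?", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * UNITS.get(unit, 1))


def anon_by_pid(listing):
    """Map PID -> anon bytes for every data row of a 'mem -l user' listing."""
    result = {}
    for line in listing.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a numeric PID.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        pid, _size, _rss, _swrsv, anon, _swap, _file, _command = fields
        result[int(pid)] = parse_size(anon)
    return result


# A few rows copied from the listing above.
SAMPLE = """
   665 91.9G  284M 1.65G   280M 1.37G  196M ora_s119_sid
   663 90.7G  308M  579M   303M  273M  196M ora_s118_sid
   659 90.7G  286M  541M   280M  259M  196M ora_s116_sid
   657 90.2G 4.22M 6.79M    24K 3.95M  196M ora_s115_sid
"""

anon = anon_by_pid(SAMPLE)
large = {pid: size for pid, size in anon.items() if size > 100 * 2**20}
print(sorted(large))

# Rough total: 79 shared servers at about 280 MB of anon memory each.
estimate_bytes = 79 * 280 * 2**20
print(round(estimate_bytes / 2**30, 1), "GiB")
```

That estimate is roughly 21.6 GiB of anonymous memory on top of the 90 GB locked in ISM.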

Currently the anon size of a shared server process (pmap -x PID) is around 4-7 MB; there is only one shared server for which pmap -x PID reports 300 MB of anon usage.
Interestingly, Oracle's v$sesstat claims that the process's allocated PGA/UGA memory is only 20 MB.
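
To compare processes mapping by mapping, the Anon column of pmap -x can be totalled per mapped file. The sketch below assumes the usual Solaris column order (Address, Kbytes, RSS, Anon, Locked, Mode, Mapped File). The sample text is entirely made up to exercise the parser and is not taken from this system; anon_kb_by_mapping is a hypothetical helper name.

```python
from collections import defaultdict


def anon_kb_by_mapping(pmap_text):
    """Sum the Anon column (in KB) of 'pmap -x' output per mapped-file name."""
    totals = defaultdict(int)
    for line in pmap_text.splitlines():
        fields = line.split(None, 6)
        if len(fields) < 6:
            continue  # PID banner, separator line, blank lines
        address, anon = fields[0], fields[3]
        try:
            int(address, 16)  # header and "total" rows do not start with hex
        except ValueError:
            continue
        if not anon.isdigit():
            continue  # '-' means no anonymous pages for this mapping
        name = fields[6].strip() if len(fields) > 6 else "(unnamed)"
        totals[name] += int(anon)
    return dict(totals)


# Hypothetical sample, invented for illustration only.
SAMPLE = """\
12345:  ora_sNNN_sid
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
0000000100000000     200000     150000          -          - r-x--  oracle
000000010C400000       1024        900        300          - rwx--  oracle
000000010C500000     290000     289000     289000          - rwx--    [ heap ]
FFFFFFFF7FFF0000         64         64         64          - rw---    [ stack ]
---------------- ---------- ---------- ---------- ----------
        total Kb     491088     439964     289364          -
"""

totals = anon_kb_by_mapping(SAMPLE)
for name, kb in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{kb:>10} KB  {name}")
```

Running it against the saved pmap -x output of a 4-7 MB process and of the 300 MB one would show which mapping (heap, anon segments, stack) accounts for the difference.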

Any ideas how I can drill down and find out about the allocations in a shared server process's memory?

BTW, Oracle recommended decreasing the SGA :).

Regards
GG

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jun 11 2014 - 21:16:21 CEST
