anon page allocation on solaris for shared servers went to 300MB out of nothing [long]
Date: Wed, 11 Jun 2014 21:16:21 +0200
Message-ID: <5398AB05.9080009_at_interia.pl>
Hi,
we experienced a strange server hang caused by out-of-memory errors on a 128 GB machine with a 90 GB SGA.
It's Solaris 11 with Oracle EE 11.2.0.3 plus a recent PSU. Doing some post-mortem vmcore checking, I found this:
CAT(vmcore.0/11U)> mem
                              pages           bytes
physinstalled              16777216    137438953472 (128G)
physmem                    16489356    135080804352 (125G)
total_pages                16489085    135078584320 (125G)
freemem                       64872       531431424 (506M)
avefree                       64872       531431424 (506M)
avefree30                     65057       532946944 (508M)
needfree                      75063       614916096 (586M)
availrmem (nonswapable)     2935072     24044109824 (22.3G)
availrmem_initial          16489085    135078584320 (125G)
swapfs_minfree              2061169     16885096448 (15.7G)
sw_pending_size                                8192 (8K)
lotsfree                     257641      2110595072 (1.96G)
desfree                      128820      1055293440 (1006M)
minfree                       64410       527646720 (503M)
throttlefree                  64410       527646720 (503M)
pp_kernel(calculated)       2039836     16710336512 (15.5G)
pages_locked                   2721        22290432 (21.2M)
shared memory (SM)                          2870632 (2.73M)
intimate SM (ISM)                       96636780544 (90G)
dynamic ISM (DISM)                                0 (0)
locked DISM                       0               0 (0)
total locked SM                         96636780544 (90G) (70.31% of memory)
spt_used (ISM)             11796482     96636780544 (90G)
segspt_minfree               809107      6628204544 (6.17G)
WARNING: soft swapping (avefree < desfree && freemem <= desfree)
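For reference, the condition in the WARNING line can be re-checked from the page counts in the mem output. This is only a sketch: the variable names mirror the labels printed by CAT, and the page size is not stated anywhere in the dump, so it is inferred here from the two physinstalled columns.

```python
# Page counts copied from the CAT "mem" output in this post.
physinstalled_pages = 16777216
physinstalled_bytes = 137438953472
freemem = 64872
avefree = 64872
desfree = 128820
minfree = 64410

# Assumption: page size is derived from physinstalled bytes / pages.
page_size = physinstalled_bytes // physinstalled_pages

# Condition quoted in the WARNING line.
soft_swapping = avefree < desfree and freemem <= desfree

# How many pages freemem sits above the minfree threshold.
margin_over_minfree = freemem - minfree

print("page size (bytes):", page_size)
print("soft swapping condition holds:", soft_swapping)
print("freemem - minfree (pages):", margin_over_minfree)
```

With these numbers freemem is barely above minfree, which fits the hang we saw.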
k_anoninfo: (physical == disk-backed)
ani_phys_max - disk swap                            17039359 pages (129G)
ani_phys_avail - available disk                      8443024 pages (64.4G)
ani_asleep_mem_resv - reserved asleep memory               0 pages (0)
ani_mem_resv - reserved memory                             0 pages (0)
ani_mem_locked - locked memory                      11796482 pages (90G)
ani_free - unallocated physical and memory           8541727 pages (65.1G)
initial virtual swap available for reservation      31467275 pages (240G)
    ani_max + MAX(availrmem_initial - swapfs_minfree, 0)
current virtual swap available for reservation       9316927 pages (71G)
    ani_phys_avail + Asleep_availrmem + MAX(availrmem - swapfs_minfree, 0)
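The two formulas printed by CAT can be reproduced from the numbers in the dump. A minimal sketch under a few assumptions of mine: ani_max in the formula is taken to be the value printed as ani_phys_max, Asleep_availrmem is not printed anywhere and is assumed to be 0, and the 8192-byte page size is inferred from the physinstalled row (bytes / pages).

```python
# Page counts copied from the CAT output in this post.
ani_phys_max = 17039359       # assumed to be the "ani_max" used in the formula
ani_phys_avail = 8443024
availrmem_initial = 16489085
availrmem = 2935072
swapfs_minfree = 2061169
asleep_availrmem = 0          # assumption: value is not shown in the dump

PAGE_SIZE = 8192              # assumption: inferred from physinstalled bytes / pages
GIB = 2**30

# ani_max + MAX(availrmem_initial - swapfs_minfree, 0)
initial_vswap = ani_phys_max + max(availrmem_initial - swapfs_minfree, 0)

# ani_phys_avail + Asleep_availrmem + MAX(availrmem - swapfs_minfree, 0)
current_vswap = ani_phys_avail + asleep_availrmem + max(availrmem - swapfs_minfree, 0)

for label, pages in (("initial", initial_vswap), ("current", current_vswap)):
    print(f"{label} virtual swap: {pages} pages ({pages * PAGE_SIZE / GIB:.1f}G)")
```

Both results match the page counts CAT reports (31467275 and 9316927), so with these assumptions the memory-backed part of the current reservation is only availrmem - swapfs_minfree = 873903 pages.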
CAT(vmcore.0/11U)> proc -r -s size
addr              PID PPID RUID/UID        size       RSS    swresv lwpcnt command
============== ====== ==== ======== =========== ========= ========= ====== ============
0x6401242c9000    568    1      100 96886915072   4456448   6840320      1 ora_s083_sid
0x640208970ff8   8314    1      100 96886923264   4538368   6840320      1 ora_diag_sid
0x640143146050    657    1      100 96887463936   4431872   7127040      1 ora_s115_sid
--------- The above looks OK in terms of RSS, but check this out:
0x6401b0bf9000   8471    1      100 97329143808 297426944 368320512    258 ora_s034_sid
0x64015b6e8050    534    1      100 97439129600 301449216 552239104      1 ora_s070_sid
0x640119acb018    369    1      100 97455792128 301031424 565493760      1 ora_s060_sid
0x6402742f4040  27109    1      100 97455898624 295297024 574578688      1 ora_s039_sid
0x64027e1cc020    659    1      100 97455923200 299892736 568311808      1 ora_s116_sid
0x640297349000    212    1      100 97457045504 298565632 564764672      7 ora_s051_sid
0x640164be8000   8407    1      100 97463615488 300081152 538255360    258 ora_s002_sid
0x6402045a9028    610    1      100 97472503808 299917312 589152256      1 ora_s102_sid
0x640133fb0048    552    1      100 97472544768 299761664 589258752      1 ora_s075_sid
0x6401ef3be020    384    1      100 97472675840 298442752 584015872      1 ora_s066_sid
0x640206514008    226    1      100 97472684032 296189952 588218368      1 ora_s058_sid
0x64028bb3d000    578    1      100 97472684032 301342720 587759616      1 ora_s088_sid
0x6401632e8010    378    1      100 97472692224 299614208 584294400      1 ora_s063_sid
0x64023294cfe0    574    1      100 97472692224 297820160 586129408      1 ora_s086_sid
The RSS is about 280-300 MB, which looks strange to me for an Oracle shared server process.
Going further:
CAT(vmcore.0/11U)> mem -l user
  PID   size    RSS  swrsv  anon   swap  file command
  665  91.9G   284M  1.65G  280M  1.37G  196M ora_s119_sid
  663  90.7G   308M   579M  303M   273M  196M ora_s118_sid
  659  90.7G   286M   541M  280M   259M  196M ora_s116_sid
  657  90.2G  4.22M  6.79M   24K  3.95M  196M ora_s115_sid
  629  91.8G   282M  1.56G  277M  1.29G  196M ora_s111_sid
  627  90.7G   286M   568M  281M   284M  196M ora_s110_sid
  625  90.2G  6.64M  7.56M  736K  4.09M  196M ora_s109_sid
  623  91.7G   279M  1.51G  275M  1.24G  196M ora_s108_sid
  618  91.9G   286M  1.72G  282M  1.44G  196M ora_s106_sid
  616  90.7G   285M   574M  280M   290M  196M ora_s105_sid
  612  90.7G   291M   574M  285M   286M  196M ora_s103_sid
  610  90.7G   286M   561M  280M   278M  196M ora_s102_sid
  608  91.9G   283M  1.67G  279M  1.39G  196M ora_s101_sid
  606  90.7G   290M   575M  285M   288M  196M ora_s100_sid
  602  90.2G  4.22M  6.90M   24K  3.83M  196M ora_s098_sid
  598  90.7G   285M   567M  280M   285M  196M ora_s096_sid
  596  90.7G   286M   566M  280M   283M  196M ora_s095_sid
  594  90.7G   287M   553M  282M   268M  196M ora_s094_sid
  584  90.2G  6.39M  7.66M  464K  4.50M  196M ora_s091_sid
  582  90.2G  4.23M  7.16M   24K  4.37M  196M ora_s090_sid
  580  90.7G   286M   557M  281M   272M  196M ora_s089_sid
  578  90.7G   287M   560M  281M   276M  196M ora_s088_sid
  574  90.7G   284M   558M  278M   277M  196M ora_s086_sid
  572  91.8G   284M  1.63G  279M  1.36G  196M ora_s085_sid
  570  90.7G   285M   567M  279M   285M  196M ora_s084_sid
  568  90.2G  4.25M  6.52M   24K  3.63M  196M ora_s083_sid
  566  90.7G   286M   569M  280M   286M  196M ora_s082_sid
  562  90.8G   287M   614M  281M   329M  196M ora_s080_sid
  560  91.8G   285M  1.63G  281M  1.35G  196M ora_s079_sid
  554  91.8G   284M  1.63G  280M  1.35G  196M ora_s076_sid
  552  90.7G   285M   561M  280M   279M  196M ora_s075_sid
  542  90.7G   291M   576M  286M   287M  196M ora_s074_sid
  538  90.2G  6.25M  7.91M  496K  4.67M  196M ora_s072_sid
  534  90.7G   287M   526M  281M   242M  196M ora_s070_sid
  530  91.9G   287M  1.72G  282M  1.44G  196M ora_s068_sid
  386  90.7G   285M   567M  279M   285M  196M ora_s067_sid
  384  90.7G   284M   556M  278M   275M  196M ora_s066_sid
  382  90.7G   288M   566M  283M   280M  196M ora_s065_sid
  380  91.8G   284M  1.63G  280M  1.35G  196M ora_s064_sid
  378  90.7G   285M   557M  280M   273M  196M ora_s063_sid
  373  90.2G  4.25M  6.91M   24K  4.02M  196M ora_s062_sid
  369  90.7G   287M   539M  281M   255M  196M ora_s060_sid
  367  90.7G   287M   569M  281M   285M  196M ora_s059_sid
  312  91.9G   288M  1.65G  284M  1.37G  196M ora_s048_sid
  304  91.8G   284M  1.56G  279M  1.29G  196M ora_s047_sid
  302  90.2G  4.54M  7.91M   24K  5.10M  196M ora_s046_sid
  292  91.9G   288M  1.67G  284M  1.39G  196M ora_s045_sid
  226  90.7G   282M   560M  276M   281M  196M ora_s058_sid
  224  90.7G   286M   571M  280M   288M  196M ora_s057_sid
  220  90.8G   286M   640M  281M   356M  196M ora_s055_sid
  216  90.2G  4.64M  7.91M   24K  5.14M  196M ora_s053_sid
  214  90.2G  4.25M  7.34M   24K  4.59M  196M ora_s052_sid
  212  90.7G   284M   538M  278M   257M  196M ora_s051_sid
I did some math and found about 79 shared servers, each with about 280 MB of anon memory.
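To put that number in context, here is a rough back-of-the-envelope sketch. It assumes every one of the 79 processes holds exactly 280 MB of anon memory (the real values range from about 275 to 303 MB) and it simply subtracts the locked ISM and the calculated kernel pages from physmem, which is my own simplification and not something CAT reports.

```python
MIB = 2**20
GIB = 2**30

# Figures from this post; 280 MB per process is a rounded estimate.
shared_servers = 79
anon_per_server = 280 * MIB

# Byte values copied from the CAT "mem" output.
physmem = 135080804352
ism_locked = 96636780544     # intimate SM (ISM), i.e. the SGA
pp_kernel = 16710336512      # pp_kernel(calculated)

total_anon = shared_servers * anon_per_server

# Simplification: whatever is not ISM or kernel is treated as available to processes.
left_for_user = physmem - ism_locked - pp_kernel

print(f"anon held by the large shared servers: {total_anon / GIB:.1f}G")
print(f"physmem minus ISM minus kernel:        {left_for_user / GIB:.1f}G")
print("anon exceeds what is left:", total_anon > left_for_user)
```

Under these assumptions the anon memory of the shared servers alone is larger than what remains after the SGA and the kernel, so it could not all stay resident without paging.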
Questions:
Does anyone have an idea what could cause such anon/private memory
utilization by shared servers? Is it normal at all?
Currently the anon size for a shared server process (pmap -x PID) is
around 4-7 MB; there is only one shared server where pmap -x PID reports
300 MB of anon space usage.
Interestingly, Oracle's v$sesstat claims that the PGA/UGA memory
allocated by that process is only 20 MB.
Any ideas how I can drill down and find out about allocations in shared server process memory?
BTW, Oracle recommended decreasing the SGA :)
Regards
GG
--
http://www.freelists.org/webpage/oracle-l
Received on Wed Jun 11 2014 - 21:16:21 CEST