Load on Server (AIX P9)

From: manikandan <pvmanikandan_at_gmail.com>
Date: Mon, 9 Sep 2019 22:42:55 -0400
Message-ID: <CAB6Jwgj3RDN8XQT_=eU+aQtgpP8Db=6nZLX_9bP9Yu7yP8oDRA_at_mail.gmail.com>



Hi,

Tivoli has been configured on the database servers to restart the monitoring scripts, including OSWatcher, when they go down. When Tivoli restarts the OSWatcher script, we have observed that commands such as ps -elk (from topaix.sh) and ps -ae -o user,pid,ppid,pri,pcpu,pmem,vsz,rssize,wchan,stat,etime,time,args (from psmemswsub.sh) hang, many such processes are spawned, and eventually the server becomes unresponsive under the heavy load from these processes. Normally the ppid of these commands is the pid of topaix.sh and psmemswsub.sh respectively, but in this case the ppid of these commands is 1.
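To make the symptom concrete, here is a minimal sketch of how the orphaned ps commands could be listed. The function name find_orphaned_ps is hypothetical and not part of OSWatcher or Tivoli; it only filters "pid ppid etime args" lines, keeping those where the ppid is 1 and the command is ps.

```shell
#!/bin/sh
# Hypothetical helper: print ps processes whose parent is init (ppid 1).
# Expects "pid ppid etime args..." lines on stdin, with a header on line 1.
find_orphaned_ps() {
    # $2 is the ppid; $4 is the first word of args (the command itself).
    awk 'NR > 1 && $2 == 1 && $4 ~ /(^|\/)ps$/ { print }'
}

# Possible use on a live system (column keywords taken from the ps
# command quoted above):
#   ps -e -o pid,ppid,etime,args | find_orphaned_ps
```

The etime column shows how long each matching ps has been stuck, which may help tie the hangs to the time of a Tivoli restart.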

The following is from the monitoring team:

The Tivoli agent (k08agent) runs as root and executes a script to check for processes running. If it finds a process down, it attempts to restart it using the command provided by the support/app team. In this case:

su - oracle -c "/users/oracle/local/prod/sh/start_osw_generic.ksh"

This command is put into the RESTART_PROCESS variable and executed as follows:

eval "$RESTART_PROCESS &" > /dev/null 2>&1
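As an illustration of what that eval line does, here is a small sketch that wraps it in a function. The names restart_in_background and LAST_BG_PID are hypothetical; the agent's actual script is not shown here. The redirection is applied to eval itself, so the backgrounded command inherits stdout and stderr pointing at /dev/null, and the caller continues without waiting for it.

```shell
#!/bin/sh
# Hypothetical wrapper around the eval line quoted above.
restart_in_background() {
    RESTART_PROCESS=$1
    # The trailing & backgrounds the command; the redirection on eval
    # is inherited by that background job.
    eval "$RESTART_PROCESS &" > /dev/null 2>&1
    # $! is the pid of the most recently backgrounded command.
    LAST_BG_PID=$!
}

# Stand-in command for illustration only; returns immediately.
restart_in_background 'sleep 1'
wait "$LAST_BG_PID"
```

Note that only stdout and stderr appear in the redirection; stdin and the rest of the agent's environment are whatever the calling script provides, which may differ from an interactive root login.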

$ cat start_osw_generic.ksh

cd /ora45/dbworkspace/OSWATCHER/oswbb

nohup /ora45/dbworkspace/OSWATCHER/oswbb/startOSWbb.sh 30 120 &

Operating system: AIX 7.2, 64-bit, on P9.

We first tried OSW version 7.3 and then version 8, just to rule out any issue with the OSW version.

Note: When we run the command su - oracle -c "/users/oracle/local/prod/sh/start_osw_generic.ksh" manually after logging in as the root user, we do not face any issue.

Please let me know if you have any suggestions on this.

Thanks,

Mani

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Sep 10 2019 - 04:42:55 CEST
