
RE: A Challenge - My Answer

From: Post, Ethan <Ethan.Post_at_ps.net>
Date: Mon, 14 Nov 2005 16:29:17 -0600
Message-ID: <1F989681BA05FA4CAD9FA849ED8520570149BB9E@pscdalpexch01.perotsystems.net>


Thanks for your answers.  

Some of you use packaged software for this. That is nice when it is available, but it takes time to install, configure, and deploy to multiple servers and environments (for me anyway). Others wrote scripts in the past but have since moved to other tools, and a few of you have scripts you are still using.  

My script is below. I started from my load monitoring script, decided it was pretty ugly, and thought I should just write a script that monitors any number, which would be more useful for other things as well.  

Some ways I could use the script below (the load average example is in the script's usage text):  

# if >= 200 oracle processes for > 30 minutes then alert and sleep 60 minutes
watchnum.ksh -t 30 -s 60 smon $(ps -ef | grep oracle | wc -l) 200 epost1_at_yahoo.com  

# alert if a process has accumulated more than N minutes of CPU time and does not go away within M minutes.
This should be easy, but sorry, no example.  
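
One possible sketch of the CPU time case, not taken from the original script: convert the CPU time that ps reports into whole minutes, then hand that number to watchnum.ksh. The helper name cpu_minutes, the watch ID cpuhog, the PID variable, and the thresholds are all made up for illustration, and the [DD-]HH:MM:SS layout is an assumption about what `ps -o time=` prints on your platform.

```shell
# Hypothetical helper: convert a ps-style CPU time ([DD-]HH:MM:SS or MM:SS)
# into whole minutes. Seconds are dropped; empty input (process gone) gives 0.
cpu_minutes() {
   echo "${1}" | awk -F'[-:]' '{
      if (NF < 2) { print 0; next }
      days = 0; hours = 0
      mins = $(NF-1)
      if (NF >= 3) hours = $(NF-2)
      if (NF >= 4) days  = $(NF-3)
      print days * 1440 + hours * 60 + mins
   }'
}

# Example call (PID is assumed to be set by the caller): alert when the
# process has used 120 or more CPU minutes and is still there 10 minutes later.
# watchnum.ksh -t 10 -s 60 cpuhog $(cpu_minutes "$(ps -o time= -p ${PID})") 120 epost1_at_yahoo.com
```

Because a vanished process yields 0, the watch record resets by itself once the process goes away.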

You get the idea. I am also going to port the script below to PL/SQL and make a few enhancements.  
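
The script keeps its state in a small file between runs, so it only works when something calls it repeatedly. A minimal scheduling sketch, assuming cron is the scheduler: the install path /home/oracle/bin and the 5 minute interval are assumptions, not part of the original post, and the entry also assumes the shell your cron uses understands $( ).

```
# Hypothetical crontab entry: run the process-count check every 5 minutes.
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/oracle/bin/watchnum.ksh -t 30 -s 60 smon $(ps -ef | grep oracle | wc -l) 200 epost1_at_yahoo.com
```

The interval should be comfortably shorter than the -t value, otherwise the elapsed time over threshold is only sampled coarsely.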



BEGIN KSH SCRIPT

 

#!/usr/bin/ksh  

typeset -i SLEEP_UNTIL_TIME OVER_THRESHOLD_TIME MIN_SLEEP_TIME CURRENT_TIME BEGIN_TIME
typeset -u WATCH_ID  

MAX_ALLOWED_TIME=0
MIN_SLEEP_TIME=0
SEND_OK_ON_RESET=N
LOG_FILE=/tmp/watchnum.log
TMP_DIRECTORY=/tmp
HEADLINE="watchnum.ksh"  

function uhoh {
if (( ${1} )); then

   echo "uhoh: ${2}"
   exit 1
fi
}  

function current_minutes {

# This very UGLY function calculates the # of minutes since the year 2000.
# Leap days are ignored in the year term, so values are only approximate
# across a year boundary.

   MIN_YEAR=$( date +"%Y" )
   MIN_YEAR=$( expr ${MIN_YEAR} - 2000 )
   MIN_YEAR=$( expr ${MIN_YEAR} \* 525600 )
   MIN_DAYS=$( date +"%j" )
   MIN_DAYS=$( expr "${MIN_DAYS}" - 1 )
   MIN_DAYS=$( expr "${MIN_DAYS}" \* 1440 )
   MIN_HOURS=$( date +"%H" )
   MIN_HOURS=$( expr "${MIN_HOURS}" \* 60 )
   MIN_MINS=$( date +"%M" )
   # expr reads values such as 08 and 09 as decimal; $(( )) may reject them as octal.
   MIN_TOTAL=$( expr ${MIN_YEAR} + ${MIN_DAYS} + ${MIN_HOURS} + ${MIN_MINS} )

   echo ${MIN_TOTAL}
}  

CURRENT_TIME=$(current_minutes)  

while getopts :t:s:a:l:oh: options
do

   case $options in

      t) MAX_ALLOWED_TIME=${OPTARG} ;;
      s) MIN_SLEEP_TIME=${OPTARG} ;;
      l) LOG_FILE="${OPTARG}" ;;
      o) SEND_OK_ON_RESET=Y ;;
      h) HEADLINE="${OPTARG}" ;;
     \?) print ${OPTARG} is not a valid argument. ;;
   esac
done  

shift $(expr $OPTIND - 1)  

usage() {
cat <<USAGE

Script:
watchnum.ksh

Options:

-o Sends an "everything is OK" message when the monitored value
   falls below the defined threshold.
-t Sets MAX_ALLOWED_TIME. The number of minutes the monitored
   value is allowed to exceed the threshold before triggering an alert.
-s Sets MIN_SLEEP_TIME. The number of minutes to ignore alerts
   for after an alert has been triggered. This helps cut down the
   number of emails and pages when you already know there is a problem.
-h Sets HEADLINE. This is the string that will appear in the subject
   of the email or page.
-l Sets LOG_FILE. This defaults to /tmp/watchnum.log unless specified.

Parameters (1-3 are required):

\$1 WATCH_ID - User specified ID for this alert, no spaces no silly
    characters.
\$2 CURRENT_VALUE - Current value of the number related to this alert.
\$3 THRESHOLD - The threshold that will trigger the alert.
\$4 EMAILS/PAGERS - List of emails with commas between them.

Examples:

# If server load average is over 8 for 2 hours send email.
watchnum.ksh -o -t 120 -s 180 -h "Server Load Warning" \\
   -l /home/oracle/log/watchnum.log loadavg \$(uptime | awk '{ print substr(\$(NF-2),1,4) }') \\
   8 epost1_at_yahoo.com

USAGE
exit 1
}  

if (( $# == 0 )); then

   usage;
fi  

# Exit if these parameters are not supplied.
[[ -z "${1}" || -z "${2}" || -z "${3}" ]] && usage  

WATCH_ID=${1}
CURRENT_NUMBER=${2}
THRESHOLD=${3}
EMAILS="${4}"
ALERT_OR_OK=

if [[ -n ${LOG_FILE} ]]; then
   touch ${LOG_FILE} || uhoh $? "Cannot create ${LOG_FILE}."
fi  

TINY="${TMP_DIRECTORY}/watchval_${WATCH_ID}.dat"
[[ -f "${TINY}" ]] || echo "${WATCH_ID}:0:0" > ${TINY} || uhoh $? "Could not create ${TINY}."  

# The record format is WATCH_ID:SLEEP_UNTIL_TIME:BEGIN_TIME.
SLEEP_UNTIL_TIME=$(cat ${TINY} | awk -F":" '{ print $2}')
BEGIN_TIME=$(cat ${TINY} | awk -F":" '{ print $3}')  

if (( ${CURRENT_NUMBER} >= ${THRESHOLD} )); then  

   # When over the threshold and BEGIN_TIME is still zero, this is the first
   # time over the threshold, so set BEGIN_TIME to the current time.
   if (( ${BEGIN_TIME} == 0 )); then

      BEGIN_TIME=${CURRENT_TIME}
      echo "${WATCH_ID}:${SLEEP_UNTIL_TIME}:${BEGIN_TIME}" > ${TINY}
   fi  

# If we are not currently in a sleep cycle.
   if (( ${CURRENT_TIME} >= ${SLEEP_UNTIL_TIME} )); then

      # Get the # of minutes we have been over threshold.
      OVER_THRESHOLD_TIME=$( echo "${CURRENT_TIME} - ${BEGIN_TIME}" | bc -l )
      # If # of minutes is more than allowed trigger alert.
      if (( ${OVER_THRESHOLD_TIME} >= ${MAX_ALLOWED_TIME} )); then
         # We will sleep until the stated time; this requires an update to the record.
         SLEEP_UNTIL_TIME=$( echo "${CURRENT_TIME} + ${MIN_SLEEP_TIME}" | bc -l )
         echo "${WATCH_ID}:${SLEEP_UNTIL_TIME}:${BEGIN_TIME}" > ${TINY}
         ALERT_OR_OK="ALERT"
      fi

   fi
else

# If we fall under threshold reset the entire record.

   echo "${WATCH_ID}:0:0" > ${TINY}
   if (( ${BEGIN_TIME} > 0 )); then
      [[ "${SEND_OK_ON_RESET}" = "Y" ]] && ALERT_OR_OK="OK"
   fi
fi  

echo "$(hostname)|${WATCH_ID}|$(date +"%m/%d/%Y %H:%M")|${CURRENT_NUMBER}|${ALERT_OR_OK}" >> ${LOG_FILE}

if [[ -n "${ALERT_OR_OK}" ]]; then
   # The address list is comma separated, so split it on commas.
   for EMAIL_ADDRESS in $(echo "${EMAILS}" | tr ',' ' '); do
      echo "${ALERT_OR_OK} ${WATCH_ID}=${CURRENT_NUMBER}, host=$(hostname)" | mailx -s "${HEADLINE}" "${EMAIL_ADDRESS}"
   done
fi  

exit 0
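
An aside on current_minutes: where `date +%s` is available (GNU and BSD date support it, but POSIX does not require it, so treat that as an assumption about your platform), roughly the same number can be computed from epoch seconds without the leap-day drift. This is only a sketch. The function name minutes_since_2000 is hypothetical, and unlike the original it counts from 2000-01-01 00:00 UTC rather than local time, which should not matter here because the script only compares differences.

```shell
# Hypothetical replacement for current_minutes.
# 946684800 is the epoch second for 2000-01-01 00:00:00 UTC.
minutes_since_2000() {
   # Use the epoch seconds passed as $1, or the current time if none is given.
   # 'date +%s' is an assumption: common, but not part of POSIX.
   EPOCH_SECONDS=${1:-$(date +%s)}
   echo $(( (${EPOCH_SECONDS} - 946684800) / 60 ))
}
```

If you switch to something like this, remove the old watchval_*.dat files first, since records written with the old function use a slightly different origin.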

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 14 2005 - 16:34:02 CST
