job monitoring script

From: Stephen Lee <Stephen.Lee_at_DTAG.Com>
Date: Mon, 28 Apr 2003 09:26:53 -0800
Message-ID: <F001.0058AA54.20030428092653@fatcity.com>

Here is the info on a job monitoring script. If you have improvements to suggest, please do.

The script is written in Korn shell -- the REAL ksh, not the public domain ksh. You might need to change the first line of the script from ksh93 to ksh. A sample crontab entry looks like:

### If you change the number of times per hour, change the CRON_INTERVAL variable in the script.
7,37 * * * * /oracle/app/oracle/admin/scripts/job_mon/job_mon.ksh >> /oracle/app/oracle/admin/scripts/job_mon/debug 2>&1

IMPORTANT: The script has a CRON_INTERVAL variable that is tied to how many times per hour you run the script. If you change the number of times per hour you run the script, then change the CRON_INTERVAL variable. The variable is used by the script to determine when to perform some maintenance tasks and send out a daily summary. I suppose one improvement could be to have the script read the crontab and get that info for itself ... something to put on my list of things to do "when I get around to it."
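To make the tie between CRON_INTERVAL and the daily maintenance time concrete, here is a minimal sketch. It is not the logic from job_mon.ksh itself; the function names to_minutes and in_maint_window are made up for this example. The idea is that when cron fires every CRON_INTERVAL minutes, exactly one run should land in the window that starts at the maintenance time and lasts CRON_INTERVAL minutes.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from job_mon.ksh.

# Convert an HHMM value (leading zeros optional) to minutes since midnight.
to_minutes() {
	t=`echo "$1" | sed 's/^0*//'`
	if [ -z "$t" ]; then t=0; fi
	echo $(( (t / 100) * 60 + (t % 100) ))
}

# Succeed if NOW falls in the window [MAINT, MAINT + INTERVAL).
# Usage: in_maint_window NOW_HHMM MAINT_HHMM INTERVAL_MINUTES
in_maint_window() {
	now=`to_minutes "$1"`
	start=`to_minutes "$2"`
	[ "$now" -ge "$start" ] && [ "$now" -lt $(( start + $3 )) ]
}

if in_maint_window "`date +%H%M`" 800 30; then
	echo "time to send the daily summary"
fi
```

With the sample crontab entry (minutes 7 and 37) and a maintenance time of 800, only the 08:07 run falls inside the window. This sketch does not handle a window that wraps past midnight.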

Some config files are required; their names and formats follow.

Name: Determined by the variable WHO_TO_PAGE in the script
Function: email addresses of who to page
Format:
lucius.rapp_at_jomama.com
billy.bob_at_tireiron.com
etc.
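For illustration, a file in this format can be turned into the comma-separated list that mailx wants with something like the following. This is only a sketch; build_pager_list is a made-up name, and the script's own version does the same job with a while-read loop.

```shell
#!/bin/sh
# build_pager_list is a hypothetical helper used only in this sketch.
# It drops comment lines (first non-blank character is '#'), strips all
# white space, drops empty lines, and joins what is left with commas.
build_pager_list() {
	sed -e '/^[[:space:]]*#/d' -e 's/[[:space:]]//g' -e '/^$/d' "$1" | paste -s -d, -
}
```

Usage would be along the lines of PAGER_PERSON=`build_pager_list "$WHO_TO_PAGE"`.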

Name: Determined by the variable PASSWORD_FILE in the script
Function: has passwords for the databases
Format:
SID1:system:joeblow1
SID2:system:hellsbells
etc.
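As a rough sketch of how entries in this format get picked up, the following lists the SIDs that have a system password. The name list_system_sids is made up for this example. Note one difference from the script: the script's awk filter matches "system" anywhere in the second field, while this sketch assumes an exact, case-insensitive match.

```shell
#!/bin/sh
# list_system_sids is a hypothetical helper used only in this sketch.
# It prints the SID of every SID:user:password line whose user is "system",
# skipping blank lines and lines with '#' in the first field.
list_system_sids() {
	awk -F: '
		$1 ~ /#/ {next}
		/^$/ {next}
		NF == 3 && tolower($2) == "system" {print $1}
	' "$1"
}
```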

Name: Determined by the variable VARFILE in the script
Function: This is a general list of Oracle variables used by a lot of scripts. The entry that this script looks for is a list of users to whom e-mail is sent. This is distinct from the users who get paged by the script.
Format:
MAIL:MAIL_TO=user1_at_domain.com,user2_at_domain.com,user3_at_domain.com
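A minimal sketch of pulling the MAIL_TO entry out of such a file follows. The function name get_mail_to is made up for this example; the awk and sed pipeline follows the one used in the SET_VARIABLES function further down.

```shell
#!/bin/sh
# get_mail_to is a hypothetical helper used only in this sketch.
# It prints everything after the '=' on the MAIL:MAIL_TO line,
# with any white space removed.
get_mail_to() {
	awk -F= '/^MAIL:MAIL_TO/ {print $2}' "$1" | sed 's/[[:space:]]//g'
}
```

If the line is missing, the function prints nothing, which is the case the script handles by falling back to oracle plus DEBUG_GUY.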

There are variables in the script that determine the directories where the script finds things and puts things. As the script is written, one key variable is SCRIPTDIR. The script derives it from the path used to invoke it ($0), so take that into consideration when writing your command line; the crontab entry above uses the full path for that reason. You might want to change this.
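For illustration, here is one way to derive a directory from the invocation path. This is a sketch using dirname, not the sed expression the script actually uses, and script_dir is a made-up name. It keeps the trailing slash so that names like "${SCRIPTDIR}log_file" still work.

```shell
#!/bin/sh
# script_dir is a hypothetical helper used only in this sketch.
# Given the path used to start a script, print its directory with a
# trailing slash.  A bare name such as "job_mon.ksh" gives "./".
script_dir() {
	d=`dirname "$1"`
	case "$d" in
		/) echo "/" ;;
		*) echo "$d/" ;;
	esac
}

SCRIPTDIR=`script_dir "$0"`
echo "SCRIPTDIR = $SCRIPTDIR"
```

With a relative path, everything the script creates ends up relative to whatever directory cron started it in, which is rarely what you want.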

A listing of my SCRIPTDIR looks like:

-rw-r-----    1 oracle   dba           510 Apr 28 08:08 daily_summary
-rw-r--r--    1 oracle   dba         29023 Feb 28 03:39 debug
-r-x------    1 oracle   dba         36780 Feb 21  2002 job_mon.ksh
-rw-r-----    1 oracle   dba         53713 Apr 28 09:37 log_file
-rw-r-----    1 oracle   dba             0 Apr 28 09:37 mail_file
-rw-r-----    1 oracle   dba           213 Apr 28 09:37 nuisance_file
-rw-r-----    1 oracle   dba             0 Apr 28 09:37 page_file
drwxr-x---    2 oracle   dba          4096 Apr 28 09:37 soft_links
drwxr-x---    3 oracle   dba          4096 Apr 28 09:55 source
-rw-------    1 oracle   dba           117 Mar 19 08:37 who_to_page

Note that there is a directory called soft_links. This is determined by the variable LINKDIR and is used by the script to keep track of the current status of a SID; so you only get paged when there is a CHANGE in status rather than being constantly paged for the same thing. A partial list of the directory looks like:

lrwxrwxrwx    1 oracle   dba             2 Apr 24 15:07 BRTD2 -> OK
lrwxrwxrwx    1 oracle   dba             2 Apr 27 22:37 BRTP1 -> OK
lrwxrwxrwx    1 oracle   dba            10 Apr 27 22:37 BRTP2 -> JOB_BROKEN
lrwxrwxrwx    1 oracle   dba             2 Apr 12 17:37 BRTT1 -> OK
lrwxrwxrwx    1 oracle   dba             2 Apr 17 13:37 BRTT2 -> OK
lrwxrwxrwx    1 oracle   dba             2 Apr 12 19:07 DTNP -> OK
lrwxrwxrwx    1 oracle   dba            13 Apr 28 06:37 DTNT -> CONNECT_ERROR

Since I already have a script to monitor the status of databases, this script records database problems other than broken jobs with only a generic CONNECT_ERROR listing. CONNECT_ERROR messages are e-mailed to the list as determined by the MAIL_TO line shown above. The WHO_TO_PAGE people are only for broken job messages. There is nothing to prevent you from changing this, or making both lists the same.
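The soft link trick can be sketched on its own as follows. This is a simplified illustration, not the code from RUN_TEST; the function names old_status and record_status are made up, and LINKDIR is assumed to be set to an existing directory.

```shell
#!/bin/sh
# Hypothetical sketch of change detection with symbolic links.
# Assumes LINKDIR names an existing, writable directory.

# Print the target of the link for SID, or BLANK if there is no link.
old_status() {
	if [ -L "$LINKDIR/$1" ]; then
		ls -l "$LINKDIR/$1" | sed 's/.* -> //'
	else
		echo "BLANK"
	fi
}

# Record NEW status for SID.  Print a message only when the status changed.
# Usage: record_status SID NEW_STATUS
record_status() {
	sid=$1
	new=$2
	old=`old_status "$sid"`
	if [ "$old" != "$new" ]; then
		rm -f "$LINKDIR/$sid"
		ln -s "$new" "$LINKDIR/$sid" && echo "$sid changed to $new from $old"
	fi
}
```

Calling record_status BRTP2 JOB_BROKEN twice in a row prints a message the first time and nothing the second time, which is what keeps the pager quiet while a problem persists.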

There are some variables that are hard-coded in the script that you will need to set.

In the SET_VARIABLES function you have:

DEBUG_GUY: This is who gets e-mailed (we hope) when something breaks and the script can't e-mail anyone else.
PRIMARY_BOX: This is the name of the box on which this script is running.
SECONDARY_BOX: If you have a fail-over box configured, this is the name of that box.

In the SET_ORACLE_VARIABLES function you have:

TNS_ADMIN: this is the Oracle TNS_ADMIN variable.
TNSFILE: name of the tnsnames.ora file that contains a list of all the SIDs you want this script to test.
ORATAB: the oratab file you want this script to use. This script will look for the highest version ORACLE_HOME in the oratab file and set ORACLE_HOME to that version.
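One thing to be aware of: the script picks the "highest" ORACLE_HOME with a plain reverse text sort, which ranks a 9.x path above a 10.x path. If that matters to you, a numeric comparison of the version fields is one possible alternative. The following is only a sketch; latest_oracle_home is a made-up name, and it assumes the same /dir1/dir2/.../product/#.#.# path layout the script assumes.

```shell
#!/bin/sh
# latest_oracle_home is a hypothetical helper used only in this sketch.
# It prints the ORACLE_HOME from an oratab-style file whose
# .../product/#.#.# component has the highest version, comparing the
# three version fields as numbers rather than as text.
latest_oracle_home() {
	awk -F: '
		/^#/ {next}
		NF == 3 && match($2, /product\/[0-9]+\.[0-9]+\.[0-9]+/) {
			# Skip the 8 characters of "product/" to get the version.
			v = substr($2, RSTART + 8, RLENGTH - 8)
			print v, $2
		}
	' "$1" | sort -t. -k1,1n -k2,2n -k3,3n | tail -1 | cut -d' ' -f2-
}
```

Usage would be along the lines of export ORACLE_HOME=`latest_oracle_home /etc/oratab`.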

If you have a fail-over box configured, keep in mind that the boxes need to have the ability to run "r" commands on the other box (i.e. a .rhosts file).

This script sends out two daily e-mails to the MAIL_TO list at a time determined by the MAINT_TIME variable in conjunction with the CRON_INTERVAL. One e-mail is the daily summary. It looks like:

ABRT : CONNECT_ERROR
BRTP2 : JOB_BROKEN

The other is a summary log of the previous day's events.

That's all I can think of now. I have attached the script itself, and I have pasted the text below just in case the attachment gets filtered out. If you have improvements to suggest, please do! But keep in mind that this script is intended only to test the status of database jobs rather than be a grand, do-all, heavyweight database monitoring script.

## This thing was written by Stephen Lee.

## NOTE: On Linux, the built-in ksh is crap.
##       This script MUST be run with the REAL ksh on Linux.
##       On other Unix, the factory equipped ksh is OK.

export PATH='/usr/bin:/usr/sbin:/bin:/sbin:/opt/bin:/usr/ccs/bin:/usr/local/bin'
DEBUG_GUY='lester.bestertester_at_microhard.com'

##################### BEGIN SET_VARIABLES #####################
SET_VARIABLES() {
	## Set the following to the hostname of primary and secondary boxes
	## depending on the role of that computer.  This is case sensitive.
	## The hostname must match what is returned by hostname command.
	## Leave blank or comment out if there is no secondary.
	PRIMARY_BOX='box1'
	SECONDARY_BOX='box2'


	SUPPORT='NOBODY'
	VARFILE='/oracle/app/oracle/ORACLE_VARIABLES'
	SCRIPTNAME=`basename $0`
	MY_TAG=`echo $SCRIPTNAME | sed 's/\..*//g' | tr "[a-z]" "[A-Z]"`

	if [ ! -f "$VARFILE" -o ! -r "$VARFILE" ]; then
		SUPPORT="oracle,$DEBUG_GUY"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			VARFILE = ->${VARFILE}<- on `hostname`
			does not exist or is not readable.
			continuing with SUPPORT = oracle,$DEBUG_GUY
		XXX
		SUPPORT="oracle,$DEBUG_GUY"
	fi

	if [ "$SUPPORT" = "NOBODY" ]; then
		## SUPPORT="$DEBUG_GUY"
		SUPPORT=`awk -F= '/^MAIL:MAIL_TO/ {print $2}' $VARFILE | sed 's/[ 	]*//g'`
		if [ -z "$SUPPORT" ]; then
			SUPPORT="oracle,$DEBUG_GUY"
			mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
				No SUPPORT is defined on `hostname`
				Need MAIL:MAIL_TO line in ORACLE_VARIABLES
				Setting SUPPORT = oracle,$DEBUG_GUY
			XXX
		fi
	fi

	## A bunch of files and directories get defined here.
	## Their existence and readability will be checked later.
	PASSWORD_FILE='/oracle/app/oracle/.secure/.passwd'
	SCRIPTDIR=`echo $0 | sed 'sX'$SCRIPTNAME'$XX'`
	LINKDIR="${SCRIPTDIR}soft_links"
	CONFIG_FILE="${SCRIPTDIR}${MY_TAG}_config"
	MAIL_FILE="${SCRIPTDIR}mail_file"
	PAGE_FILE="${SCRIPTDIR}page_file"
	LOG_FILE="${SCRIPTDIR}log_file"
	NUISANCE_FILE="${SCRIPTDIR}nuisance_file"
	WHO_TO_PAGE="${SCRIPTDIR}who_to_page"
	TEMPFILE="${SCRIPTDIR}tempfile"
	DAILY_SUMMARY="${SCRIPTDIR}daily_summary"
	MYNAME=`hostname | awk -F. '{print $1}' | tr "[a-z]" "[A-Z]"`
	MYPID=$$
	MYPPID=`ps -eo pid -o ppid -o args | sed 's/^ *//g;s/  */ /g' | awk '$1 == PID {print $2}' PID=$MYPID`

	## The next three variables are used to determine when to perform maintenance
	## activities that are done once per day.
	## How frequently cron runs this script, in minutes.  If you change the number
	## of times per hour that this script is run by cron, then change this variable.
	CRON_INTERVAL=30
	## When to do daily maintenance. use 24 hr. format
	MAINT_TIME=800
	TIME="`date +%H%M | sed 's/^00*//g'`"
	## If TIME is empty after the sed statement, then it's because TIME was all zeros.
	if [ -z "$TIME" ]; then TIME='0'; fi

	if [ "$1" = "DEBUG" ]; then
		echo "PASSWORD_FILE = $PASSWORD_FILE"
		echo "VARFILE = $VARFILE"
		echo "SCRIPTNAME = $SCRIPTNAME"
		echo "MY_TAG = $MY_TAG"
		echo "SCRIPTDIR = $SCRIPTDIR"
		echo "MYNAME = $MYNAME"
		echo "MYPID = $MYPID"
		echo "MYPPID = $MYPPID"
		echo "SUPPORT = $SUPPORT"
		echo "CONFIG_FILE = $CONFIG_FILE"
		echo "MAIL_FILE = $MAIL_FILE"
		echo "PAGE_FILE = $PAGE_FILE"
		echo "LOG_FILE = $LOG_FILE"
		echo "NUISANCE_FILE = $NUISANCE_FILE"
		echo "CRON_INTERVAL = $CRON_INTERVAL"
		echo "MAINT_TIME = $MAINT_TIME"
		echo "TIME = $TIME"
		echo "WHO_TO_PAGE = $WHO_TO_PAGE"
	fi

	rm -f "$TEMPFILE" 2> /dev/null
	echo "testing" > "$TEMPFILE"
	if [ $? -ne 0 ]; then
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			Cannot write to script directory: ->${SCRIPTDIR}<-
			Exiting...
		XXX
		return 1
	fi
	rm -f "$TEMPFILE" 2> /dev/null

	if [ ! -f "$LOG_FILE" ]; then
		touch "$LOG_FILE" 2> /dev/null
	fi
	if [ ! -f "$LOG_FILE" ]; then
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			Cannot create log_file ->${LOG_FILE}<-
			in directory ->${SCRIPTDIR}<-
			Exiting...
		XXX
		return 1
	fi
	tail -1000 "$LOG_FILE" > "$TEMPFILE"
	rm -f "$LOG_FILE"
	mv "$TEMPFILE" "$LOG_FILE"
	chmod 640 "$LOG_FILE"
	echo "=================== `date` =====================" >> "$LOG_FILE"
	if [ ! -d "${LINKDIR}" ]; then
		mkdir -m 750 "${LINKDIR}" 2>> "$LOG_FILE"
	fi
	if [ ! -d "${LINKDIR}" ]; then
		echo "   Failed to create LINKDIR ->${LINKDIR}<- on $MYNAME" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			Failed to create soft link directory on $MYNAME
			Exiting...
		XXX
		return 1
	fi

#### Not using CONFIG_FILE in JOB_MON script
##
##	if [ ! -f "$CONFIG_FILE" ]; then
##		echo "   config file ->${CONFIG_FILE}<- does not exist.  Exiting" >> "$LOG_FILE"
##		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
##			$SCRIPTNAME broke on $MYNAME
##			config file ->${CONFIG_FILE}<- does not exist.
##			Exiting...
##		XXX
##		return 1
##	fi
##	if [ ! -r "$CONFIG_FILE" ]; then
##		echo "   $CONFIG_FILE is not readable.  Exiting" >> "$LOG_FILE"
##		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
##			$SCRIPTNAME broke on $MYNAME
##			config file ->${CONFIG_FILE}<- is not readable.
##			Exiting...
##		XXX
##		return 1
##	fi

##
	if [ ! -f "$WHO_TO_PAGE" ]; then
		echo "   $WHO_TO_PAGE does not exist.  Exiting" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			who to page ->${WHO_TO_PAGE}<- does not exist.
			Exiting...
		XXX
		return 1
	fi
	if [ ! -r "$WHO_TO_PAGE" ]; then
		echo "   $WHO_TO_PAGE is not readable.  Exiting" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			who to page ->${WHO_TO_PAGE}<- is not readable.
			Exiting...
		XXX
		return 1
	fi

	PAGER_PERSON=''
	## The following sed should have a space and a tab in the brackets
	## NOTE: PAGER_PERSON survives this loop only in shells that run the last
	## stage of a pipeline in the current shell (real ksh does; pdksh and bash do not).
	cat "$WHO_TO_PAGE" | sed '/^[ 	]*#/d' | while read LINE; do
		PAGER_PERSON="${PAGER_PERSON},${LINE}"
	done
	## The following sed should have a space and tab in the brackets
	PAGER_PERSON=`echo "$PAGER_PERSON" | sed 's/[ 	]*//g; s/^,*//g; s/,,*/,/g'`
	## We will wait until we have created the NUISANCE_FILE in TEST_THE_SID
	## before checking to see if PAGER_PERSON is defined or empty.

	if [ ! -r "$PASSWORD_FILE" ]; then
		echo "   $PASSWORD_FILE is not readable.  Exiting" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			password file ->${PASSWORD_FILE}<- is not readable.
			Exiting...
		XXX
		return 1
	fi

	## Each assigned value MUST be one "word"; no white space allowed
	STATUS_LIST[0]='OK'
	STATUS_LIST[1]='CONNECT_ERROR'
	STATUS_LIST[2]='JOB_BROKEN'
	STATUS_LIST[3]='OTHER_ERROR'

}

##################### END SET_VARIABLES #####################

##################### BEGIN INITIALIZE_FILES #####################
INITIALIZE_FILES () {
	## NOTE: The LOG_FILE is handled in SET_VARIABLES function
	## If we can't create a mail file, then we probably can't write to a
	## logfile, but we can try, so try 2>> LOG_FILE.
	if [ -f "$MAIL_FILE" ]; then
		chmod 640 "$MAIL_FILE" 2>> "$LOG_FILE"
		cat /dev/null > "$MAIL_FILE" 2>> "$LOG_FILE"
	else
		touch "$MAIL_FILE" 2>> "$LOG_FILE"
		chmod 640 "$MAIL_FILE" 2>> "$LOG_FILE"
	fi
	if [ ! -f "$MAIL_FILE" ]; then
		echo "   Failed to create mail file ->${MAIL_FILE}<-" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke.
			Failed to create mail file ->${MAIL_FILE}<- on $MYNAME
			Exiting ...
		XXX
		return 1
	fi

	if [ -f "$PAGE_FILE" ]; then
		chmod 640 "$PAGE_FILE" 2>> "$LOG_FILE"
		cat /dev/null > "$PAGE_FILE" 2>> "$LOG_FILE"
	else
		touch "$PAGE_FILE" 2>> "$LOG_FILE"
		chmod 640 "$PAGE_FILE" 2>> "$LOG_FILE"
	fi
	if [ ! -f "$PAGE_FILE" ]; then
		echo "   Failed to create page file ->${PAGE_FILE}<-" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke.
			Failed to create page file ->${PAGE_FILE}<- on $MYNAME
			Exiting ...
		XXX
		return 1
	fi

	if [ -f "$NUISANCE_FILE" ]; then
		chmod 640 "$NUISANCE_FILE" 2>> "$LOG_FILE"
	else
		touch "$NUISANCE_FILE" 2>> "$LOG_FILE"
		chmod 640 "$NUISANCE_FILE" 2>> "$LOG_FILE"
	fi
	if [ ! -f "$NUISANCE_FILE" ]; then
		echo "   Failed to create nuisance file ->${NUISANCE_FILE}<-" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke.
			Failed to create nuisance file ->${NUISANCE_FILE}<- on $MYNAME
			Exiting ...
		XXX
		return 1
	fi
	echo "=================== `date` =====================" >> "$NUISANCE_FILE"
}
##################### END INITIALIZE_FILES #####################

##################### BEGIN SET_ORACLE_VARIABLES #####################
SET_ORACLE_VARIABLES () {
	export TNS_ADMIN='/oracle/app/oracle/admin/scripts/network/admin'
	if [ ! -d "${TNS_ADMIN}" ]; then
		echo "   ${TNS_ADMIN} does not exist.  Exiting" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			TNS_ADMIN ${TNS_ADMIN} does not exist.
			Exiting...
		XXX
		return 1
	fi

	TNSFILE="${TNS_ADMIN}/tnsnames.ora"
	if [ ! -r "${TNSFILE}" ]; then
		echo "   tnsnames.ora ${TNSFILE} is not readable.  Exiting" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			tnsnames.ora ${TNSFILE} is not readable.
			Exiting...
		XXX
		return 1
	fi

	ORATAB='/etc/oratab'
	if [ ! -r "$ORATAB" ]; then
		echo "   $ORATAB is not readable.  Exiting" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			oratab ->${ORATAB}<- is not readable.
			Exiting...
		XXX
		return 1
	fi

	## Make ORACLE_HOME the latest revision of Oracle listed in the oratab.
	## We are assuming that paths in oratab are of the form /dir1/dir2/.../product/#.#.#...
	export ORACLE_HOME=`awk -F: 'NF==3 && $2 ~ /product\/[0-9]+\.[0-9]+\.[0-9]+/ {print $2}' $ORATAB | sort -r | sed -n 1p`
	if [ ! -d "${ORACLE_HOME}" ]; then
		echo "   ORACLE_HOME ->${ORACLE_HOME}<- does not exist.  Exiting" >> "$LOG_FILE"
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME broke on $MYNAME
			ORACLE_HOME ${ORACLE_HOME} for latest revision listed in
			oratab ->${ORATAB}<- does not exist.
			Exiting...
		XXX
		return 1
	fi

	export PATH=${ORACLE_HOME}/bin:${PATH:-/usr/bin:/usr/sbin:/opt/bin:/usr/ccs/bin:/usr/local/bin}
	export LD_LIBRARY_PATH=${ORACLE_HOME}/lib:${LD_LIBRARY_PATH:-/usr/lib:/obackup/lib:/usr/ccs/lib}
	export SHLIB_PATH=${ORACLE_HOME}/lib:${SHLIB_PATH:-/obackup/lib}

}

##################### END SET_ORACLE_VARIABLES #####################

##################### BEGIN CHECK_IF_RUNNING #####################
CHECK_IF_RUNNING () {
	## The creation of the lock file is not necessary to check if another copy of the script
	## is running. But most of the time another copy will NOT be running, so I think this
	## will save the overhead of running ps and parsing its output. If the lock file is not
	## successfully created, then we run ps and parse its output to confirm if another copy
	## of the script is running.
	## It also serves as an easy manual check, using ls -l, that the script is running.
	ln -s WE_BE_RUNNING "${SCRIPTDIR}lock_file"
	if [ $? -ne 0 ]; then
		{
		ps -eo pid -o args | sed 's/^ *//g;s/  */ /g' | awk '
			$1 == PID {next}
			$1 == PPID {next}
			$2 == "vi" {next}
			$0 ~ "awk" {next}
			$0 ~ XXX {print "ALREADY RUNNING"}
		' PID=$MYPID PPID=$MYPPID XXX=$SCRIPTNAME
		} | while read LINE; do
			if [ "$1" = "DEBUG" ]; then
				echo "  CHECK_IF_RUNNING read line: $LINE"
			fi
			if [ "$LINE" = "ALREADY RUNNING" ]; then
				echo "`date '+%D %H:%M'`: Exiting. Already running." >> "$LOG_FILE"
				echo "`date '+%D %H:%M'`: Exiting. Already running."
				return 1
			fi
		done
	fi

}

##################### END CHECK_IF_RUNNING #####################

##################### BEGIN CHECK_OTHER_BOX #######################
CHECK_OTHER_BOX () {
	## If no secondary box is defined, then there is nothing to do.
	if [ -z "$SECONDARY_BOX" ]; then
		if [ "$1" = "DEBUG" ]; then
			echo "CHECK_OTHER_BOX: NO secondary box is defined. Skipping CHECK_OTHER_BOX."
		fi
		return 0
	else
		if [ "$1" = "DEBUG" ]; then
			echo "CHECK_OTHER_BOX: SECONDARY_BOX is defined = $SECONDARY_BOX"
		fi
	fi
	## From here on down, we may assume that SECONDARY_BOX has been defined.
	## If secondary box is defined, then primary box must be too.
	if [ -z "$PRIMARY_BOX" ]; then
		mailx -s "FIX THIS" "$SUPPORT" <<-XXX
			$MY_TAG PROBLEM on `hostname`.
			The PRIMARY_BOX variable is not set.
			The SECONDARY_BOX variable IS set.
			Either UNset SECONDARY_BOX or define PRIMARY_BOX.
		XXX
		echo "   PRIMARY_BOX variable not set, but SECONDARY_BOX is." >> "$LOG_FILE"
		if [ "$1" = "DEBUG" ]; then
			echo "CHECK_OTHER_BOX: PRIMARY_BOX variable not set, but SECONDARY_BOX is."
		fi
		return 1
	else
		if [ "$1" = "DEBUG" ]; then
			echo "CHECK_OTHER_BOX: PRIMARY_BOX = $PRIMARY_BOX"
		fi
	fi

	## Check that the assignment of primary or secondary box matches hostname.
	## Define MY_ROLE variable.
	case "`hostname`" in
		"$PRIMARY_BOX") MY_ROLE='PRIMARY';;
		"$SECONDARY_BOX") MY_ROLE='SECONDARY';;
		*) mailx -s "FIX THIS" "$SUPPORT" <<-XXX
				$MY_TAG PROBLEM
				The variables PRIMARY_BOX and SECONDARY_BOX are defined,
				but do not match what is returned by hostname command.
				PRIMARY_BOX = ->${PRIMARY_BOX}<-
				SECONDARY_BOX = ->${SECONDARY_BOX}<-
				hostname command = `hostname`
			XXX
			echo "   PRIMARY_BOX and SECONDARY_BOX variables fail to match hostname command." >> "$LOG_FILE"
			if [ "$1" = "DEBUG" ]; then
				echo "CHECK_OTHER_BOX: PRIMARY_BOX and SECONDARY_BOX variables fail to match hostname command."
			fi
			return 1
			;;
	esac
	if [ "$1" = "DEBUG" ]; then
		echo "CHECK_OTHER_BOX: MY_ROLE = $MY_ROLE"
	fi

	## Define OTHER_BOX variable
	case "$MY_ROLE" in
		"PRIMARY") OTHER_BOX="$SECONDARY_BOX";;
		"SECONDARY") OTHER_BOX="$PRIMARY_BOX";;
		*) OTHER_BOX='SCRIPT_IS_BROKE';;
	esac
	if [ "$1" = "DEBUG" ]; then
		echo "CHECK_OTHER_BOX: Defined OTHER_BOX = $OTHER_BOX"
	fi
	## This should not happen.
	if [ "$OTHER_BOX" = "SCRIPT_IS_BROKE" ]; then
		mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
			$SCRIPTNAME got OTHER_BOX = ->${OTHER_BOX}<-
			in CHECK_OTHER_BOX function.
			Got MY_ROLE = ->${MY_ROLE}<-
		XXX
		return 1
	fi

	## The soft link serves as a record of the last test status.
	if [ ! -L "${LINKDIR}/OTHER_BOX_STATUS" ]; then
		## If the soft link is not here, then create it.
		OLD_STATUS='NOT_TESTED'
		ln -s NOT_TESTED "${LINKDIR}/OTHER_BOX_STATUS"
		if [ $? -ne 0 ]; then
			mailx -s "FIX THIS" "$SUPPORT" <<-XXX
				$MY_TAG CHECK_OTHER_BOX: PROBLEM
				Failed to create soft link OTHER_BOX_STATUS -> NOT_TESTED
				on ${MYNAME}:${LINKDIR}
			XXX
			echo "   Failed to create soft link OTHER_BOX_STATUS -> NOT_TESTED." >> "$LOG_FILE"
			if [ "$1" = "DEBUG" ]; then
				echo "CHECK_OTHER_BOX: Failed to create soft link OTHER_BOX_STATUS -> NOT_TESTED."
			fi
			case "$MY_ROLE" in
				## Don't error off if this is primary box.
				"PRIMARY") return 0;;
				## Error off if this is secondary box.
				"SECONDARY") return 1;;
				## This should not be able to happen ... REALLY!
				*) mailx -s "FIX THIS" "$SUPPORT" <<-XXX
						$MY_TAG PROBLEM
						Got value ->${MY_ROLE}<- for MY_ROLE on $MYNAME
						in function CHECK_OTHER_BOX.
						This must be either PRIMARY or SECONDARY.
						Must be a bug in the script; this should not be possible.
					XXX
					## Log whatever weird value we got for MY_ROLE.
					echo "   CHECK_OTHER_BOX BROKE: Got value ->${MY_ROLE}<- for MY_ROLE." >> "$LOG_FILE"
					return 1;;
			esac
		else
			if [ "$1" = "DEBUG" ]; then
				echo "CHECK_OTHER_BOX: Created ${LINKDIR}/OTHER_BOX_STATUS -> NOT_TESTED"
			fi
		fi
	fi

	## Read soft link for status of last test.
	if [ "$OLD_STATUS" != "NOT_TESTED" ]; then
		OLD_STATUS=`ls -l "${LINKDIR}" 2> /dev/null | sed -n '/ OTHER_BOX_STATUS -> /s/\([^>]*> \)\(.*\)/\2/p'`
	fi
	if [ "$1" = "DEBUG" ]; then
		echo "CHECK_OTHER_BOX: OLD_STATUS = $OLD_STATUS for OTHER_BOX_STATUS"
	fi
	## The following is an example of paranoia.
	## In case soft link is not OK, BROKE, or NOT_TESTED.
	case "$OLD_STATUS" in
		"OK") ;;
		"BROKE");;
		"NOT_TESTED");;
		*)
			## This should not happen, but just in case ...
			rm -f "${LINKDIR}/OTHER_BOX_STATUS"
			ln -s NOT_TESTED "${LINKDIR}/OTHER_BOX_STATUS"
			OLD_STATUS=`ls -l "${LINKDIR}" 2> /dev/null | sed -n '/ OTHER_BOX_STATUS -> /s/\([^>]*> \)\(.*\)/\2/p'`
			if [ "$OLD_STATUS" != "NOT_TESTED" ]; then
				mailx -s "FIX THIS" "$SUPPORT" <<-XXX
					SERIOUS $MY_TAG PROBLEM on $MYNAME
					Cannot create OTHER_BOX_STATUS -> NOT_TESTED on ${MYNAME}:${LINKDIR}
				XXX
			fi
			if [ "$MY_ROLE" = "PRIMARY" ]; then
				## Do not error off if this is primary box.
				return 0
			else
				return 1
			fi
		;;
	esac

	## Ping other box 1 time
	if [ "$1" = "DEBUG" ]; then
		ping -c 1 "$OTHER_BOX"
	else
		ping -c 1 "$OTHER_BOX" > /dev/null 2>&1
	fi

	## If ping not successful
	if [ $? -ne 0 ]; then
		if [ "$1" = "DEBUG" ]; then
			echo "CHECK_OTHER_BOX: Ping OTHER_BOX = $OTHER_BOX FAILED"
			echo "                 OLD_STATUS = $OLD_STATUS"
		fi
		## If status has changed, then recreate soft link and send email.
		if [ "$OLD_STATUS" != "BROKE" ]; then
			rm "${LINKDIR}/OTHER_BOX_STATUS" 2>> "$LOG_FILE"
			ln -s BROKE "${LINKDIR}/OTHER_BOX_STATUS" 2>> "$LOG_FILE"
			if [ $? -eq 0 ]; then
				echo "   $OTHER_BOX changed to BROKE from $OLD_STATUS" >> "$LOG_FILE"
				if [ "$1" = "DEBUG" ]; then
					echo "$OTHER_BOX changed to BROKE from $OLD_STATUS"
				fi
				## This info is mailed now instead of going to mail_file.
				if [ -n "$PAGER_PERSON" ]; then
					if [ "$1" = "DEBUG" ]; then
						echo "   Sending page to $PAGER_PERSON"
					fi
					## Even though this script is not conn_check, send message as if it were.
					## That way the same paging setup can be used for both.
					mailx -s "CONN_CHECK INFO" $PAGER_PERSON <<-XXX
						$OTHER_BOX changed to BROKE from $OLD_STATUS
					XXX
				else
					## If, for some reason, PAGER_PERSON is blank, email SUPPORT
					mailx -s "CONN_CHECK INFO" $SUPPORT <<-XXX
						$OTHER_BOX changed to BROKE from $OLD_STATUS
					XXX
				fi
			else
				## In case we can't modify the soft link
				## AND you're going to keep getting this email until you fix it.
				mailx -s "FIX THIS" "$SUPPORT" <<-XXX
					$MY_TAG PROBLEM
					Failed to modify soft link OTHER_BOX_STATUS -> BROKE
					on `hostname`:${LINKDIR}
					Got OTHER_BOX = ->${OTHER_BOX}<-
				XXX
				echo "   CHECK_OTHER_BOX: FAILED to modify soft link OTHER_BOX_STATUS -> BROKE" >> "$LOG_FILE"
				if [ "$1" = "DEBUG" ]; then
					echo "CHECK_OTHER_BOX: FAILED to modify soft link OTHER_BOX_STATUS -> BROKE"
				fi
			fi
		fi
	## If ping is successful
	else
		if [ "$1" = "DEBUG" ]; then
			echo "CHECK_OTHER_BOX: Ping OTHER_BOX = $OTHER_BOX OK OK"
			echo "                 OLD_STATUS = $OLD_STATUS"
		fi
		if [ "$OLD_STATUS" != "OK" ]; then
			## If status has changed, then recreate soft link and send email.
			rm "${LINKDIR}/OTHER_BOX_STATUS"
			ln -s OK "${LINKDIR}/OTHER_BOX_STATUS"
			if [ $? -eq 0 ]; then
				## This info is mailed now instead of going to mail_file.
				if [ -n "$PAGER_PERSON" ]; then
					if [ "$1" = "DEBUG" ]; then
						echo "   Sending page to $PAGER_PERSON"
					fi
					## Even though this script is not conn_check, send message as if it were.
					## That way the same paging setup can be used for both.
					mailx -s "CONN_CHECK INFO" $PAGER_PERSON <<-XXX
						$OTHER_BOX changed to OK from $OLD_STATUS
					XXX
				else
					## If, for some reason, PAGER_PERSON is blank, email SUPPORT
					mailx -s "CONN_CHECK INFO" $SUPPORT <<-XXX
						$OTHER_BOX changed to OK from $OLD_STATUS
					XXX
				fi
				echo "   OTHER_BOX = $OTHER_BOX changed to OK from $OLD_STATUS" >> "$LOG_FILE"
				if [ "$1" = "DEBUG" ]; then
					echo "CHECK_OTHER_BOX: OTHER_BOX changed to OK from $OLD_STATUS"
					echo "    Created soft link ${LINKDIR}/OTHER_BOX_STATUS -> OK"
				fi
			else
				mailx -s "FIX THIS" "$SUPPORT" <<-XXX
					CONN_CHECK PROBLEM
					Failed to modify soft link OTHER_BOX_STATUS -> OK
					on `hostname`:${LINKDIR}
					Got OTHER_BOX = ->${OTHER_BOX}<-
				XXX
				echo "   CHECK_OTHER_BOX: FAILED to modify soft link OTHER_BOX_STATUS -> OK" >> "$LOG_FILE"
				if [ "$1" = "DEBUG" ]; then
					echo "CHECK_OTHER_BOX: FAILED to modify soft link OTHER_BOX_STATUS -> OK"
				fi
			fi
		fi
		if [ "$MY_ROLE" = "SECONDARY" ]; then
			## If this is secondary box, then the script should stop here.
			## Return non-zero to make script exit.
			if [ "$1" = "DEBUG" ]; then
				echo "**** I am SECONDARY.  My job is done. ****"
			fi
			return 2
		fi
	fi

	return 0

}

##################### END CHECK_OTHER_BOX #######################

#################### BEGIN RUN_TEST ###################
RUN_TEST () {
	## remove any TEST_SID links left by previous run.
	for FILE in `ls -1 "${LINKDIR}/TEST_"* 2> /dev/null`; do
		rm "$FILE" 2>> "$LOG_FILE"
		if [ $? -eq 0 ]; then
			echo "RUN_TEST: Removed $FILE" >> "$LOG_FILE"
			if [ "$1" = "DEBUG" ]; then
				echo "RUN_TEST: Removed $FILE"
			fi
		else
			echo "- RUN_TEST: ERROR removing $FILE" >> "$LOG_FILE"
			if [ "$1" = "DEBUG" ]; then
				echo "RUN_TEST: ERROR removing TEST_SID link $FILE"
			fi
		fi
	done

	## Get list of sids that have system password in password file AND are listed in tnsnames.ora.
	## If SID is not in password file, then it will not be tested.
	/usr/bin/awk -F: '
		$1 ~ /#/ {next}
		/^$/ {next}
		NF==3 && $2 ~ /[sS][yY][sS][tT][eE][mM]/ {print $1,$2,$3}' "$PASSWORD_FILE" | \
	while read SID USER PASS; do
		if [ "$1" = "DEBUG" ]; then
			echo "Found $SID in $PASSWORD_FILE; egrepping $TNSFILE"
		fi
		egrep \^$SID'[. ]' "$TNSFILE" > /dev/null
		if [ $? -ne 0 ]; then
			echo "   $SID in passwd but not in tnsnames.ora on $MYNAME" >> "$NUISANCE_FILE"
			echo "   $SID in passwd but not in tnsnames.ora on $MYNAME" >> "$LOG_FILE"
			if [ "$1" = "DEBUG" ]; then
				echo "   $SID in passwd but not in tnsnames.ora on $MYNAME"
			fi
			continue
		fi
		## Put link to serve as a marker that we tried to test this SID.
		## This is in case sqlplus hangs.  Then we know what SIDS hung.
		ln -s NOTHING "${LINKDIR}/TEST_${SID}" 2>> "$LOG_FILE"
		if [ $? -ne 0 ]; then
			echo "$SCRIPTNAME broke running ln -s NOTHING ${LINKDIR}/TEST_${SID}" >> "$PAGE_FILE"
			echo "$SCRIPTNAME broke running ln -s NOTHING ${LINKDIR}/TEST_${SID}" >> "$LOG_FILE"
			if [ "$1" = "DEBUG" ]; then
				echo "- RUN_TEST: BROKE running ln -s NOTHING ${LINKDIR}/TEST_${SID}"
			fi
			## If we got error, then the link should not exist, but ....
			rm "${LINKDIR}/TEST_${SID}" 2> /dev/null
			continue
		fi
		## Run sqlplus test in a subshell and background it in case it hangs.
		## The most common reason for hanging is target computer cannot be reached
		## (computer down or network problem), but could be database is locked up.
		( STATUS=0
			{
			sqlplus -s <<-XXX
				system/${PASS}@${SID}
				set serveroutput on
				set heading off
				set feedback off
				-- the following is cartesian join, but OK because only one line returned for NAME.
				-- The goofy outer join is to force at least one row to be returned when there are
				-- no jobs.  The script MUST read at least one line to determine if sqlplus worked.
				select a.name, nvl(b.broken,'N') from v\$database a, dba_jobs b where b.job(+) != a.checkpoint_change#;
			XXX
			} | sed '/^$/d; s/^[ 	]*//g' | \
			while read LINE; do
				if [ "$1" = "DEBUG" ]; then
					echo "SQLPLUS $SID:==> $LINE"
				fi
				## If sqlplus didn't hang, then we got here; so remove the TEST_SID link.
				## If this link is not removed, then later on we will assume sqlplus hung.
				## Redirect error to /dev/null in case multiple lines are read causing script
				## to try to remove the link more than once.
				rm "${LINKDIR}/TEST_${SID}" 2> /dev/null

				## If STATUS is not zero, then loop and do nothing to empty buffer.
				## A text comparison is used in case a bug in the script leaves STATUS blank.
				if [ "$STATUS" != "0" ]; then
					continue
				fi

				## values printed by the awk statement match the indexes in the STATUS_LIST
				## array defined in the SET_VARIABLES function.
				## 0 = OK
				## 1 = CONNECT_ERROR
				## 2 = JOB_BROKEN
				## 3 = OTHER_ERROR
				## We only check the first three characters of the NAME returned by sqlplus.
				## This permits limited use of database aliases.  For example RATE_SEND & RATE_GET
				## This assumes it is unlikely the first three characters of an error message
				## will match the first three characters of a SID name.
				STATUS=`echo $LINE | awk '
					substr($0,1,3) != substr(SID,1,3) {print "1"; exit}
					$2 == "Y" {print "2"; exit}
					$2 == "N" {print "0"; exit}
					{print "3"}
				' SID="$SID"`
				if [ "$1" = "DEBUG" ]; then
					echo "RUN_TEST $SID: STATUS = $STATUS = ${STATUS_LIST[$STATUS]}"
				fi
			done

			## Test to see if link status already exists.  If
so, then leave it alone.
			## Otherwise, replace it with new status and, if
appropriate, send notification of change.
			OLD_STATUS=`ls -l "${LINKDIR}" 2> /dev/null | sed -n
'/ '$SID' -> /s/\([^>]*> \)\(.*\)/\2/p'`
			if [ -z "OLD_STATUS" ]; then OLD_STATUS='BLANK'; fi
			if [ "$1" = "DEBUG" ]; then
				echo "RUN_TEST $SID: OLD_STATUS =
$OLD_STATUS ; tested status = ${STATUS_LIST[$STATUS]}"
			fi
			if [ "$OLD_STATUS" = "${STATUS_LIST[$STATUS]}" ];
then
				if [ "$1" = "DEBUG" ]; then
					echo "RUN_TEST $SID:STATUS =
${STATUS_LIST[$STATUS]} already exists."
				fi
				continue
			else
				if [ "$1" = "DEBUG" ]; then
					echo "RUN_TEST $SID: STATUS =
${STATUS_LIST[$STATUS]} does NOT exist."
					echo "         Remove
${LINKDIR}/${SID}"
				fi
				rm "${LINKDIR}/${SID}" 2>> "$LOG_FILE"
				ln -s "${STATUS_LIST[$STATUS]}"
"${LINKDIR}/${SID}" 2>> "$LOG_FILE"
				if [ $? -eq 0 ]; then
					## Send page only for STATUS = 2 errors.  Let other scripts page for other errors.
					if [ "$STATUS" = "2" -o "$OLD_STATUS" = "${STATUS_LIST[2]}" ]; then
						echo "$SID changed to ${STATUS_LIST[$STATUS]} from $OLD_STATUS" >> "$PAGE_FILE"
					else
						echo "   $SID changed to ${STATUS_LIST[$STATUS]} from $OLD_STATUS" >> "$NUISANCE_FILE"
						echo "$SID changed to ${STATUS_LIST[$STATUS]} from $OLD_STATUS" >> "$MAIL_FILE"
					fi
					## Log all errors even if not sending page.
					echo "$SID changed to ${STATUS_LIST[$STATUS]} from $OLD_STATUS" >> "$LOG_FILE"
					if [ "$1" = "DEBUG" ]; then
						echo "RUN_TEST $SID: changed to ${STATUS_LIST[$STATUS]} from $OLD_STATUS"
					fi
				else
					echo "$MY_TAG broke running ln -s ${STATUS_LIST[$STATUS]} ${LINKDIR}/${SID}" >> "$PAGE_FILE"
					echo "   BROKE running ln -s ${STATUS_LIST[$STATUS]} ${LINKDIR}/${SID}" >> "$LOG_FILE"
					if [ "$1" = "DEBUG" ]; then
						echo "RUN_TEST BROKE running ln -s ${STATUS_LIST[$STATUS]} ${LINKDIR}/${SID}"
					fi
				fi
			fi
		)&
		## This "done" goes with while reading from password and tnsnames.ora files
	done

	## Give background jobs some time to run.
	sleep 20
	## Empty out "jobs completed" messages.
	jobs > /dev/null 2>&1
	## See if there are still any jobs running.  If so, give them some more time.
	COUNT=0
	MAXCOUNT=6
	while [ -n "`jobs`" -a $COUNT -lt $MAXCOUNT ]; do
		COUNT=$(( $COUNT + 1 ))
		sleep 20
	done
	if [ "$1" = "DEBUG" ]; then
		echo "--- RUN_TEST job sleep COUNT = $COUNT ---"
	fi
	## Kill any jobs still running.
	if [ $COUNT -ge $MAXCOUNT ]; then
		jobs > /dev/null 2>&1
		for JOB in `jobs | sed -n '/\[[0-9][0-9]*\]/s/\(\[\)\([0-9][0-9]*\)\(.*\)/\2/p'`; do
			kill %$JOB
			echo "   Killed job $JOB" >> "$LOG_FILE"
			if [ "$1" = "DEBUG" ]; then
				echo "RUN_TEST Killed job $JOB"
			fi
		done
	fi

	for SID in `ls -1 "${LINKDIR}" 2> /dev/null | sed -n '/TEST_/s/.*TEST_//gp'`; do
		## If TEST_SID soft link still exists, then sqlplus probably hung for SID.
		## Assign status of CONNECT_ERROR which is STATUS=1
		STATUS=1
		## Test to see if link status already exists.  If so, then leave it alone.
		## Otherwise, replace it with new status and, if appropriate, send notification of change.
		OLD_STATUS=`ls -l "${LINKDIR}" 2> /dev/null | sed -n '/ '$SID' -> /s/\([^>]*> \)\(.*\)/\2/p'`
		if [ -z "$OLD_STATUS" ]; then OLD_STATUS='BLANK'; fi
		if [ "$1" = "DEBUG" ]; then
			echo "RUN_TEST $SID: OLD_STATUS = $OLD_STATUS ; tested status = ${STATUS_LIST[$STATUS]}"
		fi
		if [ "$OLD_STATUS" = "${STATUS_LIST[$STATUS]}" ]; then
			continue
		else
			rm "${LINKDIR}/${SID}" 2>> "$LOG_FILE"
			ln -s "${STATUS_LIST[$STATUS]}" "${LINKDIR}/${SID}" 2>> "$LOG_FILE"
			if [ $? -eq 0 ]; then
				echo "$SID changed to ${STATUS_LIST[$STATUS]} from $OLD_STATUS" >> "$LOG_FILE"
				echo "   $SID changed to ${STATUS_LIST[$STATUS]} from $OLD_STATUS" >> "$NUISANCE_FILE"
				if [ "$1" = "DEBUG" ]; then
					echo "RUN_TEST $SID changed to ${STATUS_LIST[$STATUS]} from $OLD_STATUS"
				fi
			else
				echo "$MY_TAG broke running ln -s ${STATUS_LIST[$STATUS]} ${LINKDIR}/${SID}" >> "$PAGE_FILE"
				echo "   BROKE running ln -s ${STATUS_LIST[$STATUS]} ${LINKDIR}/${SID}" >> "$LOG_FILE"
				if [ "$1" = "DEBUG" ]; then
					echo "RUN_TEST BROKE running ln -s ${STATUS_LIST[$STATUS]} ${LINKDIR}/${SID}"
				fi
			fi
		fi
	done

}

#################### END RUN_TEST ###################

###################### BEGIN REMOVE_OLD_LINKS #####################
REMOVE_OLD_LINKS () {
	CURRENT_HOST='NOTHING'
	ls -l "$LINKDIR" | awk '/->/ {print $(NF - 2)}' |\
	while read LINK; do
		if [ "$1" = "DEBUG" ]; then
			echo "Check if current: $LINK"
		fi
		## The following sed statements assume no hostnames have an underscore as part of the name.
		SID=`echo $LINK | sed 's/_.*$//g'`
		## Ignore lock_file
		if [ "$SID" = "lock" ]; then continue; fi
		## Ignore OTHER_BOX_STATUS
		if [ "$SID" = "OTHER" ]; then continue; fi
		if [ "$1" = "DEBUG" ]; then
			echo "-- Test if $SID in $PASSWORD_FILE"
		fi
		egrep -q \^${SID}: "$PASSWORD_FILE" 2>> "$LOG_FILE"
		if [ $? -eq 0 ]; then
			if [ "$1" = "DEBUG" ]; then
				echo "   FOUND $SID in password file"
			fi
		else
			rm "${LINKDIR}/${LINK}" 2>> "$LOG_FILE"
			if [ $? -eq 0 ]; then
				echo "- Removed old link ${LINKDIR}/${LINK}" >> "$LOG_FILE"
				echo "- Removed old link ${LINKDIR}/${LINK}" >> "$NUISANCE_FILE"
			else
				echo "- BROKE Removing old link ${LINKDIR}/${LINK}" >> "$LOG_FILE"
				echo "- BROKE Removing old link ${LINKDIR}/${LINK}" >> "$NUISANCE_FILE"
			fi
			continue
		fi

		if [ "$1" = "DEBUG" ]; then
			echo "-- Test if $SID in $TNSFILE"
		fi
		egrep -q \^${SID}'[.].*' "$TNSFILE" 2>> "$LOG_FILE"
		if [ $? -eq 0 ]; then
			if [ "$1" = "DEBUG" ]; then
				echo "   FOUND $SID in tnsnames file"
			fi
		else
			rm "${LINKDIR}/${LINK}" 2>> "$LOG_FILE"
			if [ $? -eq 0 ]; then
				echo "- Removed old link ${LINKDIR}/${LINK}" >> "$LOG_FILE"
				echo "- Removed old link ${LINKDIR}/${LINK}" >> "$NUISANCE_FILE"
			else
				echo "- BROKE Removing old link ${LINKDIR}/${LINK}" >> "$LOG_FILE"
				echo "- BROKE Removing old link ${LINKDIR}/${LINK}" >> "$NUISANCE_FILE"
			fi
		fi
	done

}
###################### END REMOVE_OLD_LINKS #####################

###################### BEGIN MAIL_AND_PAGE #####################
MAIL_AND_PAGE () {
	if [ -f "$MAIL_FILE" ]; then
		## The sed brackets have a space and a tab.
		COUNT=`sed 's/[ 	]*//g; /^$/d' "$MAIL_FILE" | awk 'END {print NR}'`
		if [ "$1" = "DEBUG" ]; then
			echo "   $MAIL_FILE has $COUNT lines."
		fi
		if [ $COUNT -gt 0 ]; then
			if [ "$1" = "DEBUG" ]; then
				echo "   Sending email to $SUPPORT"
			fi
			mailx -s "JOB_MON INFO" $SUPPORT < "$MAIL_FILE"
		fi
	fi

	if [ -f "$PAGE_FILE" ]; then
		## The sed brackets have a space and a tab.
		COUNT=`sed 's/[ 	]*//g; /^$/d' "$PAGE_FILE" | awk 'END {print NR}'`
		if [ "$1" = "DEBUG" ]; then
			echo "   $PAGE_FILE has $COUNT lines."
		fi
		if [ $COUNT -gt 0 ]; then
			if [ -n "$PAGER_PERSON" ]; then
				if [ "$1" = "DEBUG" ]; then
					echo "   Sending page to $PAGER_PERSON"
				fi
				## Even though this script is not conn_check, send message as if it were.
				## That way the same paging setup can be used for both.
				mailx -s "CONN_CHECK INFO" $PAGER_PERSON < "$PAGE_FILE"
			else
				mailx -s "CONN_CHECK INFO" $SUPPORT < "$PAGE_FILE"
			fi
		fi
	fi

	if [ $TIME -ge $MAINT_TIME -a $TIME -lt $(( $MAINT_TIME + $CRON_INTERVAL )) ]; then
		if [ -f "$DAILY_SUMMARY" ]; then
			chmod 640 "$DAILY_SUMMARY"
			cat /dev/null > "$DAILY_SUMMARY"
		else
			touch "$DAILY_SUMMARY"
			chmod 640 "$DAILY_SUMMARY"
		fi
		if [ ! -f "$DAILY_SUMMARY" ]; then
			echo "   Failed to create daily summary file ->${DAILY_SUMMARY}<-" >> "$LOG_FILE"
			mailx -s "JOB_MON BROKE" $SUPPORT <<-XXX
				$MY_TAG broke.
				Failed to create daily summary file ->${DAILY_SUMMARY}<- on $MYNAME
			XXX
		else
			ls -l "$LINKDIR" | awk '/ -> / {print $(NF-2),$(NF-1),$NF}' | while read SID ARROW STATUS JUNK; do
				if [ "$SID" = "lock_file" ]; then continue; fi
				if [ "$STATUS" != "OK" ]; then
					echo "$SID : $STATUS" >> "$DAILY_SUMMARY"
					echo "   daily summary $SID : $STATUS" >> "$LOG_FILE"
				fi
			done
			## The sed brackets have a space and a tab.
			COUNT=`sed 's/[ 	]*//g; /^$/d' "$DAILY_SUMMARY" | awk 'END {print NR}'`
			if [ "$1" = "DEBUG" ]; then
				echo "   $DAILY_SUMMARY has $COUNT lines."
			fi
			if [ $COUNT -gt 0 ]; then
				if [ "$1" = "DEBUG" ]; then
					echo "   Sending daily summary to $SUPPORT"
				fi
				mailx -s "JOB_MON DAILY SUMMARY" $SUPPORT < "$DAILY_SUMMARY"
			else
				mailx -s "JOB_MON DAILY SUMMARY" $SUPPORT <<-XXX
					No broken jobs found.
				XXX
			fi
		fi

		if [ -f "$NUISANCE_FILE" ]; then
			## The sed brackets have a space and a tab.
			COUNT=`sed 's/[ 	]*//g; /^$/d' "$NUISANCE_FILE" | awk 'END {print NR}'`
			if [ "$1" = "DEBUG" ]; then
				echo "   $NUISANCE_FILE has $COUNT lines."
			fi
			if [ $COUNT -gt 0 ]; then
				if [ "$1" = "DEBUG" ]; then
					echo "   Sending nuisance_file to $SUPPORT"
				fi
				mailx -s "JOB_MON NUISANCE FILE" $SUPPORT < "$NUISANCE_FILE"
				cat /dev/null > "$NUISANCE_FILE"
			fi
		fi
	fi

}

###################### END MAIL_AND_PAGE #####################

###################### BEGIN RDIST_FILES #####################
RDIST_FILES () {
	if [ "$1" = "DEBUG" ]; then
		echo "RDIST_FILES: Got MY_ROLE = $MY_ROLE"
	fi

	case "$MY_ROLE" in
		"PRIMARY")
			if [ -n "$OTHER_BOX" ]; then
				## Since we have been writing stuff to SCRIPTDIR, we will assume we can cd to it.
				## The idea here is to get the complete path of the files regardless of where we
				## were sitting when we ran the scripts.
				cd "$SCRIPTDIR"
				CURDIR=`pwd`
				if [ "$1" = "DEBUG" ]; then
					echo "RDIST_FILES got CURDIR = $CURDIR"
				fi
				## Update all files and soft links on secondary box.
				## NOTE: This copies the lock_file as well.  I debated whether to clean this up
				## on the remote box, and decided not to worry about it.  This should not be a
				## problem since the script already properly (we hope) deals with the case where
				## the lock_file already exists.
				if [ "$1" = "DEBUG" ]; then
					rdist -wc "${CURDIR}" "${OTHER_BOX}"
				else
					rdist -wc "${CURDIR}" "${OTHER_BOX}" 2>> "$LOG_FILE" 1> /dev/null
				fi
				if [ $? -ne 0 ]; then
					echo "CHECK_OTHER_BOX: Failed to rdist files to $OTHER_BOX" >> "$LOG_FILE"
					echo "   **> Failed to rdist files to $OTHER_BOX" >> "$NUISANCE_FILE"
					if [ "$1" = "DEBUG" ]; then
						echo "CHECK_OTHER_BOX: Failed to rdist files to $OTHER_BOX"
					fi
				else
					if [ "$1" = "DEBUG" ]; then
						echo "CHECK_OTHER_BOX: rdist to $OTHER_BOX OK OK"
					fi
				fi
			fi
			;;
		"SECONDARY")
			## If this is secondary box, then primary must be down.
			## So can't rdist any files to it.  So don't try.
			return 0
			;;
		*)
			mailx -s "$MY_TAG BROKE" $SUPPORT <<-XXX
				RDIST_FILES: Got MY_ROLE = ->${MY_ROLE}<-
			XXX
			echo "   RDIST_FILES: BROKE Got MY_ROLE = ->${MY_ROLE}<-" >> "$LOG_FILE"
			if [ "$1" = "DEBUG" ]; then
				echo "   RDIST_FILES: BROKE BROKE BROKE BROKE"
			fi
	esac

}

###################### END RDIST_FILES #####################

###################### BEGIN MAIN #####################
SET_VARIABLES "$1"
if [ $? -ne 0 ]; then
	if [ -n "$SCRIPTDIR" ]; then
		rm -f "${SCRIPTDIR}lock_file"
	fi
	## Exit even when SCRIPTDIR is unset; carrying on with bad variables is worse.
	exit 1

fi

INITIALIZE_FILES "$1"
if [ $? -ne 0 ]; then

	if [ -n "$SCRIPTDIR" ]; then
		rm -f "${SCRIPTDIR}lock_file"
	fi
	exit 1

fi

SET_ORACLE_VARIABLES "$1"
if [ $? -ne 0 ]; then

	if [ -n "$SCRIPTDIR" ]; then
		rm -f "${SCRIPTDIR}lock_file"
	fi
	exit 1

fi

## Signal 9 (KILL) cannot be trapped, so it is not listed here.
trap 'rm -f "${SCRIPTDIR}lock_file"; exit 0' 2 3 15

CHECK_OTHER_BOX "$1"
if [ $? -ne 0 ]; then

	if [ -n "$SCRIPTDIR" ]; then
		rm -f "${SCRIPTDIR}lock_file"
	fi
	exit 1

fi

CHECK_IF_RUNNING "$1"
if [ $? -ne 0 ]; then

	rm -f "${SCRIPTDIR}lock_file"
	exit 1

fi

RUN_TEST "$1"
if [ $? -ne 0 ]; then

	rm -f "${SCRIPTDIR}lock_file"
	exit 1

fi

MAIL_AND_PAGE "$1"
REMOVE_OLD_LINKS "$1"
RDIST_FILES "$1"
rm -f "${SCRIPTDIR}lock_file"

###################### END MAIN #####################
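
If you want to try the status classification from RUN_TEST without a database, here is a minimal sketch. It assumes, as the awk tests in the script suggest, that sqlplus returns a line with the name in the first column and a Y/N "broken" flag in the second. The function name classify_status and the sample lines are made up for illustration and are not part of job_mon.ksh.

```shell
#!/bin/sh
# Hypothetical helper (not in job_mon.ksh): classify one line of sqlplus output.
# Prints 0 = OK, 1 = CONNECT_ERROR, 2 = JOB_BROKEN, 3 = OTHER_ERROR.
classify_status () {
	# $1 = SID, $2 = the line returned by sqlplus (assumed format: NAME FLAG)
	printf '%s\n' "$2" | awk -v SID="$1" '
		substr($0,1,3) != substr(SID,1,3) {print "1"; exit}
		$2 == "Y" {print "2"; exit}
		$2 == "N" {print "0"; exit}
		{print "3"}
	'
}

# Sample lines are invented for the demonstration.
classify_status RATE "RATE_SEND N"                  # alias matches, job not broken
classify_status RATE "RATE_GET Y"                   # alias matches, job broken
classify_status RATE "ORA-12541: TNS:no listener"   # first three characters differ
classify_status RATE "RATE_SEND"                    # name matches, no flag column
```

The four calls print 0, 2, 1 and 3 in that order. Only the first three characters are compared, so RATE_SEND and RATE_GET both count as belonging to SID RATE, which is the alias trick described in the comments of RUN_TEST.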



-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Stephen Lee
  INET: Stephen.Lee_at_DTAG.Com

Received on Mon Apr 28 2003 - 12:26:53 CDT
