otrcrep memory fault

From: Simon McClenahan <simon_at_balr.com>
Date: Wed, 9 Jun 1999 16:00:10 -0500
Message-ID: <7jmkm3$c8a$1_at_eve.enteract.com>



Now that I have enabled Oracle tracing, I am trying to use the otrcrep command to get a trace report for a particular process ID. Unfortunately, every attempt to run it, on either SCO 5.0.5 or Solaris, ends in a core dump.

I'm starting to think that some incompatibility or corruption in the oracle7.cdf file is triggering a problem in the otrcrep program. Any clues on how to go about solving this?

Using the scotruss utility on SCO (where the database has tracing turned on), here are the interesting bits:

open("/usr/local/oracle/product/7.3/ocommon/nls/admin/data/lx1boot.nlb", 0x0, 0x0) = 3
read(3, "\A5 \A5 \00 \00 \00 \00 \10 \01 \C6 =\00 \00 |\BC ...", 44) = 44
brk(0x45b3cc) = 0
read(3, "\00 \00 \00 \00 \00 \00 \00 \00 \00 \00 \00 \00 \0...", 15770) = 15770
close(3) = 0
open("/usr/local/oracle/product/7.3/ocommon/nls/admin/data/lx00001.nlb", 0x0, 0x0) = 3
read(3, "ZZ\00 \00 \00 \00 \10 \01 \00 \00 \00 \00 \E2 \01 ...", 484) = 482
close(3) = 0
brk(0x45c3cc) = 0
open("/usr/local/oracle/product/7.3/ocommon/nls/admin/data/lx20001.nlb", 0x0, 0x0) = 3
read(3, "ZZ\00 \00 \00 \00 \10 \01 \02 \00 \00 \00 Y\0C \09...", 3164) = 3161
close(3) = 0
open("/usr/local/oracle/product/7.3/ocommon/nls/admin/data/lx10001.nlb", 0x0, 0x0) = 3
read(3, "ZZ\00 \00 \00 \00 \10 \01 \01 \00 \00 \00 \0D \01 ...", 272) = 269
close(3) = 0

open("/OTRACE/epcus.msb", 0x0, 0x40e978) = ENOENT
open("/OTRACE/epcus.msb", 0x0, 0x40e978) = ENOENT
open("/usr/local/oracle/product/7.3/otrace/mesg/epcus.msb", 0x0, 0x40e978) = 3
fcntl(3, 2, 1) = 0
lseek(3, 0x0, 0) = 0
read(3, "\15 \13 "\01 \13 \03 \09 \09 \00 \00 \00 \00 \00 \...", 240) = 240
open("/usr/local/oracle/log/oracle7.cdf", 0x2, 0x3) = 4
read(4, "", 4) = 4

...
a whole lot of reads from file descriptor 4 (oracle7.cdf) ...

read(4, "\00 \00 Z\B9 7^\9C K\00 \08 =`", 12) = 12
read(4, "NULL, 4) = 0
lseek(4, 0x472e, 0) = 18222
read(4, "\00 \00 \00 \07 ", 4) = 4
read(4, "\00 \00 PU7^\8A \B3 \00 \n\87 P\0B t\FC \B3 \00 \0...", 295) = 295
Process 23302 got signal SIGSEGV (Segmentation fault) at eip=0xa41b
Process 23302 terminated via signal SIGSEGV (Segmentation fault).
Child process has exited
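
One way to inspect what otrcrep was reading when it crashed is to dump the bytes of oracle7.cdf at the offset shown in the trace above: the lseek to 0x472e (18222), followed by reads of 4 and 295 bytes. The Python sketch below is only an illustration. The function name hexdump_region is made up here, and nothing is assumed about the .cdf record layout, which the trace does not reveal.

```python
def hexdump_region(path, offset, length, width=16):
    """Return hex-dump lines for `length` bytes of `path` starting at `offset`.

    Hypothetical helper for eyeballing a suspect region of a binary file;
    it knows nothing about the Oracle Trace .cdf format.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)

    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        # Two hex digits per byte, space separated, padded to a fixed column.
        hex_part = " ".join("%02x" % b for b in chunk).ljust(width * 3)
        # Printable ASCII is shown as-is, everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("%08x  %s %s" % (offset + i, hex_part, text_part))
    return lines
```

For example, printing the lines returned by hexdump_region("/usr/local/oracle/log/oracle7.cdf", 0x472E, 4 + 295) would show the region covered by the last two reads before the SIGSEGV, which can then be compared by eye with earlier parts of the file.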

Examining the core dump with SCO's dbXtra:

$ dbxtra /usr/local/oracle/product/7.3/bin/otrcrep core

 browse [C]                                  -
        Reading symbolic information ...
        [using memory image in core]
        Type 'help' for help.

(dbxtra) where
epclbpline(0x7fffef40, 0x5a70, 0x0) at 0xa41b
epclbpgrp(0x456a5c, 0x0, 0x1) at 0xa3a8
epcrp_facrg_rec(0x46aa3c) at 0x6484
epcrp_cdf(0x45c0f8, 0x0) at 0x24d7
epc__report_main(0x7ffff42f, 0x0, 0x3f, 0x50, 0x1, 0x12bf, 0x0) at 0x63a
main(0x3, 0x7ffff26c, 0x7ffff27c) at 0x439
(dbxtra)

The Solaris version of otrcrep fails in a similar way after I copied the oracle7.cdf file over.

Solaris truss output shows:

read(4, 0xEFFFE824, 4) = 0
lseek(4, 18222, SEEK_SET) = 18222
read(4, "\0\0\007", 4) = 4
read(4, "\0\0 P U 7 ^8AB3\0\n87 P".., 295) = 295

    Incurred fault #6, FLTBOUNDS  %pc = 0x00023E38
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000002
    Received signal #11, SIGSEGV [default]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x00000002
        *** process killed ***

The dbx stack trace looks similar:

$ dbx /usr/local/oracle/product/7.3/bin/otrcrep core
Reading symbolic information for otrcrep
core file header read successfully
Reading symbolic information for rtld /usr/lib/ld.so.1
dbx: program is not active
Reading symbolic information for libsocket.so.1
Reading symbolic information for libnsl.so.1
Reading symbolic information for libdl.so.1
Reading symbolic information for libposix4.so.1
Reading symbolic information for libc.so.1
Reading symbolic information for libmp.so.2
Reading symbolic information for libaio.so.1
Reading symbolic information for libc_psr.so.1
program terminated by signal SEGV (no mapping at the fault address)
(dbx) where
=>[1] epclbpgrp(0x89418, 0x0, 0x1, 0x0, 0x81010100, 0xff0000), at 0x23e38
  [2] epcrp_facrg_rec(0xe4450, 0x885c4, 0x19, 0xdb488, 0xdb478, 0xefffe824), at 0x20024
  [3] epcrp_cdf(0x88fc0, 0x88714, 0x858c0, 0xe4450, 0x1, 0x1), at 0x1b684
  [4] epc__report_main(0xefffed0c, 0x0, 0x3f, 0x50, 0x1, 0x5074), at 0x18e28
  [5] main(0x3, 0xefffeac4, 0xefffed0c, 0x50, 0x3f, 0x0), at 0x18c80
(dbx)

System info:

$ uname -a
SCO_SV storesrv 3.2 5.0.5 i386

From the sqlplus command:

SQL*Plus: Release 3.3.4.0.0 - Production on Wed Jun 9 15:31:34 1999
Copyright (c) Oracle Corporation 1979, 1996. All rights reserved.
Connected to:
Oracle7 Server Release 7.3.4.3.0 - Production
With the distributed and parallel query options
PL/SQL Release 2.3.4.3.0 - Production

From $ORACLE_BASE/log/alert_*.log :

System parameters with non-default values:

  processes                = 50
  timed_statistics         = TRUE
  shared_pool_size         = 10000000
  shared_pool_reserved_min_alloc= 8192
  pre_page_sga             = TRUE
  control_files            = /usr/local/oracle/storesrv/system/vncntl01.dbf,
/usr/local/oracle/storesrv/p2000/index1/vncntl02.dbf
  compatible               = 7.3.4.0.0
  log_buffer               = 8192
  log_checkpoint_interval  = 999999
  max_rollback_segments    = 50
  rollback_segments        = rbs_01, rbs_02, rbs_03
  sequence_cache_hash_buckets= 10
  remote_login_passwordfile= NONE
  mts_service              = storesrv
  mts_servers              = 0
  mts_max_servers          = 0
  mts_max_dispatchers      = 0
  audit_trail              = NONE
  sort_area_size           = 1048576
  sort_area_retained_size  = 1048576
  sort_direct_writes       = AUTO
  db_name                  = storesrv
  open_cursors             = 255
  ifile                    = $ORACLE_BASE/storesrv/init_store_tun.ora
  os_authent_prefix        = OPS$
  optimizer_mode           = RULE
  cursor_space_for_time    = TRUE
  shadow_core_dump         = partial
  background_core_dump     = partial
  background_dump_dest     = /usr/local/oracle/log
  user_dump_dest           = /usr/local/oracle/log
  core_dump_dest           = /usr/local/oracle/log
  audit_file_dest          = /usr/local/oracle/log
  oracle_trace_enable      = TRUE
  oracle_trace_facility_path= /usr/local/oracle/log
  oracle_trace_collection_path= /usr/local/oracle/log

cheers,

--
Simon McClenahan
Computer Consultant Extraordinaire
BALR Corporation http://www.balr.com
+1(630)575-8200
Received on Wed Jun 09 1999 - 23:00:10 CEST
