Re: Oracle home headscratcher

From: Tim Gorman <tim.evdbt_at_gmail.com>
Date: Wed, 19 Feb 2020 14:26:27 -0800
Message-ID: <9329ef07-8509-7442-3051-e807573461d0_at_gmail.com>



For future reference: either the "fuser" or "lsof" utility can list open files, along with the processes that have them open, for a given directory or filesystem.

Solaris typically includes "fuser" on install, but "lsof" can be downloaded from SunFreeware
<http://www.sunfreeware.com/programlistsparc10.html#lsof>.
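
Since the culprit in the thread quoted below turned out to be a stray mode bit on a file in the home, here is a rough sketch (mine, not something anyone on the thread ran) of how one might scan a directory tree for entries carrying the setuid, setgid, or sticky bit. The function name find_special_bits is made up for illustration. It only reads mode bits via lstat and skips symlinks; it does not know which bits are legitimate, so expect some normal hits (the oracle executable itself is usually setuid/setgid) and compare against a known-good home.

```python
import os
import stat

# Special mode bits worth flagging, mapped to readable labels.
SPECIAL_BITS = {
    stat.S_ISUID: "setuid",
    stat.S_ISGID: "setgid",
    stat.S_ISVTX: "sticky",
}


def find_special_bits(root):
    """Yield (path, octal_mode, labels) for every file or directory
    below root that has at least one special mode bit set."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                # Entry vanished or is unreadable; skip it.
                continue
            if stat.S_ISLNK(mode):
                # Symlink modes are not meaningful here.
                continue
            labels = [label for bit, label in SPECIAL_BITS.items() if mode & bit]
            if labels:
                yield path, oct(stat.S_IMODE(mode)), labels
```

To use it, loop over find_special_bits() with the path of the suspect home and print each tuple, then run the same loop against a healthy home of the same version and diff the two listings.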

On 2/19/2020 11:09 AM, Newman, Christopher wrote:
>
> Thanks Chris, Rajeev, J, William and Iggy- turns out the sticky bit
> had in fact somehow been flipped on one of the files.  We copied a
> fresh home over and are in good shape.  How it got flipped is
> something we’ll continue to pursue.
>
> Thanks again! - Chris
>
> *From:*Iggy Fernandez <iggy_fernandez_at_hotmail.com>
> *Sent:* Wednesday, February 19, 2020 12:21 PM
> *To:* William Beldman <wbeldma_at_uwo.ca>; oracle-l_at_freelists.org;
> Newman, Christopher <cjnewman_at_uillinois.edu>
> *Subject:* Re: Oracle home headscratcher
>
> truss would have diagnosed the issue. sqlplus is a frontend so you
> would either have to run truss directly against the child oracle
> process or use "truss -f sqlplus ..." to trace child processes. -c
> produces a summary.
>
> *-c*
>
> Counts traced system calls, faults, and signals rather than displaying
> the trace line-by-line. A summary report is produced after the traced
> command terminates or when truss is interrupted. If -f is also
> specified, the counts include all traced system calls, faults, and
> signals for child processes.
>
> ------------------------------------------------------------------------
>
> *From:*oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>
> on behalf of Newman, Christopher <cjnewman_at_uillinois.edu>
> *Sent:* Tuesday, February 18, 2020 6:40 PM
> *To:* William Beldman <wbeldma_at_uwo.ca>; oracle-l_at_freelists.org
> <oracle-l_at_freelists.org>
> *Subject:* RE: Oracle home headscratcher
>
> Yes, that didn’t turn up much.  Unfortunately we’ve rebooted the
> server (thankfully DEV) and the problem has gone away.
>
> What we did notice is that the shutdown scripts, which include sqlplus
> calls to shutdown each database, worked fine.  That script was called
> by root of course, so now we’re thinking it’s something to do with the
> oracle user and either a permission or resource issue.
>
> *From:*William Beldman <wbeldma_at_uwo.ca>
> *Sent:* Tuesday, February 18, 2020 8:17 PM
> *To:* Newman, Christopher <cjnewman_at_uillinois.edu>;
> oracle-l_at_freelists.org
> *Subject:* RE: Oracle home headscratcher
>
> Can you run truss against sqlplus/tnsping/etc. to figure out what it’s
> doing over the course of those 10 minutes?
>
> *From:*oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org>
> *On Behalf Of *Newman, Christopher
> *Sent:* February 18, 2020 6:38 PM
> *To:* oracle-l_at_freelists.org
> *Subject:* Oracle home headscratcher
>
> Hi All,
>
> We’ve got multiple Oracle homes on a Solaris 11.4 server (T8 SPARC). 
> We are having issues with a single home (12.2.0.1), while others are
> fine (19.5, a different 12.2.0.1 home).  We haven’t seen this problem
> on any other hosts, and no known modifications to the environment
> happened prior to the behavior we’re seeing.
>
> Sqlplus appears to hang, but does eventually connect (by eventually,
> I’m talking 10+ minutes, and a local connection).
>
> This behavior extends to tnsping (times out; we traced but didn’t get
> much), but running opatch, for example, is not affected.
>
> Standby databases on the system fall behind.
>
> External connections to databases are not impacted; only attempts to
> run the binaries locally from the problematic home exhibit the symptoms.
>
> Our only clue on the host is very high utilization of our /u01 mount
> point, but so far our Unix crew hasn’t been able to isolate which
> process is driving the IO.
>
> Yesterday, on a whim we switched the problematic Oracle home
> permissions to 755 (from 700), and things “magically” worked and IO
> plummeted instantly.
>
> Today, we switched back to 700 to see if we could break things again;
> we did.  However in this second case, chmod’ing the problematic home
> back to 755 had zero effect and the hanging behavior persists.
>
> Any thoughts on what to look at next?  Again, the problem is isolated
> to just this single home.
>
> Thanks- Chris
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 19 2020 - 23:26:27 CET
