Message-ID: <2ba656800709211730g48212570jde71fdfaec7282c3@mail.gmail.com>
Date: Fri, 21 Sep 2007 20:30:58 -0400
From: "Rajeev Prabhakar" <rprabha01@gmail.com>
To: don@seiler.us
Subject: Re: Weird database hanging
Cc: oracle-l@freelists.org
In-Reply-To: <716f7a630709211353o628b1ed9pc603ffa90c0b53e3@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_34871_18272744.1190421058647"
References: <20070915071419.33C63743CB5@turing.freelists.org>
	 <716f7a630709170850w66347a8fr6f1099fbb1bda3e5@mail.gmail.com>
	 <716f7a630709170938h3403bb0dv32110c5893cf998@mail.gmail.com>
	 <716f7a630709181520p7ee8f2f4m9d4bfa7207c65d47@mail.gmail.com>
	 <c2213f680709181650y4b982fe7n2d3375a2379b0efb@mail.gmail.com>
	 <716f7a630709181902t73b910a1re2b516e9186101ca@mail.gmail.com>
	 <c2213f680709181942l79088c71x7d4140109a584509@mail.gmail.com>
	 <716f7a630709200657t5a79cbc0q6a1fb1281f087cde@mail.gmail.com>
	 <9f0e18730709211302m3aa71e29h2ab0673b55a6d4b7@mail.gmail.com>
	 <716f7a630709211353o628b1ed9pc603ffa90c0b53e3@mail.gmail.com>
------=_Part_34871_18272744.1190421058647
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Don

That's interesting. I want to share our experience in case it
helps anyone.

While running stress tests against a two-node 10.2.0.3 RAC/ASM/
SAN-based database, we were seeing near-freezes/hangs in addition to
ORA-3136 errors and IPC timeouts followed by node evictions.

So we tried all the recommended things: we bumped up the
sqlnet/listener timeouts, sessions/processes/pga_aggregate_target,
shared pool size, etc., without any luck. The near-freeze/hang kept
recurring once we went beyond a certain number of concurrent
database sessions. We double-checked our OS parameters just in
case, but that didn't help either.

Later, we decided to increase swap space, since we had observed low
available swap during these tests even when free memory was
available. After the increase, the database hangs/node evictions no
longer occurred AND the load tests ran through the full allocated
window. Concurrency was still the #1 wait during that window, but
all our instances (DB and ASM) survived the load test.

It is quite possible that we haven't fixed the root cause and that
this is just a distraction that has given us a temporary breather.

Anyway, if we find something later (e.g. a bug), I'll let everyone
know.

-Rajeev

On 9/21/07, Don Seiler <don@seiler.us> wrote:
>
> We *think* we have found the issue, and it isn't quite Oracle-related
> (of course).
>
> The SA had been doing a Veritas online relayout on the disk partition
> that is our archivelog destination.  He aborted it, but rather than
> aborting, Veritas left it in a "paused" state.  This happened 20
> minutes before the bulk load that caused our first instance hang.
> Note that we *were* able to archive logs, it just seemed to have
> caused some more waiting than normal.  This was compounded during bulk
> loads, and in the end caused a crush of shared pool and library cache
> latches.
>
> This situation was discovered yesterday, and the timing seemed all
> too coincidental.  The state was corrected and we've been happily
> bulk loading anything and everything since then.
>
> In the end, we recognize there is plenty of room for improvement in
> the application code (and horrible inefficiencies in the app database
> design), but we're quite certain that wasn't the root cause of this
> problem.  I'm still pretty upset with Oracle support over their
> blinders and their insistence that the problem was "properly
> diagnosed" while ignoring all of my input and feedback.
>
> Don.
>
> --
> Don Seiler
> oracle: http://ora.seiler.us
> ultimate: http://www.mufc.us
> --
> http://www.freelists.org/webpage/oracle-l

------=_Part_34871_18272744.1190421058647--