Re: Wait Event - enq: IV - contention (EBS flavored)

From: Biju Thomas <biju.thomas_at_gmail.com>
Date: Tue, 27 Mar 2018 20:48:34 +0000
Message-ID: <CAF9tyKpOQ8gg2TcdhY=vd51hz5sKtuKNk4RncbwtO+tWAw0kqQ_at_mail.gmail.com>



Thank you, Mark. The job is in the "Running" status on the EBS side.

The job name is "Assign Territory Accesses" (ASTATA). When they run the job with the first parameter set to TOTAL, it always completes successfully. When the parameter is NEW, it hits this problem intermittently. When the job does complete, it finishes in a few minutes.

How do I find out which RAC node the job ran on? I have a query to find the node (and session details) of currently running requests, but I do not know how to find from history which database node a job ran on.

Biju

On Tue, Mar 27, 2018 at 2:51 PM Mark W. Farnham <mwf_at_rsiz.com> wrote:

> EBS concurrent programs have famously different execution times based on
> the volume of data in a given request, and some request sets have defined
> incompatibilities with other jobs where they wait until no "incompatible"
> jobs are running. Some jobs are even "runs alone", which I'm thinking this
> one is not.
>
> If you look in the concurrent manager queue reports, is the job actually
> starting, or is the contention wait for the query which looks for jobs
> eligible to run?
>
> The single step request might even be to a different queue than the request
> set (which could be waiting for a queue already at its maximum number of
> workers), and there can be a race condition, especially on RAC.
>
> Are you, or is someone you know, the person who configures concurrent
> manager queues? I could of course be completely off target on this if your
> job has actually moved to the "running" stage, as you can determine from the
> concurrent manager queries and reports. Then the contention is in the job
> itself rather than in the infrastructure to queue and eventually allow the
> job to run "concurrently."
>
> Good luck.
>
> Hint: If all your updating jobs can be limited to queues running on a
> single instance (that is, if there is sufficient horsepower on a single
> node, which is often the case) and the report-only jobs can be set up on
> queues that run on all the other instances, you may be surprised in a good
> way at the effect of GCC application traffic being unidirectional. (Make
> sure someone knows how to quickly move that queue to be eligible for
> another instance when it is time for preventive maintenance on the
> "update" node, because that is inevitable.)
>
> YMMV. And it has been a number of years since I personally configured CCMGR
> queues at all, let alone on a server complex under any stress.
>
> There are several folks on this list who are much more up to date on EBS
> performance challenges. If this is a "standard" job preconfigured as
> delivered by Oracle, it might be helpful to quote its name. EBS is
> generally wonderful, but from time to time performance challenges are
> driven at customer sites that do not show up at the mother ship. (Plus the
> fellow
> [last I knew] who tunes the Oracle EBS implementation is off the charts
> brilliant and has similar folks easily at hand to work with collegially,
> which means they sometimes miss latent problems because of how well
> configured they are or fix them as "not a bug" without batting an eye.)
>
> mwf
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> On Behalf Of Jonathan Lewis
> Sent: Tuesday, March 27, 2018 3:11 PM
> To: oracle-l_at_freelists.org
> Subject: Re: Wait Event - enq: IV - contention
>
>
> Biju,
>
> You say "database reboot".
> Is this restarting all 4 instances in turn, or stopping all 4 then
> restarting them, or just restarting the stuck instance ?
>
> IV is supposed to be "synchronising library cache invalidation" - so the
> randomness may be due to the luck of which instance a process starts to run
> on; and if you don't stop all the instances before restarting that might
> explain why the problem can persist across restarts. This is all
> conjecture, of course, trying to prompt some ideas that I would then check
> if I had my hands on the system.
>
>
> Regards
> Jonathan Lewis
>
> ________________________________________
> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
> behalf of Biju Thomas <biju.thomas_at_gmail.com>
> Sent: 27 March 2018 19:56
> Cc: oracle-l_at_freelists.org
> Subject: Re: Wait Event - enq: IV - contention
>
> Thank you, Jonathan. The user canceled and resubmitted the program and it
> completed. I checked SGA resize ops and do not see many resize operations,
> and there were none while the program was stuck last time. The odd part
> is that the problem persists even after a database reboot. It is not
> consistent: on average, 1 out of 8 tries gets stuck.
>
> This is an EBS concurrent program. The program gets stuck more often
> if the user submits it as part of a "request set". The user says it gets
> stuck almost all the time that way. If submitted as a single request, it
> completes most of the time. Still investigating other behaviors.
>
> - Biju
>
>
> On Tue, Mar 27, 2018 at 12:52 PM Jonathan Lewis
> <jonathan_at_jlcomp.demon.co.uk> wrote:
>
> If you convert decimal to hexadecimal to ASCII:
>
> P1 is just a repeat of the lock information, reading IV, mode 5
> P2 is supposed to be an object_id for the IV lock, but yours actually reads
> "SYNC"
> P3 = 3
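The decimal-to-hex-to-ASCII conversion described above can be sketched in a few lines of Python. This is only an illustration, assuming the usual enqueue layout in which the top two bytes of P1 hold the lock type as ASCII and the low two bytes hold the requested mode; the function names are made up for this example and are not part of any Oracle tool.

```python
# Hypothetical helpers for decoding enqueue wait parameters.
# Assumption: P1 = two ASCII bytes of lock type followed by a 16-bit mode.

def decode_enqueue_p1(p1):
    """Split P1 into (lock type, requested mode)."""
    raw = p1.to_bytes(4, "big")            # e.g. 0x49 0x56 0x00 0x05
    lock_type = raw[:2].decode("ascii")    # top two bytes as characters
    mode = int.from_bytes(raw[2:], "big")  # low two bytes as a number
    return lock_type, mode


def decode_ascii_id(value):
    """Read a 32-bit value as four ASCII characters."""
    return value.to_bytes(4, "big").decode("ascii")


if __name__ == "__main__":
    p1 = 1230372869   # P1 from the original post
    p2 = 1398361667   # P2 from the original post
    print(hex(p1))                 # 0x49560005
    print(decode_enqueue_p1(p1))   # ('IV', 5)
    print(decode_ascii_id(p2))     # SYNC
```

Applied to the values in the original post, this reproduces the reading given here: lock type IV in mode 5, and a P2 that spells "SYNC" rather than identifying an object.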
>
> Which node is the process running on? The IV lock is supposed to be
> synchronising library cache invalidation across instances.
> Perhaps P3 = 3 is a large-scale function, perhaps it's saying that this
> instance is waiting for instance 3 (which might be 4 depending on whether
> Oracle is counting from 0 or 1 in this case).
>
> One guess: do you see SGA resizes at the time this is happening - perhaps
> one instance starts resizing and locks up the other instances as it does
> so.
>
> Regards
> Jonathan Lewis
>
>
> ________________________________________
> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
> behalf of Biju Thomas <biju.thomas_at_gmail.com>
> Sent: 27 March 2018 18:40
> To: oracle-l_at_freelists.org
> Subject: Wait Event - enq: IV - contention
>
> Trying to troubleshoot a stuck program. The program gets stuck at a
> particular wait event 1 out of every 8 runs on average. When the program
> completes, it finishes in a few minutes. When it is stuck, the P1, P2 values
> are always the same. Can you please tell me what the P1, P2, P3 values
> represent?
>
> Current Wait Event    enq: IV - contention
> Current Wait Class    Other
> P1 (type|mode)        1230372869
> P2 (id1)              1398361667
> P3 (id2)              3
> Object                None
>
> They are not OBJECT_IDs.
>
> Version: Oracle database 11.2.0.4
> RAC Database, 4 nodes.
>
> Thanks for your help.
> Biju Thomas
>
> --
> Best,
> Biju Thomas
> www.bijoos.com
> --
> http://www.freelists.org/webpage/oracle-l
> --
Best,
Biju Thomas
www.bijoos.com

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Mar 27 2018 - 22:48:34 CEST
