RE: Scheduler Jobs are not distributed according to OS-load on RAC nodes
Date: Wed, 12 Dec 2018 16:07:23 -0500
Message-ID: <018401d4925e$c5dea170$519be450$_at_rsiz.com>
BECAUSE YOU DO THIS:
> Every hour we are scheduling a lot of one-time jobs to run a lot of data loads. The Jobs are scheduled by a master which takes care of dependencies - so a job is only scheduled when all its dependencies are met, and it should run as soon as resources (job processes) are available. (No dependencies are defined in dbms-scheduler framework).
> The jobs use a JOB_CLASS which has a dedicated SERVICE - this SERVICE is available on all 4 instances. Stop&Start of the service on the "idle" instance does not help.
you have an opportunity to work around what appears to me to be a bug. Here is my proposed flyswatter:
Create four SERVICES, each with one node as primary and the other three as secondary, using round robin in the text.
IF round robin is good enough, just do that by rotating the service name. Unless the primary is down, it should get the job, but you’ve lost no redundancy because the others are listed as secondary. (IF the primary doesn’t get the job when it IS up, that is a serious bug.)
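A minimal sketch of that rotation, in Python purely for illustration. The service names LOAD_SVC1..LOAD_SVC4 and the helper functions are hypothetical, and the actual job submission (for example through a job class bound to each service) is left out:

```python
from itertools import cycle

# Instance names as listed in the service configuration quoted in this thread.
INSTANCES = ["INST1", "INST2", "INST3", "INST4"]


def service_layout(instances):
    """Build one service per instance (hypothetical names LOAD_SVC1..n).

    Each service has exactly one primary instance; all other instances
    are listed as secondary, so redundancy is preserved.
    """
    return {
        f"LOAD_SVC{i}": {
            "primary": inst,
            "secondary": [other for other in instances if other != inst],
        }
        for i, inst in enumerate(instances, start=1)
    }


def round_robin(service_names):
    """Return an endless iterator that rotates through the service names."""
    return cycle(service_names)


layout = service_layout(INSTANCES)
picker = round_robin(sorted(layout))

# The master would call next(picker) once per job it releases.
first_six = [next(picker) for _ in range(6)]
```

The master process would take the next name from `picker` each time it releases a job and submit the job under the job class tied to that service.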
IF round robin is NOT good enough (which can easily be true if a few longer running jobs get allocated to one node), then just before you schedule, ping a daemon you have running on each node that reports the rolling window average cpu busy, and allocate the next job to the node where a) the ping answers and b) the cpu busy is lowest.
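The selection rule can be sketched as follows, again in Python for illustration only. How the daemon is pinged is not shown; the sketch assumes the ping results have already been collected into a dict, with None standing in for a node that did not answer. The function names and the window size are assumptions:

```python
def rolling_average(samples, window):
    """Average of the most recent `window` CPU-busy samples."""
    recent = samples[-window:]
    return sum(recent) / len(recent)


def pick_node(cpu_busy_by_node):
    """Return the responding node with the lowest CPU busy value.

    cpu_busy_by_node maps node name -> rolling average CPU busy (percent),
    or None when the daemon on that node did not answer the ping.
    """
    alive = {
        node: busy
        for node, busy in cpu_busy_by_node.items()
        if busy is not None
    }
    if not alive:
        raise RuntimeError("no node answered the ping")
    return min(alive, key=alive.get)


# Example: node2 did not answer, node3 is the least busy of the rest.
ping_results = {"node1": 72.5, "node2": None, "node3": 18.0, "node4": 40.1}
chosen = pick_node(ping_results)
```

The master would then release the next job under the service whose primary is the chosen node, falling back to plain rotation if no daemon answers.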
Now that is a bit of programming work you shouldn’t have to do, but it is a way to manage round robin and/or cpu load balanced job allocation, given that you are managing releasing these jobs already.
mwf
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Martin Berger
Sent: Wednesday, December 12, 2018 2:25 PM
To: Mladen Gogala
Cc: Niall Litchfield; Oracle-L oracle-l
Subject: Re: Scheduler Jobs are not distributed according to OS-load on RAC nodes
Thank you Mladen & Niall,
I set the service to CLB => SHORT, then restarted the service on one instance which refuses to run jobs - it didn't help. Setting job_queue_processes to 0 (scope=memory) and back to its original value didn't help either.
If I find something additional/useful, I'll share it.
Martin
Am Mi., 12. Dez. 2018 um 14:34 Uhr schrieb Mladen Gogala <gogala.mladen_at_gmail.com>:
You may want to try setting CLB to short
Mladen Gogala
On Tue, Dec 11, 2018, 4:27 AM Martin Berger <martin.a.berger_at_gmail.com> wrote:
Hi Niall,
the service has these properties:
Service name: OUR_SERVICE_NAME
Server pool:
Cardinality: 4
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
TAF failover retries:
TAF failover delay:
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name:
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency:
GSM Flags: 0
Service is enabled
Preferred instances: INST1,INST2,INST3,INST4
Available instances:
CSS critical: no
It's worth mentioning: the connections from which the jobs are scheduled come from another DB via DB-Link.
thank you,
Martin
Am Di., 11. Dez. 2018 um 09:47 Uhr schrieb <niall.litchfield_at_gmail.com>:
Hi Martin
What are the load balancing properties of the service set to?
On Tue, Dec 11, 2018 at 8:45 AM Martin Berger <martin.a.berger_at_gmail.com> wrote:
>
> Hi List,
>
> I have a strange situation with a 4-node RAC - 12.2 (July 2018) Oracle Linux 6.10:
>
> After some time, one (or several) instances stop executing jobs.
>
> Every hour we are scheduling a lot of one-time jobs to run a lot of data loads. The Jobs are scheduled by a master which takes care of dependencies - so a job is only scheduled when all its dependencies are met, and it should run as soon as resources (job processes) are available. (No dependencies are defined in dbms-scheduler framework).
> The jobs use a JOB_CLASS which has a dedicated SERVICE - this SERVICE is available on all 4 instances. Stop&Start of the service on the "idle" instance does not help.
> NTP is fine according to cluvfy comp clocksync -n all .
> instance_stickiness is TRUE (the default) - but I don't think this will change anything as our jobs run one-time only.
>
> Does anyone know how to identify why some instances sometimes refuse to run scheduled jobs?
> Who makes this decision, and can it be traced somehow to identify which numbers the decision is based on?
> Any other suggestions?
>
> An SR at MOS is open, but without any progress.
>
> related documents found so far:
>
> DBMS_SCHEDULER job doesn't fail-over across RAC instance ( Doc ID 2365434.1 )
> RAC Node X Is Seeing A Higher Session Load Than The Other Nodes For Scheduler Jobs ( Doc ID 1602581.1 )
> ENH 28592547 - REAL-TIME LOAD BALANCING FOR JOBS ACROSS RAC INSTANCES
>
> --
> Martin Berger Oracle ♠
> martin.a.berger_at_gmail.com _at_martinberx
> ^∆x http://berxblog.blogspot.com
>
--
Niall Litchfield
Oracle DBA
http://www.orawin.info
--
http://www.freelists.org/webpage/oracle-l
Received on Wed Dec 12 2018 - 22:07:23 CET