RE: i know cary millsap is super smart but...

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Tue, 27 Aug 2013 20:37:02 -0400
Message-ID: <003001cea386$b7890b00$269b2100$_at_rsiz.com>



The underlying principle remains the same: you can run more jobs in parallel than you have cpus without slowing things down through cpu waits, to the extent that the jobs intermittently have to do things other than computation on the cpu.

Cary's observation at the time was that i/o operations for batch (meaning mostly jobs that do not have to wait for user input or think time) took up about half the real elapsed time of most jobs when run against no competition. So you could indeed bump the number of running batch jobs up to about 2 times the number of available cpus without causing any new cpu wait.
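As an illustration of that arithmetic (a sketch only, not taken from Cary's paper): if each job spends some fraction of its uncontended elapsed time off the cpu, the idealized ceiling on concurrent jobs is the cpu count divided by the fraction of time a job is actually on cpu. The function name and parameters below are hypothetical, and the model assumes cpu and i/o bursts interleave perfectly, which real job mixes will not quite do.

```python
import math


def batch_job_ceiling(cpus: int, io_fraction: float) -> int:
    """Idealized count of concurrent batch jobs that adds no new cpu wait.

    Assumption (not from the paper): every job spends `io_fraction` of its
    uncontended elapsed time waiting on i/o, and bursts interleave perfectly.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if not 0.0 <= io_fraction < 1.0:
        raise ValueError("io_fraction must be in the range [0, 1)")
    cpu_fraction = 1.0 - io_fraction  # share of elapsed time spent on cpu
    return math.floor(cpus / cpu_fraction)


# Half the elapsed time in i/o on 8 cpus gives the "2 times" figure.
print(batch_job_ceiling(cpus=8, io_fraction=0.5))
```

With io_fraction at 0.5 the ceiling is twice the cpu count, which is where the 2 comes from; treat the result as a starting point to measure against, not a guarantee.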

All things being equal, and cpus being the most expensive element of the systems of the time, eliminating cpu slack time without increasing cpu waits for any other jobs lets you hit something near maximum throughput AND efficiency for a job set.

If, instead, you more than marginally exceed this threshold, you start to rack up waste and inefficiencies of context switches and possibly shuttling more program data on and off chip cache.

Remember to leave available CPU if interactive users will be intruding on the batch window.
Remember this is very different from trying to apply all resources of a machine (or machines) to get the answer to a single question as quickly as possible. (The original design goal of Oracle Parallel Query.) When that is the goal, various resources probably will have slack time when you achieve the fastest solution (but you don't care.)

While the underlying principle remains the same, the fraction of time waiting for i/o operations to complete on either a 100% memory system or a well pipelined SSD (especially non-flash) is likely a lot less than when yanking the data from spinning rust. So that militates toward lowering the magic number of 2. Depending on whether you're counting cpus or cores and how the threading works for a given combination of hardware and software, you may observe that your job mix stalls on cpu significantly more if you're counting cores and use 2.
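To make the direction of that adjustment concrete, here is a small sketch under the same idealized assumption that the multiplier is 1 / (1 - i/o fraction). The i/o fractions listed are hypothetical placeholders, not measurements of any storage system, and the function name is made up for illustration.

```python
def magic_multiplier(io_fraction: float) -> float:
    """Idealized jobs-per-cpu multiplier for a given i/o wait fraction.

    Assumption: a job is on cpu for (1 - io_fraction) of its uncontended
    elapsed time, so that many cpu-seconds are free per job-second.
    """
    if not 0.0 <= io_fraction < 1.0:
        raise ValueError("io_fraction must be in the range [0, 1)")
    return 1.0 / (1.0 - io_fraction)


# Hypothetical i/o fractions; measure your own job mix before trusting any.
for io_fraction in (0.5, 0.25, 0.125):
    print(f"i/o fraction {io_fraction:.3f} -> multiplier {magic_multiplier(io_fraction):.2f}")
```

As the i/o fraction shrinks, the multiplier drifts from 2 down toward 1, which is the sense in which faster storage lowers the magic number.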

2 is still a doggone good starting point.

And you're right: Cary is super smart. More than that, Cary is a methodical scientist. (He's also a good friend and an outstanding parent, but that is another story.)

mwf

-----Original Message-----
From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Josh Collier
Sent: Tuesday, August 27, 2013 5:55 PM
To: Cary Millsap; ahmusch_at_gmail.com
Cc: oracle-l_at_freelists.org; lomasky_at_dbasolutionsinc.com
Subject: RE: i know cary millsap is super smart but...

I would like to check for understanding: This paper appears to apply very specifically to Oracle Applications Concurrent Manager. It states that one should not run more than 2 x (number of CPUs) batch jobs on an Oracle Applications system (E-Business Suite).

The ideas in this paper do not extend to ETL batch jobs and their parallel processes? For example, on a 32 core system, would I be limited to never using more than 64 parallel processes simultaneously?

Thanks for your time,

Josh C.

From: Cary Millsap [mailto:cary.millsap_at_method-r.com]
Sent: Friday, June 14, 2013 6:04 PM
To: ahmusch_at_gmail.com
Cc: Josh Collier; oracle-l_at_freelists.org
Subject: Re: i know cary millsap is super smart but...

Josh and Adam,

I was just discussing that this week with a client. I've asked the same question, and I just haven't done the tests yet.

My expectation would be that for a two-quad-core system, the number of "effective CPUs" (let's call it) would be something less than 2 x 4 = 8 but more than just 2. Probably 6-ish, I would expect. ...Meaning that on a 2x quad-core system, you could apply the idea behind the paper as if the actual number of CPUs were something like 6.

I'd love to learn what you find out if you test it.

Cary Millsap
Method R Corporation

On Fri, Jun 14, 2013 at 2:37 PM, Adam Musch <ahmusch_at_gmail.com<mailto:ahmusch_at_gmail.com>> wrote:

I would think so, from a certain point of view. Each core is reported to the OS as a CPU, and that's what you should use as it pertains to the rule of 2. So if you have two cpus each with four cores, your rule of 2 number would be 8.
The underlying mathematics of queuing theory still remain the same.

On Fri, Jun 14, 2013 at 2:08 PM, Josh Collier <Josh.Collier_at_banfield.net<mailto:Josh.Collier_at_banfield.net>> wrote:

> This paper is 13 years old, is it still valid in the era of quad core
> processors?
> Batch Queue Management and the Magic of '2'
> Cary Millsap/Hotsos Enterprises, Ltd.
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
Adam Musch
ahmusch_at_gmail.com<mailto:ahmusch_at_gmail.com>


--
http://www.freelists.org/webpage/oracle-l



Received on Wed Aug 28 2013 - 02:37:02 CEST
