
Re: ORACLE on Linux - IO bottleneck

From: Noons <>
Date: 12 Feb 2006 01:45:52 -0800
Message-ID: <>

Mladen Gogala wrote:
> Bear in mind that at some point all those interrupt requests will start
> saturating your system bus. PC buses, even the server versions, are
> nowhere near the capacity of the true midrange SMP servers like HP 9000
> series or IBM P960 series machines. Modern disk controllers will do
> massive amounts of DMA communication with memory, using the same system
> bus that CPU boards use to synchronize caches, that network controllers
> use to notify CPU of the network interrupts and that all peripheral
> devices use to communicate with CPU and RAM. My experience tells me that
> no matter how good PC server you buy, you will never get more then 2500
> I/O operations per second out of it. On a heavily used OLTP database, that
> amounts to 200-300 concurrent users, with up to 50 active at a time.

That's a superb point; I almost lost it in all the replies. One has to keep in perspective that we're not talking about "super server" technology here. A PC blade is PC architecture, not an SMP server on steroids, hyper-threading or dual cores notwithstanding.

Linux+PC blades have a purpose and a market sweet spot. They can be made to perform at levels unheard of only a few years ago. But it's only too easy to spend more $$$ making them behave like a midrange SMP database server than it would cost to actually BUY such a server!

Like everything, it's all about balance. PC-based servers with Linux can be configured at astoundingly cheap prices. That doesn't mean they can perform with databases at the same levels as much more sophisticated (read: expensive) architectures. For some situations the PC-based solution is perfect. Others require more elaborate solutions. It's all about price-performance, bang-for-buck and all that jazz.

The 2500 IO/sec figure is, IME, a good yardstick as well, assuming nothing else is sapping the bus bandwidth. Network controllers, memory-to-memory copies and suchlike can reduce it significantly.

That's real direct IO, not cache accesses! And it's not the same thing as IO throughput. Our PC blades drive SAN boxes at over 100MB/s. That's however with scatter-gather IO requests on full table scans, db_file_multiblock_read_count, SAN read-ahead and all such streaming optimisations.
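To see why streaming scans can hit 100MB/s on relatively modest hardware, a bit of back-of-the-envelope arithmetic helps. This sketch assumes illustrative values (an 8K block size and a multiblock read count of 16 are my assumptions, not figures from the post):

```python
# Illustrative arithmetic only: how multiblock reads turn a modest
# request rate into high throughput. Values are assumed, not measured.

BLOCK_SIZE_KB = 8            # typical Oracle db_block_size (assumption)
DBFMBRC = 16                 # db_file_multiblock_read_count (assumption)

# Each full-scan read request fetches DBFMBRC contiguous blocks at once.
request_kb = BLOCK_SIZE_KB * DBFMBRC          # 128 KB per scan read

target_mb_per_sec = 100                       # SAN throughput from the post
requests_per_sec = target_mb_per_sec * 1024 / request_kb

print(f"each multiblock read = {request_kb} KB")
print(f"{target_mb_per_sec} MB/s needs only ~{requests_per_sec:.0f} reads/s")
```

So 100MB/s of scan throughput needs only on the order of 800 large sequential reads per second, nowhere near the 2500 random-IO ceiling. That's the whole point: throughput and IOPS measure different things.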

Discrete random IO operations are a totally different animal!

2500/sec is around 20MB/s, assuming 8K db blocks. Anyone getting as much as that in direct random IO, mixed access, on a PC blade architecture can count themselves very lucky indeed.

Ours sometimes go as high as 40MB/s, or 5000 random IO/sec. But that's in favourable conditions, wind coming from behind, moon in the right quarter and all such! Any requirement for higher than that and we definitely enter into diminishing returns territory.
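The random-IO arithmetic above is simply IOPS times block size. A one-liner makes the two figures in the post easy to check (8K blocks assumed, as stated earlier):

```python
# Single-block random reads: throughput = IOPS x block size.
BLOCK_SIZE_KB = 8  # 8K db blocks, as assumed in the post

def mb_per_sec(iops, block_kb=BLOCK_SIZE_KB):
    """Throughput implied by a given random single-block read rate."""
    return iops * block_kb / 1024

print(mb_per_sec(2500))   # ~19.5 MB/s: the "around 20MB/s" yardstick
print(mb_per_sec(5000))   # ~39 MB/s: the favourable-conditions ceiling
```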

One thing I've been able to determine so far: the number of controllers and access paths definitely makes an almost linear difference. There is a reason why you see those hundreds of controllers and thousands of disks in the hardware descriptions of OLTP workload benchmarks!

> Second, Linux doesn't show you the time spent on the interrupt stack.
> You cannot see whether your motherboard is loaded to the capacity or not,
> because you cannot see how much of the system time is actually spent
> servicing interrupts.

Yup, very much so. It's very hard (on current Linux kernels) without some specialised driver/monitoring software to determine exactly where the problems are and how to address them. All we can do is devise tests, compare configs, and extrapolate from the results what is really going on and what the best course of action is.
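For what it's worth, 2.6 kernels do expose hard and soft interrupt jiffies in the "cpu" line of /proc/stat, so a rough estimate of interrupt-servicing time can be scraped from there. A minimal sketch, assuming the 2.6 field order (the sample line below is made up for illustration, not real data):

```python
# A minimal sketch of estimating interrupt-servicing time from /proc/stat.
# On 2.6+ kernels the aggregate "cpu" line includes irq and softirq jiffies.
# Field order: user nice system idle iowait irq softirq ...

def irq_fraction(stat_cpu_line):
    """Fraction of CPU jiffies spent in hard+soft interrupt context."""
    fields = [int(x) for x in stat_cpu_line.split()[1:]]
    total = sum(fields)
    irq = fields[5] + fields[6]   # irq + softirq columns
    return irq / total

# Hypothetical sample; in practice read the first line of /proc/stat twice
# and diff the counters over an interval.
sample = "cpu 4705 150 1120 382190 5340 120 80"
print(f"{irq_fraction(sample):.2%} of CPU time in interrupt context")
```

It's crude (aggregate counters, no per-device breakdown), but diffing two samples over a busy interval at least tells you whether interrupt servicing is eating a meaningful slice of system time.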

Thanks for the heads up, Mladen. Very interesting to see others with the same experiences I'm going through.
