Re: IO Latency

From: Stefan Knecht <>
Date: Thu, 17 Apr 2008 11:54:02 +0200

Thanks for the feedback everyone!

However, it's not quite *as* simple :-)

The IO waits that I posted were just the peaks occurring in a larger batch.

Some more facts:

  • IO response seems very random in nature: repeat the same process 5 times, and 3 times it runs dead slow (60-100 rows per second) while twice it runs lightning fast (3000 rows per second). Overall system load was more or less constant during the tests.
  • Out of a typical nightly batch, where 500'000 sequential IOs are performed, only about 5% of the WAITs are 100ms or longer (up to 4 seconds). They, however, account for more than 50% of the total WAIT caused by IO in the batch.
  • CPUs are mostly idle, the system is not at all suffering from CPU contention (got 12 CPUs in this LPAR)
  • IBM has performed measurements on the storage subsystem directly, but they've only looked at averages (over 15-minute periods). Taking the batch from above, where the majority of IOs are very fast and a few are slow, the average ends up looking rather fast too, due to the high number of fast IOs. Yet total response time is very slow.
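
To illustrate the point about averages, here is a minimal Python sketch. Only the IO count (500'000) and the 5% slow fraction come from the batch above; the per-IO latencies (5 ms for a fast IO, 150 ms for a slow one) are made-up values for illustration, not measurements, and `summarize` is a hypothetical helper name.

```python
def summarize(latencies_ms, slow_threshold_ms=100.0):
    """Return the mean latency and how much of the total wait the slow IOs cause."""
    total_wait = sum(latencies_ms)
    slow = [x for x in latencies_ms if x >= slow_threshold_ms]
    return {
        "mean_ms": total_wait / len(latencies_ms),
        "slow_fraction": len(slow) / len(latencies_ms),
        "slow_share_of_wait": sum(slow) / total_wait,
    }


# Assumed latencies: 95% of IOs at 5 ms, 5% at 150 ms (illustrative only).
latencies = [5.0] * 475_000 + [150.0] * 25_000
stats = summarize(latencies)

print(f"mean latency:        {stats['mean_ms']:.2f} ms")
print(f"slow IOs (>=100 ms): {stats['slow_fraction']:.1%} of all IOs")
print(f"share of total wait: {stats['slow_share_of_wait']:.1%}")
```

With these assumed numbers the mean comes out at 12.25 ms, which looks healthy, while the 5% of slow IOs account for roughly 61% of the total wait. A histogram or percentile view of individual IOs would show this; a 15-minute average would not.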

What really gets me confused is how an 8k IO can even take up to 4 seconds. Even if the disks are busy and suffering contention, there has still got to be something else causing such enormous latency.

We have also run ORION to measure the IO performance, and it's not excellent, but it's also nowhere near as disastrous (120MB/s throughput with 20 concurrent large IOs, 40ms latency with 30 concurrent small random IOs). That looks "OK" to me.




