Re: troubleshooting slow I/O performance.

From: Stefan Koehler <contact_at_soocs.de>
Date: Wed, 9 May 2018 21:48:05 +0200 (CEST)
Message-ID: <1489069188.416161.1525895286113_at_ox.hosteurope.de>


Hi Mladen,
but this is exactly my point. The OP only knows that a single-block IO request takes 10 ms on average; he does not know where those 10 ms come from or what that average response time is made of (latency histogram).

These 10 ms can be lost anywhere in the IO stack and may also be load dependent, especially as he uses SLOB. IMHO it is not feasible to make any statement about the "disk response time" without knowing the SAN storage sub-system cache, its type (e.g. XIV cache works differently than DS8000 cache) and its size, as almost every storage sub-system has some read/write cache in front of the disks. In addition, the data set generated and used by SLOB is most likely smaller than the storage sub-system cache, so after a few iterations most of the IO does not go to the spinning disks anyway :)

How can you know that these 10 ms are not caused by a maxed-out FC HBA, a maxed-out SAN tunneling port, or just a few IO outliers (the majority of IOs may take only 2 ms or so, while a few outliers take 2 seconds or run up to the SCSI timeout, etc.)?

For example, blktrace would help him track down such IO outliers.
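
To make the outlier point concrete, here is a minimal Python sketch with made-up numbers (not the OP's data, and not blktrace output). It assumes 996 reads at 2 ms plus 4 outliers at 2 seconds, and the helper name latency_histogram is purely illustrative. It shows that such a mix still averages roughly 10 ms, while the histogram makes the two populations visible.

```python
from collections import Counter
from statistics import mean, median


def latency_histogram(latencies_ms):
    """Count I/Os per power-of-two bucket; each key is the bucket's upper bound in ms."""
    buckets = Counter()
    for lat in latencies_ms:
        upper = 1
        while upper < lat:
            upper *= 2
        buckets[upper] += 1
    return dict(sorted(buckets.items()))


# Hypothetical sample (assumption): 996 fast reads and 4 outliers.
sample = [2.0] * 996 + [2000.0] * 4

avg = mean(sample)
med = median(sample)
hist = latency_histogram(sample)

print(f"mean={avg:.2f} ms, median={med:.1f} ms")
for upper, count in hist.items():
    print(f"<= {upper:>5} ms: {count}")
```

In this made-up case the average alone would suggest uniformly slow disks, although 99.6% of the reads complete in 2 ms. Only the per-bucket counts show that a handful of outliers carry the average.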

The information about the SSDs was posted after my mail. However, it is not surprising that SSDs perform better than spinning disks; maybe the few IO outliers are simply not as bad with SSDs when the storage sub-system has to go to disk (i.e. the block is not in the storage sub-system cache) ;-)

Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher
Website: http://www.soocs.de
Twitter: _at_OracleSK

> Mladen Gogala <gogala.mladen_at_gmail.com> wrote on 9 May 2018 at 19:09:
>
> Hi Stefan,
>
> My understanding of the facts is the following:
>
> * SLOB established the fact that the average single block read takes 10ms to complete.
> * 10ms is not fast enough.
>
> From those two facts I conclude that the OP needs faster disks. It's as simple as that. The OP has also said that he has some flash disk groups which are much faster. Please let me know if my understanding of the facts is incorrect. Also, what insight can the OP gain from the rather strenuous exercise with blktrace and how can it help him?
>
> He used SLOB and has his results. My understanding is that SLOB results are taken as facts. So, we can take an average of 10ms for a single block read as a fact. If that is fast enough, all is well, nothing needs to be done. If not, the only way to fix things is faster disks. Did I go wrong somewhere?

--
http://www.freelists.org/webpage/oracle-l
Received on Wed May 09 2018 - 21:48:05 CEST
