Re: troubleshooting slow I/O performance.

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Thu, 10 May 2018 00:08:39 -0400
Message-ID: <0798dc82-98b8-a24f-2fa5-8afdb82c2646_at_gmail.com>


Hi Stefan,

The OP doesn't even have an application. Discussing whether 10ms is or is not good enough is a somewhat moot point until the database has real users, applications and data. There is an old platitude that a database without data and users has never caused any problems.

As far as I can see, the latency histogram and the number of components are not very helpful. 10ms per single block read is either OK or not OK. The application can be tuned: almost read-only tables like states, zip codes, addresses and the like can be created as IOTs, tables can be clustered around common columns to speed up queries, partitioning can be used, and the in-memory option can also speed things up significantly. But nothing can be done until the system is used. And, of course, if the disk subsystem is too slow, then there is a hard decision to be made. Diving headlong into Linux guts never did much good for me, although I confess to having tried it several times.

Regards

On 05/09/2018 03:48 PM, Stefan Koehler wrote:
> Hi Mladen,
> but this is exactly my point. The OP just knows that the average IO for a single block IO request takes 10 ms but he does not know where the 10 ms come from and what this average response time is made of (latency histogram).
>
> These 10 ms can be lost anywhere in the IO stack and may also be load dependent - especially as he uses SLOB. IMHO it is not feasible to make any statements about the "disk response time" without knowing the SAN storage sub-system cache, its type (e.g. XIV cache works differently than DS8000 cache) and its size, as almost every storage sub-system also has some read/write cache to support the disks. In addition, it is most likely that the data set generated and used by SLOB is smaller than the storage sub-system cache, and so most of the IO is not going to the spinning disk anyway after a few iterations :)
>
> How can you know that these 10 ms are not caused by a maxed out FC HBA, maxed out SAN tunneling port or just some IO outliers (majority of IOs may just take 2 ms or so but a few IO outliers with 2 seconds or up to SCSI timeout, etc.)?
>
> For example blktrace would help him in case of such IO outliers.
>
> The information about the SSDs was posted after my mail. However it is not surprising that SSDs perform better than spinning disk - maybe the few IO outliers are just not soooo bad with SSDs when the storage-subsystem needs to go to the disk (storage sub-system cache does not have the block in cache) ;-)
>
> Best Regards
> Stefan Koehler
>
> Independent Oracle performance consultant and researcher
> Website: http://www.soocs.de
> Twitter: _at_OracleSK
>
>> Mladen Gogala <gogala.mladen_at_gmail.com> wrote on 9 May 2018 at 19:09:
>>
>> Hi Stefan,
>>
>> My understanding of the facts is the following:
>>
>> * SLOB established the fact that the average single block read takes 10ms to complete.
>> * 10ms is not fast enough.
>>
>> From those two facts I conclude that the OP needs faster disks. It's as simple as that. The OP has also said that he has some flash disk groups which are much faster. Please let me know if my understanding of the facts is incorrect. Also, what insight can the OP gain from the rather strenuous exercise with blktrace and how can it help him?
>>
>> He used SLOB and has his results. My understanding is that SLOB results are taken as facts. So, we can take an average of 10ms for a single block read as a fact. If that is fast enough, all is well, nothing needs to be done. If not, the only way to fix things is faster disks. Did I go wrong somewhere?
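
For readers following the thread, the disagreement about "average versus latency histogram" can be shown with a little arithmetic. The sketch below is only an illustration of Stefan's outlier scenario quoted above (most reads around 2 ms, a few around 2 seconds). The sample counts, the bucket boundaries and the helper names are assumptions made up for this example; nothing here was measured on the OP's system or produced by SLOB or blktrace.

```python
# Hypothetical latencies in milliseconds, invented to match the scenario
# in the thread: most reads take 2 ms, a handful take 2 seconds.
from collections import Counter

latencies_ms = [2] * 996 + [2000] * 4

# Power-of-two bucket upper bounds in ms (an arbitrary choice for this sketch).
BOUNDS = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048)


def mean(values):
    """Arithmetic mean of the samples."""
    return sum(values) / len(values)


def histogram(values, bounds=BOUNDS):
    """Count samples per bucket; a sample goes into the first bound >= it.

    Samples larger than every bound are counted under float('inf').
    """
    counts = Counter()
    for v in values:
        bucket = next((b for b in bounds if v <= b), float("inf"))
        counts[bucket] += 1
    return dict(counts)


avg = mean(latencies_ms)
hist = histogram(latencies_ms)

print(f"average: {avg:.3f} ms")
for bucket, count in sorted(hist.items()):
    print(f"<= {bucket} ms: {count} reads")
```

With these made-up numbers the average comes out at 9.992 ms, although 996 of the 1000 reads completed in 2 ms. The average alone cannot distinguish this case from one where every read takes about 10 ms. Whether that distinction matters before there is a real workload is exactly what the thread is arguing about.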

-- 
Mladen Gogala
Database Consultant
Tel: (347) 321-1217

--
http://www.freelists.org/webpage/oracle-l
Received on Thu May 10 2018 - 06:08:39 CEST