Re: troubleshooting slow I/O performance.

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Thu, 10 May 2018 00:17:44 +0000
Message-ID: <CALcG2DJ5ez1X4J56u_UBNcz5qPUO14SupS7AAELid7P2jz5iBg_at_mail.gmail.com>



Well, I took the original poster's diagnosis as a given. If you think that it may warrant more investigation, go ahead. Without access to the cluster I can't help with that.

Mladen Gogala

On Wed, May 9, 2018, 3:27 PM Thomas Roach <troach_at_gmail.com> wrote:

> But is disk latency the source of his performance problems? Like you said,
> he needs to look at that individual workload and break it down. It could
> just as easily be a bad execution plan. I have seen nested loop joins bite
> people who focus on db file sequential reads when the query shouldn't be
> doing that many in the first place.
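A back-of-the-envelope sketch of that point, assuming the time spent in `db file sequential read` is roughly the number of single-block reads times their average latency. The read counts and latencies below are made-up illustration values, not numbers from this thread:

```python
# Illustration only: the read counts and latencies are hypothetical, not measured.


def total_wait_seconds(single_block_reads: int, avg_latency_ms: float) -> float:
    """Approximate time spent waiting on single-block reads, in seconds."""
    return single_block_reads * avg_latency_ms / 1000.0


# Baseline: a nested-loop plan issuing one million single-block reads at 10 ms each.
baseline = total_wait_seconds(1_000_000, 10.0)

# Option A: storage twice as fast, same execution plan.
faster_disk = total_wait_seconds(1_000_000, 5.0)

# Option B: same storage, but a (hypothetical) better plan needing 100x fewer reads.
better_plan = total_wait_seconds(10_000, 10.0)

print(f"baseline:    {baseline:,.0f} s")     # 10,000 s
print(f"faster disk: {faster_disk:,.0f} s")  # 5,000 s
print(f"better plan: {better_plan:,.0f} s")  # 100 s
```

Halving the latency halves the wait, while removing 99% of the reads removes 99% of it, which is why the plan is worth checking before blaming the storage.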
>
> > On May 9, 2018, at 3:48 PM, Stefan Koehler <contact_at_soocs.de> wrote:
> >
> > Hi Mladen,
> > but this is exactly my point. The OP only knows that an average
> single-block IO request takes 10 ms, but he does not know where those 10 ms
> come from or what that average response time is made of (latency
> histogram).
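To see why the histogram matters: two workloads can share the same 10 ms average while having very different shapes. The minimal Python sketch below uses made-up samples, buckets latencies into power-of-two millisecond buckets (similar in spirit to the buckets Oracle reports in `V$EVENT_HISTOGRAM`), and reports nearest-rank percentiles:

```python
import math
from collections import Counter


def latency_histogram(latencies_ms):
    """Count samples per power-of-two millisecond bucket.

    Each key is the bucket's exclusive upper bound in ms: key 1 means t < 1,
    key 2 means 1 <= t < 2, key 4 means 2 <= t < 4, and so on.
    """
    hist = Counter()
    for lat in latencies_ms:
        bound = 1
        while lat >= bound:
            bound *= 2
        hist[bound] += 1
    return dict(sorted(hist.items()))


def percentile(latencies_ms, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty sample."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Two hypothetical workloads with the same 10 ms average.
uniform = [10] * 100               # every read takes 10 ms
skewed = [2] * 90 + [82] * 10      # mostly fast reads plus a slow tail

for name, sample in (("uniform", uniform), ("skewed", skewed)):
    mean = sum(sample) / len(sample)
    print(name, mean, latency_histogram(sample),
          "p50:", percentile(sample, 50), "p99:", percentile(sample, 99))
```

Both averages are 10 ms, but the skewed workload's median read is 2 ms and its slow reads sit in the 64-128 ms bucket, which points at a different problem than uniformly slow disks.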
> >
> > These 10 ms can be lost anywhere in the IO stack and may also be load
> dependent, especially as he uses SLOB. IMHO it is not feasible to make any
> statements about the "disk response time" without knowing the SAN storage
> sub-system cache: its type (e.g. XIV cache works differently from DS8000
> cache) and its size, as almost every storage sub-system also has some
> read/write cache in front of the disks. In addition, it is quite likely that
> the data set generated and used by SLOB is smaller than the storage
> sub-system cache, so after a few iterations most of the IO is not going to
> the spinning disks anyway :)
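A simple two-tier model makes the cache point concrete: if a fraction of reads is served from the storage array's cache, the observed average is a blend of cache and disk service times, so the same 10 ms average says little about the spindles unless the hit ratio is known. A minimal Python sketch; the hit ratios and the 0.5 ms cache service time are assumptions for illustration, and real arrays (prefetch, write-back, queueing) are not this simple:

```python
def effective_latency_ms(cache_hit_ratio, cache_hit_ms, disk_ms):
    """Average read latency when a fraction of reads is served from array cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return cache_hit_ratio * cache_hit_ms + (1.0 - cache_hit_ratio) * disk_ms


def implied_disk_latency_ms(observed_avg_ms, cache_hit_ratio, cache_hit_ms):
    """Back out the per-read disk latency implied by an observed average."""
    if not 0.0 <= cache_hit_ratio < 1.0:
        raise ValueError("cache_hit_ratio must be in [0, 1)")
    return (observed_avg_ms - cache_hit_ratio * cache_hit_ms) / (1.0 - cache_hit_ratio)


# The same 10 ms average implies very different disk behaviour depending on
# the (assumed, unknown) cache hit ratio, with a hypothetical 0.5 ms cache hit.
for hit_ratio in (0.0, 0.5, 0.9):
    disk = implied_disk_latency_ms(10.0, hit_ratio, 0.5)
    print(f"hit ratio {hit_ratio:.0%}: disk reads would average {disk:.1f} ms")
```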
> >
> > How can you know that these 10 ms are not caused by a maxed-out FC HBA,
> a maxed-out SAN tunneling port or just some IO outliers (the majority of
> IOs may take just 2 ms or so, while a few outliers take 2 seconds or even
> run into the SCSI timeout)?
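How few outliers it takes: assuming a simple two-point mix where each IO takes either a typical 2 ms or an outlier 2 s (the figures used above; the two-point model itself is an assumption), solving mean = (1 - f) * typical + f * outlier for f gives f = (mean - typical) / (outlier - typical). In Python:

```python
def outlier_fraction(observed_avg_ms, typical_ms, outlier_ms):
    """Fraction of IOs at outlier_ms needed to lift the mean from typical_ms to observed_avg_ms.

    Assumes a two-point mix: every IO takes either typical_ms or outlier_ms.
    """
    return (observed_avg_ms - typical_ms) / (outlier_ms - typical_ms)


frac = outlier_fraction(observed_avg_ms=10.0, typical_ms=2.0, outlier_ms=2000.0)
print(f"{frac:.2%} of IOs at 2 s are enough for a 10 ms average")  # 0.40%

# Check: 996 reads at 2 ms plus 4 reads at 2000 ms.
sample = [2.0] * 996 + [2000.0] * 4
print(sum(sample) / len(sample))  # 9.992
```

Roughly 4 slow IOs in 1,000 are enough to produce the 10 ms average, which an average alone cannot distinguish from uniformly slow disks.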
> >
> > For example, blktrace would help him in the case of such IO outliers.
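A rough sketch of what that looks like: capture with something like `blktrace -d /dev/sdX -o - | blkparse -i -` (the device name is a placeholder), then pair each request's D (issued to the driver) and C (completed) events to get per-IO device service times, which is what `btt` reports as D2C. The parser below assumes blkparse's default text layout and matches events by device and starting sector only, which is a simplification; the sample lines are made up:

```python
def d2c_latencies_ms(blkparse_lines):
    """Pair D (issued) and C (completed) events and return service times in ms.

    Assumes blkparse's default layout, where whitespace-split fields are:
    dev, cpu, seq, timestamp, pid, action, rwbs, sector, '+', blocks, [process].
    Events are matched on (device, starting sector), a simplification.
    """
    issued = {}
    latencies = []
    for line in blkparse_lines:
        parts = line.split()
        if len(parts) < 10 or parts[8] != "+":
            continue  # skip summaries and events without a sector range
        dev, ts, action, sector = parts[0], float(parts[3]), parts[5], parts[7]
        key = (dev, sector)
        if action == "D":
            issued[key] = ts
        elif action == "C" and key in issued:
            latencies.append((ts - issued.pop(key)) * 1000.0)
    return latencies


# Made-up sample: one fast read and one 2-second outlier on device 8,0.
sample = [
    "  8,0    1        1     0.000000000  4242  D   R 1000 + 8 [oracle]",
    "  8,0    1        2     0.002000000     0  C   R 1000 + 8 [0]",
    "  8,0    1        3     0.010000000  4242  D   R 2000 + 8 [oracle]",
    "  8,0    1        4     2.010000000     0  C   R 2000 + 8 [0]",
]
lat = d2c_latencies_ms(sample)
print([round(x, 3) for x in lat])  # [2.0, 2000.0]
```

Feeding the resulting latencies into a histogram, rather than averaging them, is what exposes the outliers.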
> >
> > The information about the SSDs was posted after my mail. However, it is
> not surprising that SSDs perform better than spinning disks. Maybe the few
> IO outliers are just not soooo bad with SSDs when the storage sub-system
> needs to go to the disk (i.e. the block is not in the storage sub-system
> cache) ;-)
> >
> > Best Regards
> > Stefan Koehler
> >
> > Independent Oracle performance consultant and researcher
> > Website: http://www.soocs.de
> > Twitter: @OracleSK
> >
> >> On 9 May 2018 at 19:09, Mladen Gogala <gogala.mladen_at_gmail.com> wrote:
> >>
> >> Hi Stefan,
> >>
> >> My understanding of the facts is the following:
> >>
> >> * SLOB established the fact that the average single block read takes
> 10ms to complete.
> >> * 10ms is not fast enough.
> >>
> >> From those two facts I conclude that the OP needs faster disks. It's as
> simple as that. The OP has also said that he has some flash disk groups
> which are much faster. Please let me know if my understanding of the facts
> is incorrect. Also, what insight can the OP gain from the rather strenuous
> exercise with blktrace and how can it help him?
> >>
> >> He used SLOB and has his results. My understanding is that SLOB results
> are taken as facts. So, we can take an average of 10ms for a single block
> read as a fact. If that is fast enough, all is well, nothing needs to be
> done. If not, the only way to fix things is faster disks. Did I go wrong
> somewhere?
> > --
> > http://www.freelists.org/webpage/oracle-l
> >
> >
>

Received on Thu May 10 2018 - 02:17:44 CEST
