RE: troubleshooting slow I/O performance.

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Wed, 9 May 2018 10:10:51 -0400
Message-ID: <054c01d3e79f$89fa60a0$9def21e0$_at_rsiz.com>



This was a delightful thread full of good information.  

I would contend, however, that Mladen defined the “most dangerous” rather than the “most interesting” DBA in his comment.  

The most interesting DBA in the world is the one who can remain calm and dig down to the truth and thereby help determine the best economic path to improvement when handed a system already built that is overstressed and underperforming.  

In addition to using Method R (from MethodR) to see what is both important and horrible and could be optimized logically, the various hints and tools from this thread regarding I/O as the pacing resource are dead bang on. As a token remembrance of Terascape: if you have an opportunity to relocate the hottest 10% of the acreage of your system to fast, cheap-seek hardware, you may discover you have “de-heated” ™ the rest of your disk farm so that it is just fine for the other 90% of your acreage.  
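A rough sketch of the "de-heating" arithmetic, in Python. All numbers are hypothetical and not from this thread: a disk farm rated for 10,000 IOPS, a workload of 9,000 IOPS, and an assumed 70% of that I/O hitting the hottest 10% of the acreage. The function name is made up for illustration.

```python
def remaining_utilization(total_iops, farm_capacity_iops, hot_share):
    """Utilization of the original disk farm once the hot data is served elsewhere.

    hot_share is the fraction of all I/O that targets the relocated hot data.
    """
    cold_iops = total_iops * (1.0 - hot_share)
    return cold_iops / farm_capacity_iops

# Hypothetical figures, chosen only to illustrate the effect.
before = remaining_utilization(9000, 10000, 0.0)  # nothing relocated
after = remaining_utilization(9000, 10000, 0.7)   # hottest data moved away

print("before: %.0f%% busy, after: %.0f%% busy" % (before * 100, after * 100))
```

Whether the real skew is anywhere near 70/10 is exactly what has to be measured first; the sketch only shows why a strong skew makes a small amount of fast storage go a long way.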

As the price of superfast, super reliable storage drops, the economic edge where figuring this out is worthwhile drops, but it is not (as Stefan pointed out) down to zero yet.  

And yet, as the shift from capital expense to operational rental cost continues with the re-emergence of time-sharing, what sells to management may increasingly be Mladen’s solution, if you can get your 2 cents into the thinking bin before the system is configured and delivered.  

We’re living in interesting times. Again.  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Chris Stephens
Sent: Wednesday, May 09, 2018 9:49 AM
To: contact_at_soocs.de
Cc: oracle-l_at_freelists.org; gogala.mladen_at_gmail.com
Subject: Re: troubleshooting slow I/O performance.  

thanks for everyone's input on this. the 10ms I/O times are relatively constant on this disk group. we were just looking to make sure those disks are performing as expected. I have always considered 5ms single-block I/O response times typical for spinning disks, and when I saw 10ms I thought maybe something was up. not sure how I got that 5ms number stuck in my head.  

Luckily we also have a disk group on SSDs that performs far better.  

thanks again everyone. even you and your criticisms mladen. :)  

On Wed, May 9, 2018 at 2:43 AM Stefan Koehler <contact_at_soocs.de> wrote:

Hello Mladen,
sure, I agree if you already know that the disks are the bottleneck (too slow), but in this case the OP does not know and wants to identify / drill down to the bottleneck.

Let's summarize the provided information quickly:

- 5 node 12.2 RAC system
- "db file sequential read"'s are taking ~10ms
- ASM (external redundancy)
- 6 x 30TB spinning drives / 8 LUNs via FC

First of all, we don't know how the 10 ms (I guess/assume an average) is formed. Are most of the I/Os in this latency bucket, or is the majority of I/O much faster and the OP just got a few very bad outliers that lead to this average?
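To illustrate the point about averages, here is a minimal Python sketch with made-up latency samples (not measurements from the OP's system). Both sample sets average exactly 10 ms, yet their histograms look nothing alike. The power-of-two buckets are similar in spirit to the wait histogram Oracle exposes in V$EVENT_HISTOGRAM; the function names are hypothetical.

```python
from collections import Counter

def bucket_ms(latency_ms):
    """Return the upper bound (in ms) of the power-of-two bucket for a latency."""
    bound = 1
    while bound < latency_ms:
        bound *= 2
    return bound

def latency_histogram(samples_ms):
    """Count samples per power-of-two bucket, ordered by bucket bound."""
    counts = Counter(bucket_ms(s) for s in samples_ms)
    return dict(sorted(counts.items()))

# Hypothetical samples: both sets have a mean of 10 ms.
steady = [10.0] * 100                 # every read takes about 10 ms
outliers = [2.0] * 95 + [162.0] * 5   # mostly fast reads plus a few very slow ones

for name, samples in (("steady", steady), ("outliers", outliers)):
    mean = sum(samples) / len(samples)
    print(name, "mean=%.1f ms" % mean, latency_histogram(samples))
```

The first case points at disks that are uniformly slow; the second points at an outlier problem somewhere in the I/O path, which is the kind of thing worth drilling into before blaming the spindles.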

Secondly, he is using RAC and therefore some kind of shared storage. Even typical mid-range storage for such environments has read/write caches, which improve the response time of spinning disks drastically. We just don't know what it looks like, and it seems the OP does not either.

IMHO it is better to give the OP a way to figure it out on his own (and blktrace is a good tool to do that) instead of wild guessing and already stating that the root cause of his observation is the "slow" spinning disks.

P.S.: In the past I was able to fix several IO outlier problems with blktrace by the way ;-)

Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher
Website: http://www.soocs.de
Twitter: _at_OracleSK

> On 9 May 2018 at 05:14, Mladen Gogala wrote:
>
> There is rarely a need to lose time by doing blktrace. If disks are too
> slow, the ONLY real answer is faster disks. I used to play with IO
> elevators, read ahead, systemtap and stuff like that, but the truth is
> that whenever I delved into that stuff, nothing useful ever came out of
> it. When IO is too slow, the faster disks are the only solution. No
> amount of tuning will turn a Ford Taurus into a Ferrari. Storage
> configuration is usually planned ahead of configuring the database. It
> may even be prudent, I apologize for using harsh language, to do some
> testing ahead of building the whole cluster. This sounds like the
> configuration done by the most interesting DBA in the world: the one who
> doesn't test his stuff often, but when he does so, he does it in production.
>
> --
> Mladen Gogala
> Database Consultant

--
http://www.freelists.org/webpage/oracle-l





Received on Wed May 09 2018 - 16:10:51 CEST
