RE: System stats

From: <"">
Date: Fri, 12 Apr 2019 14:23:57 +0000
Message-ID: <0D8F4CAC0F9D3C4AACC63F50FD9957F762DA89CE_at_PRDTXWPEMLMB32.prod-am.ameritrade.com>



Can you recommend any sort of monitoring to identify when a SAN is getting overloaded? In our case it only became apparent when an app started experiencing latency for 5-10 minutes at the same time every day, and we tracked it down to a batch job which was running on an entirely different cluster but which shared the same storage unit. The storage team denied it was their problem right up until the point we proved it was.

It would have been nice to have known that before the problems started showing up. Getting a new storage unit is a slow process.

Jay Miller
Sr. Oracle DBA
201.369.8355

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Neil Chandler
Sent: Tuesday, March 26, 2019 10:35 AM
To: Chris Taylor
Cc: gogala.mladen_at_gmail.com; ORACLE-L
Subject: Re: System stats

In the majority of places I have worked - 5 clients last year - the SANs were overloaded in 4 of them. They are too frequently sized for capacity and not throughput/response time. The response time was inevitably variable and System Stats would not have been helpful on the systems they have. In one of the clients, some of the critical DBs have dedicated storage but changing the system stats would have had little to no effect on those systems due to other measures having been put in place (including using a low optimizer_index_cost_adj on one system, meaning lots of index use. Just not necessarily the right indexes.)

The optimizer tries to be all things to all people, and there are lots of parameters to try to twist it into the shape that you want. The problem is frequently the abuse of those parameters - especially the global ones - via googling a problem, believing a silver-bullet blog, and lacking the time to prove the solution, so we just throw the fix into the system. It can be enlightening to strip the more extreme parameters back to their defaults and see how the system copes.

As an aside, did you run your systems with the default parameters, discover notable problems and then use the 2 sets of system stats to correct those problems, or did you put them in from the start and everything was good?

There's a case to be made for using system stats, but I just don't think that is something that should be used frequently.

Neil.



From: Chris Taylor <christopherdtaylor1994_at_gmail.com>
Sent: 26 March 2019 12:59
To: Neil Chandler
Cc: gogala.mladen_at_gmail.com; ORACLE-L
Subject: Re: System stats

As far as the workload, I used 2 sets of workload stats and swapped between them over the course of the day, since the business hours and the off-business hours each had their own personality (for lack of a better word).
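
A minimal sketch of how such a swap might be scheduled, assuming both sets of workload stats have already been saved into a stats table. The table name, owner, statids and the 08:00-18:00 weekday window are all hypothetical and not taken from this thread; only DBMS_STATS.IMPORT_SYSTEM_STATS and its stattab/statid/statown parameters are the real Oracle API. The script only builds the PL/SQL text and does not connect to a database.

```python
from datetime import datetime, time

# Hypothetical names, for illustration only (not from the thread).
STAT_TABLE = "SYSTEM_STATS_TAB"   # table holding the saved system stats sets
STAT_OWNER = "STATS_OWNER"        # schema that owns the stats table
BUSINESS_START = time(8, 0)       # assumed start of business hours
BUSINESS_END = time(18, 0)        # assumed end of business hours


def pick_statid(now: datetime) -> str:
    """Return the statid of the saved stats set matching the time of day."""
    # Assumption: weekends count as off-business hours.
    in_hours = now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END
    return "BUSINESS_HOURS" if in_hours else "OFF_HOURS"


def import_call(now: datetime) -> str:
    """Build the PL/SQL block that loads the chosen set of system stats."""
    return (
        "BEGIN DBMS_STATS.IMPORT_SYSTEM_STATS("
        f"stattab => '{STAT_TABLE}', "
        f"statid => '{pick_statid(now)}', "
        f"statown => '{STAT_OWNER}'"
        "); END;"
    )


if __name__ == "__main__":
    # Print the statement a scheduler job could run at each window boundary.
    print(import_call(datetime.now()))
```

Run from a scheduler at each window boundary, the printed block could be passed to whatever client executes SQL on the database. Plan changes would still need testing after every swap.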

As far as the SAN goes, if enough systems are hitting the SAN enough to cause the IO rate/throughput to become affected, then it's *probably* time for a new SAN.

Chris

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Apr 12 2019 - 16:23:57 CEST