Re: issues explaining performance issues to clients
Date: Thu, 23 Jul 2009 11:21:47 -0400
this is an oltp where the dbas have very little control over the application or data model. Bringing up parallel query is just a bad idea in this situation.
Partitioning will not help. If there are 30 columns in a table and users can at will select any of those columns in any combination in a query than we have 30! or 30x29x28x ... number of possible combinations of columns.
The issue I am having is communicating this to them since this application has grown over 15 years and they have always done this. However, now there is alot more traffic and performance is not scaling.
On Thu, Jul 23, 2009 at 6:11 AM, Wilhelm Klotz <willyk_at_kbi-gmbh.de> wrote:
> nearly had the same problem 2 years ago - also in a government
> environment .. :-)
> we did several things to increase performance:
> a. talk to the users, and explained when indexes are used, and when they
> are NOT used
> b. we partitoned tables and indexes, to take advantage of parallel
> c. we use resource manager to limit long-running queries.
> Willy Klotz
> *Von:* oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> *Im Auftrag von *Dba DBA
> *Gesendet:* Mittwoch, 22. Juli 2009 20:42
> *An:* oracle-l_at_freelists.org
> *Betreff:* issues explaining performance issues to clients
> How do you handle this? This is a large government project environment.
> Users can basically do whatever they want here. They have the ability to
> make any query they want. We have one table that is about 2 GBs that has 30
> indexes on it to support this.
> Well the users created yet another combination of where clauses that we do
> not currently have an index to support(this happens alot). It does 200,000
> LIOs and 200,000 PIOs. In our test environment where there is little
> activity this returns in 6 seconds. Through a 10046 trace my physical waits
> are negligible (the data is probably cached in the SAN, since the LIOs and
> PIOs are the same) and the waits on db_file_scattered_reads total to less
> than 1 second.
> In production. The exact same query, same plan and basically the same LIOs
> and PIOs takes about a minute. Of that 55 seconds of it is physical reads.
> We have twice as many db_file_scattered_read waits and the wait time is
> about 55 seconds. This does not surprise me since due to alot of bad queries
> like this overall workload is higher and this leads to more waits and slower
> response time. This is fairly generic.
> A few questions.
> 1. Some people here see this as a "problem" that requires root cause
> analysis to fix. They want to make production "fast". From a DBA perspective
> the root cause is "fix the queries and use an existing index. Stop letting
> the users do whatever they want". Or the more expensive route, we need to
> consider another reporting system, etc... and a major architecture
> change(they won't like the second one at all).
> 2. They want me to engage other people to find the root cause. The SAN is
> run by a totally different team that works with many teams. Does anyone know
> what tools they should be using to look into this and what I should say to
> them exactly, to get more detailed metrics beyond oracle's perspective.
> I do not know how the underlying subsystem is set up. For all I know we
> could have other applications in the same RAID groups, etc...
> I have tried to work with SAN guys before. Unless you get a consultant from
> EMC or HP or something like that, I find that they just click buttons. I do
> not know if that is the case here. So how would you engage the SAN