Re: Oracle - solving performance problem

From: Jim Kennedy <kennedy-downwithspammersfamily_at_attbi.net>
Date: Fri, 7 Oct 2005 17:52:50 -0700
Message-ID: <ZNqdnV84m-iShdreRVn-gg@comcast.com>

<mark_sensfan_at_hotmail.com> wrote in message news:1128721372.461542.213190_at_g44g2000cwa.googlegroups.com...
> I am a sysadmin tasked with developing a solution to improve our Oracle
> 10g performance. I inherited the following server and problem. I'm not
> a DBA, so if my terminology is off, please forgive me; if you are kind
> enough to help but need more info, I can get you that info.
>
> Server Spec:
> Red Hat Enterprise Server 3.0 2.4.21-27.0.4.ELsmp
> Dual Intel Xeon CPUs @ 3.00GHz w/ hyper-threading
> 12 gigs of RAM
> 3 x RAID 5 arrays with 7 disks each, attached externally. Each array
> has its own SCSI channel.
>
> Problem:
> We have a 230 gig database that contains 101 million rows. Indexes
> are placed on one mount point/SCSI channel, data on another mount
> point/SCSI channel. I'm told that the size of this database can't be
> reduced by our development staff; in fact it will be growing by about
> 20-30% each year. Certain queries are taking a dog's age to run. One
> is a mass load of data each day; the other, and more critical, one is
> a client query that takes about 2 minutes. We want to bring that down
> to something like 5-10 seconds, or if that's not possible, at least
> to 30-45 seconds or so. We know our bottlenecks are I/O, in both
> reads and writes.
>
> Things we have tried:
> - Enabled and used async I/O
> - Increased the SGA to 2.7 GB; the jump to 3.7 GB is not done yet
> - Upgraded the SCSI driver; saw some minor improvement
> - Other minor DB/query changes
>
>
> A consultant was hired to look at our environment; they recommended:
> - We need 5-10% of the DB size in SGA, therefore about 23 GB of SGA
> at today's size.
> - 64-bit architecture, to allow for a huge SGA.
> - Scrap the RAID 5 arrays and build RAID 10 arrays instead; the more
> platters the better.
> - Recommended some query and index changes; the fixes will help, but
> won't be overly significant.
>
> I need to figure out what we can do to achieve our goals and have it
> last through at least 2 years of increasing data. I've been
> researching the following.
>
> - Change hardware platform to a large Sun box running Solaris that
> can take more than 32 GB of RAM (a V890 or something)
> - Some type of high-performance platform from IBM with AIX
> - An Intel/Sun Oracle grid with several nodes, perhaps blade servers
> - Scrap SCSI and use a Fibre Channel SAN instead, in a standalone or
> grid setup
>
> Anyone have any opinions on the best course of action? How should one
> handle this amount of data and maintain performance?
>
> Grid seems like a good option, but perhaps not necessary, because our
> problem is I/O, not really CPU. Dumb question, but how do multiple
> grid members access the same data efficiently? Can the SGA be spread
> around the members of the grid? Any caveats?
>
> Thanks for any advice you may be able to provide.
>
> Mark
>

You need to find out what the bottleneck is. What is taking the longest? Can you eliminate that?
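For an Oracle 10g instance, a reasonable first look is the wait
interface. A minimal sketch, assuming 10g's wait_class column is
present in v$system_event (TIME_WAITED is in centiseconds), that ranks
non-idle wait events by total time waited since instance startup:

  -- Rank non-idle wait events by cumulative time waited. If the top
  -- lines are 'db file sequential read' / 'db file scattered read',
  -- the instance really is read-I/O bound.
  SELECT event, total_waits, time_waited
  FROM   v$system_event
  WHERE  wait_class <> 'Idle'
  ORDER  BY time_waited DESC;

That tells you what the instance as a whole waits on; it does not yet
tell you whether your 2-minute query is the one doing the waiting.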
Look, you are wasting money on the consultant and the hardware until you know what is taking the longest and what share of the whole workload it represents. Let me tell you a story to illustrate my point.

Once upon a time I worked on designing pension administration systems. We created these systems in a scripting language. The language was interpreted, not compiled. Each system was customized for the client. It was not uncommon for a pension calculation to take 2 minutes. Of course, the conventional wisdom said that if you just wrote the entire system in a compiled language (e.g., C) then it would be much faster. After all, we know that compiled languages are faster than interpreted languages. Certainly that is true if all other things are equal. Fortunately, it would have been a monumental task to rewrite the whole system in C. The scripting language had a very nice profiler, so I would go on site and profile the calculation.

Once I profiled the application, it would turn out that the majority of the time was spent getting data. Out of 2 minutes, maybe 1 minute and 50 seconds was spent getting the data. So even if we had written the application in assembly language and fine-tuned the code, the best we could do was to go from 2 minutes down to 1 minute 50 seconds. The answer wasn't to write the code in C or assembly language; the answer was to NOT spend so much time getting data. It was quite common to calculate the final average pay by taking the highest 60 months out of the last 120 months, and it was quite common that the people who wrote the application would retrieve the pay data 120 times to do that calculation. Ahhh, what would happen if we retrieved the pay only once instead of 120 times?
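In Oracle terms, that change is the difference between 120 round trips and one set-based statement. As a rough sketch mirroring the pension example (the monthly_pay table, its columns, and the :emp_id bind are hypothetical), the highest-60-of-the-last-120-months average can be fetched in a single query using the classic top-N pattern:

  -- One statement instead of 120 single-month fetches:
  -- take the last 120 months, keep the 60 highest, average them.
  SELECT AVG(pay_amount) AS final_average_pay
  FROM  (SELECT pay_amount
         FROM  (SELECT pay_amount
                FROM   monthly_pay
                WHERE  emp_id = :emp_id
                ORDER  BY pay_month DESC)
         WHERE  ROWNUM <= 120            -- the last 120 months
         ORDER  BY pay_amount DESC)
  WHERE ROWNUM <= 60;                    -- the highest 60 of those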

It would take me about an hour to make that change in their system, and the calculation would then take maybe 10 seconds (down from 120). Doing a little more work on the next-biggest time waster would in general net another 4 to 7 second reduction in calculation time. So we went from 120 seconds to 6 seconds, a factor of 20, all with about 4 hours of focused work!

We didn't have to buy faster hardware. We could actually have used slower hardware, since we were no longer doing work that we didn't need to do.

The point is that you need to find out which event in the whole thing is taking the longest. If you reduced it to zero, what would the result be? Is that fast enough? Can you avoid doing it at all? Can you do less of it?
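For the original poster's 2-minute client query, the standard Oracle way to get exactly that event-by-event breakdown is an extended SQL trace. A minimal sketch using the long-standing 10046 event (level 8 includes wait events):

  -- Trace one run of the slow query, wait events included:
  ALTER SESSION SET timed_statistics = TRUE;
  ALTER SESSION SET events '10046 trace name context forever, level 8';
  -- ... run the slow client query here ...
  ALTER SESSION SET events '10046 trace name context off';

The trace file lands in user_dump_dest and can be formatted with tkprof; the top line items, and the waits listed under them, show which operation is actually eating the 2 minutes.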

But you have to tackle the most expensive operation first. You might indeed need more hardware, but until you identify that operation you are just lining someone else's pockets.

Jim

Received on Fri Oct 07 2005 - 19:52:50 CDT
