Re: RAMSAN Experience
Date: Fri, 11 Sep 2009 17:58:38 -0700
Given this host has 2 x quad core CPUs, the default DOP would be 16. Parallel "default" (2 * cpu_count) is generally able to leverage the CPUs to their fullest with one query, given modest filtering and joins. I would comment that 30GB for your db_cache is probably a waste with PX. The "Buffer Pool Advisory" section of your AWR report would likely confirm this. I would generally say to add more to your pga_aggregate_target but I dont think that you can leverage much more than what you have with only 16 or 32 PX servers. Thus my comment would be that this host is not well balanced for a PX workload; it has too much memory and not enough CPU (usually 4GB per cpu core is good). As a comparison point, the nodes for the DB Machine are G5 DL360s with 2 x quad core CPUs with 32GB RAM and we generally use pga_aggregate_target=16GB and sga_target=8GB.
On the storage side 2 x 4gbps ports will do 2 x ~400MB/s = 800MB/s max. Can your storage deliver this to the host? You can use Oracle Orion or try a simple count(*) at the default DOP and use vmstat/iostat or similar to validate this. It is very important to know what the physical capabilities of your hardware are.
I hate to sound like a school master, but messing with underscore parameters is really a waste of time, even under constraints. Root cause analysis is really required.
I would list your triage tasks as such:
1) Validate the execution plan for your query or queries. Are the cardinality estimates accurate and is the join order optimal? Trouble shooting excess temp writes for suboptimal execution plans due to non-representative stats is simply a waste of time when you may be able to eliminate them all together with an optimal execution plan. 2) Understand how much I/O scan capacity your system has. Do some micro benchmarks with Orion or a count(*) FTS. This will give you a sense of how fast your table scans should take and how many of them can run simultaneously before you exhaust your I/O bandwidth. 3) Understand the "tipping point" for when your HJ spills to temp for a given DOP, given an optimal plan. Is it 1M rows? 10M rows? 100M rows?,etc. At each of those, how much PGA memory is being used? I would suspect you can fit quite a number of keys for a HJ in 8GB or 16GB of RAM.
A systematic approach will put you light years ahead of mucking with parameters, especially hidden/underscore parameters.
On Fri, Sep 11, 2009 at 8:50 AM, Rob Dempsey <Rob.Dempsey_at_5one.co.uk> wrote:
> Thanks for all the feedback from everyone I will try and answer all the points in this mail.
> Oracle Version 10.2.0.1 going to 10.2.0.4
> 2 * IntelŽ XeonŽ E7330 processor (2.4 GHz, 2x3M cache, 80 Watts)
> 1 card, dual port 4gbit connection
> Concurrent user 5-10 at the moment
> We use parallel query on the table with a value of 8 /*+ parallel(t,8) */. I realise that parallel query will make use of 'direct path reads' to read the data into the process's PGA area however I have found that if we set the small table threshold underscore parameter to a high value the query will use the data in the cache instead. I am making an assumption that this parameter may work on a segment level and yes I do realise that there will be a high usage of latches (I have already used Tom keys run_stats package)
> Regarding the queries, the 26G table is roughly around 900 million rows a number of which will self-join the table so that the query maybe be a 100 million to 100 million join. This is why certain pga steps are spilling to TEMP. I know about tuning using underscore parameters is not the best course of action however the reason I used the parallel clause in the first place was to open more pga memory to the SQL queries. However an 'option' was to look to add more to try and push the 'global memory bound' beyond 1G (a whole research project in itself)
> From my original email I did say we had resource problems and constraints. Indeed I am an Oracle Developer / DBA. Storage is not my core strength and we do not have much experience in the department. This is the root problem but you have to work with what you got.
-- Regards, Greg Rahn http://structureddata.org -- http://www.freelists.org/webpage/oracle-lReceived on Fri Sep 11 2009 - 19:58:38 CDT