Re: RAMSAN Experience

From: Christo Kutrovsky <>
Date: Wed, 16 Sep 2009 14:18:08 -0400
Message-ID: <>

Hi Rob,

We had very same problem in 9i, when I first did the PGA memory research. We are now running in 11g, with same tuning settings. That particular database is about 1 TB.

I didn't see what platform you are using, these guidelines are tested in both Solaris and Linux.

You have to use underscore parameters to increase global bound beyond 1 GB. Even on 64 bit Oracle, it doesn't go beyond 4 GB (3.5GB actually) but you can work around this with parallel query. Haven't tested in 11gR2.

Note that even with parallel, sometimes data is not split evenly in all processes. v$pq_tqstat can show you this, but only in same process that runs the query. For example, run this query after your parallel query has ran, in the same session:

select dfo_number "d", tq_id as "t", server_type, num_rows,rpad('x',round(num_rows*10/nullif(max(num_rows) over (partition by dfo_number, tq_id, server_type),0)),'x') as "pr", round(bytes/1024/1024) mb, process, instance i,round(ratio_to_report (num_rows) over (partition by dfo_number, tq_id, server_type)*100) as "%", open_time, avg_latency, waits,
timeouts,round(bytes/nullif(num_rows,0)) as "b/r" from v$pq_tqstat order by dfo_number, tq_id, server_type desc, process;

Slides of my presentation (old) are here:, there's a whitepaper too at IOUG's web site.

Short story (assuming 64 bit OS and Oracle):

Increase limit to 4 GB (8 GB per process). Note, first parameter is in bytes, second in KiB.

Even when it does spill to disk, increase IO sizes so that it's more efficient: *._smm_auto_max_io_size=2048

The IO setting can increase a spilled sort/hash join by a factor of 2-3, but is very SAN Cache/Usage specific.

It still puzzles me why Oracle has limited global bound to 1GB...

If you don't want to play around with underscore settings, you can always create a "ramdisk" (tmpfs or similar) and put your tempfiles there. That approach however will have a CPU hit, and you will have to recreate your tempfiles every reboot.

Christo Kutrovsky
Senior Consultant
The Pythian Group

On Fri, Sep 11, 2009 at 11:50 AM, Rob Dempsey <> wrote:
> Hi
> Thanks for all the feedback from everyone I will try and answer all the points in this mail.
>        Oracle Version going to
>        db_cache_size=30720M
>        pga_aggregate_target=28672M
>        DL580G5
>        2 * Intel® Xeon® E7330 processor (2.4 GHz, 2x3M cache, 80 Watts)
>        1 card, dual port 4gbit connection
>        64G
> Concurrent user 5-10 at the moment
> We use parallel query on the table with a value of 8 /*+ parallel(t,8) */. I realise that parallel query will make use of 'direct path reads' to read the data into the process's PGA area however I have found that if we set the small table threshold underscore parameter to a high value the query will use the data in the cache instead. I am making an assumption that this parameter may work on a segment level and yes I do realise that there will be a high usage of latches (I have already used Tom keys run_stats package)
> Regarding the queries, the 26G table is roughly around 900 million rows a number of which will self-join the table so that the query maybe be a 100 million to 100 million join.  This is why certain pga steps are spilling to TEMP.  I know about tuning using underscore parameters is not the best course of action however the reason I used the parallel clause in the first place was to open more pga memory to the SQL queries.  However an 'option' was to look to add more to try and push the 'global memory bound' beyond 1G (a whole research project in itself)
> From my original email I did say we had resource problems and constraints.  Indeed I am an Oracle Developer / DBA.  Storage is not my core strength and we do not have much experience in the department.  This is the root problem but you have to work with what you got.
> Thanks
> Rob
> -----Original Message-----
> From: Greg Rahn []
> Sent: 09 September 2009 21:55
> To: Rob Dempsey
> Cc:
> Subject: Re: RAMSAN Experience
> First, I will award you negative DBA tuning points for even suggesting
> to use underscore parameters as tuning "knobs".  This is the wrong
> approach.  And for the others that do the same, take heed (yes, I am
> on a soap box now).
> Let's first understand some things about PX:  IIRC if the table is
> smaller than the small table threshold, it can be put into the db
> cache and read serially, but most tables in a DW do not fit this
> requirement so then will be physically read off of disk each and every
> time a PX scan is done.  This means the system needs to support a
> fairly high number of MB/s (GB/s) of disk throughput, especially if
> there is either a large number of concurrent queries or a high (for
> the system cpu count) DOP or both.  This changes some with 11gR2 and
> the in-memory PX option but lets skip that for now.
> With a 26GB and a 3GB table we're really not talking very big amounts
> of data here, so what is the physical memory size and how are you
> using it?
> How many and what kind of CPUs?
> What is your storage connectivity (#HBAs and speed) and number of
> spindles the db is on? Dedicated or shared storage?
> How much disk bandwidth (GB/s) can you drive from this host to the storage?
> How many concurrent queries and at what DOP is the workload?
> If you plan is to add SSD for temp space, my recommendation would be
> not to bother.  What you need to investigate is why the HJ is spilling
> to disk and see if you can prevent it.  First, I would check the
> execution plans and validate the cardinality of the row sources.  Then
> you may need to add extra memory to pga_aggregate_target or to
> increase the DOP.  By increasing the DOP you will give the query more
> PGA memory to address as it is partly limited by the number of PX
> slaves; so 8 slaves can address 2x the memory that 4 slaves can, given
> its availability.   It also could be a data skew issue or a join skew
> issue.
> If you plan to add SSD for redo logs in a DW I would ask: Why are you
> not loading it via parallel direct path and avoiding redo?  Why write
> redo to expensive SSD when you can avoid writing it at all?
> So in the end I don't think SSD will give you anything but a hole in
> your company's pocketbook.
> On Wed, Sep 9, 2009 at 8:29 AM, Rob Dempsey<> wrote:
>> I guess I should try and explain the problem a little bit more. This is a simplified version (it will make a much longer email to detail all our constraints and resources). We have a read only reporting application that allows users to query raw row level data.  There are a number of combinations a query can have be it date period, type of products etc etc which makes it near on impossible for us to summaries the data - trust me I would if I could. For our large system the two main tables are in size
>> A       25690.25M
>> B      2955.25M
>> We use Oracle compression, pctfree 0, parallel query and partitioning on date.  As our users seem to be addicted to response time being as low as possible and not having enough expertise in storage one solution was to set a db_cache_size that could accommodate all the table data and throw it into memory. Simply a large in memory database. This solution has worked very well for our smaller database, however as the data got larger hash joins, group bys are spilling to disk. PGA is set to a large value and my next point of call is to test different value for the underscore parameters that control it.
>> We use EMC storage however the latest idea is to use RAMSAM for the temporary files. I always thought it might be a good idea for the redo logs but I am not sure about the TEMP files.
>> Like I said we have a number of constraints, but any help would be welcome.
> --
> Regards,
> Greg Rahn
> --

Christo Kutrovsky
Senior DBA
The Pythian Group -
I blog at
Received on Wed Sep 16 2009 - 13:18:08 CDT

Original text of this message