Re: Replicated File System Consistency

From: Pat <pat.casey_at_service-now.com>
Date: Thu, 1 Jan 2009 09:12:35 -0800 (PST)
Message-ID: <5e3c076f-8b36-4f75-8bb4-9d6cc9704542_at_r15g2000prd.googlegroups.com>



On Dec 30 2008, 9:55 am, DA Morgan <damor..._at_psoug.org> wrote:

> David's link is a good one but I'd like to address the fact that you
> are not seeing any improvement from the NetApp 3040. Here are a couple
> of questions you might explore:
>
> 1. What is the limiting factor? CPU? Network bandwidth/latency? Storage?

   The nature of our app is that the working set fits in memory most of the time, so we're largely CPU bound, and you wouldn't expect a SAN to change that at all. We do have certain queries/operations that tend to be IO bound, but we didn't see a dramatic improvement on those either (maybe 20% if I recall).
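
   (If anyone wants to double-check which way a box leans, below is a rough sketch of the kind of thing we eyeball: sample /proc/stat twice and compare CPU-busy time against iowait. Assumes a Linux host; the 5-second sample window is arbitrary.)

    # Sketch: is this box CPU bound or IO bound? Sample the aggregate
    # "cpu" line of /proc/stat twice and compare busy time vs. iowait.
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            fields = f.readline().split()[1:]  # user nice system idle iowait ...
        user, nice, system, idle, iowait = (int(x) for x in fields[:5])
        return user + nice + system, idle, iowait

    busy1, idle1, wait1 = cpu_times()
    time.sleep(5)
    busy2, idle2, wait2 = cpu_times()

    busy, idle, wait = busy2 - busy1, idle2 - idle1, wait2 - wait1
    total = busy + idle + wait
    print(f"busy {busy/total:.1%}  iowait {wait/total:.1%}  idle {idle/total:.1%}")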

> 2. How many LUNs? (one is almost never the right answer)

   I think it's 70 disks, split into 3 aggregates of 10, 30, and 30. Dunno how many LUNs the SAN has overall, but all the DB servers have three: the boot LUN is on the 10-disk aggregate, /u01 is on the second aggregate (30 disks), and /u02 is on the third (30 disks).

> 3. What RAID level?

   RAID-DP, which is NetApp's version of RAID 6 (double parity).

> 4. How is the cache configured? What percentage read? What percentage write?

   8G of cache on each head unit (we have 2 head units), but I don't know how it's configured. Frankly, I didn't even realize you could configure different read/write percentages. I suspect the SAN guys know, though; is it worth asking? Is there a recommendation for how the cache should be allocated? Our workload is very read heavy, so I'd naively assume a bigger read cache would be preferable.

> 5. How many physical disks are you striped over for your hottest data files?

   30 Fibre Channel 15k RPM drives in each of the two main data aggregates.

   All the numbers point to the SAN being much faster, but I think what we basically proved is that we don't have an IO-bound workload. To give a little history, we used to have serious problems with IO throughput, so maybe 2 years ago we went through a project to move everything to 64-bit Oracle (we used to run 32-bit) with uniform 24G SGAs across all our databases. With that much memory and a read-heavy workload, our IO problems were largely solved even before we brought the SAN into the picture.
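
   (A quick way to sanity-check that claim on any system is to compare logical vs. physical reads in v$sysstat. Here's a rough sketch using cx_Oracle; the connection details are placeholders, and the usual caveat applies that a cache hit ratio is only a crude indicator.)

    # Sketch: how much of the read workload is served from the buffer
    # cache? Requires SELECT on v$sysstat; connect string is a placeholder.
    import cx_Oracle

    conn = cx_Oracle.connect("perfuser", "password", "dbhost/ORCL")
    cur = conn.cursor()
    cur.execute("""SELECT name, value FROM v$sysstat
                   WHERE name IN ('session logical reads', 'physical reads')""")
    stats = dict(cur.fetchall())
    hit_ratio = 1 - stats['physical reads'] / stats['session logical reads']
    print(f"buffer cache hit ratio: {hit_ratio:.2%}")
    conn.close()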

   My suspicion is that if we took the old (IO-bound) configuration and moved it on top of the SAN, we'd see a tremendous boost in throughput, but the current config is largely CPU bound, so the SAN doesn't make much difference.

   FWIW, the main reason we went SAN wasn't throughput but crash recovery. If a DB server explodes, I can mount its 3 LUNs on a hot spare and have it back up in a matter of minutes. Since nobody wanted to pay for Data Guard, it's the best hot failover strategy we could come up with.
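
   (For the curious, the recovery side is only a few steps. A rough sketch of what the spare runs, with placeholder device names, mount points, SID, and ORACLE_HOME; the LUN remapping itself happens on the filer, so it's only noted in a comment.)

    # Sketch: bring a failed DB server's instance up on a hot spare.
    import os
    import subprocess

    # Paths and SID below are placeholders for our real values.
    env = dict(os.environ, ORACLE_SID="PROD",
               ORACLE_HOME="/u01/app/oracle/product/10.2")

    # Step 1 (on the filer): remap the dead host's three LUNs to the spare.
    # Step 2: mount the data LUNs where the instance expects them.
    subprocess.run(["mount", "/dev/mapper/u01lun", "/u01"], check=True)
    subprocess.run(["mount", "/dev/mapper/u02lun", "/u02"], check=True)

    # Step 3: start the instance; Oracle does crash recovery from the
    # online redo logs automatically as part of startup.
    sqlplus = os.path.join(env["ORACLE_HOME"], "bin", "sqlplus")
    subprocess.run([sqlplus, "/ as sysdba"], input="startup\nexit\n",
                   text=True, env=env, check=True)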
