Re: ASM vs. dNFS

From: Ruel, Chris <Chris.Ruel_at_lfg.com>
Date: Wed, 24 Feb 2016 20:08:47 +0000
Message-ID: <1AFD62082EEAF0448EF1815139687F134FDBBAE6_at_NC2PWEX504.us.ad.lfg.com>



I want to thank everyone who took the time to respond. It was very insightful, and I learned a few things. While researching this myself, I found that the dNFS vs. ASM debate is not a religious war, unlike so many other topics. Not many people seem to have a strong opinion one way or the other yet. Don't get me wrong, there are proponents on both sides, but no one seems ready to sacrifice their firstborn for their beliefs...strange behavior for an internet debate...I applaud it. It does make it easier for me to see the facts without too much emotion behind them.

One thing I am beginning to wonder is whether it is time yet for my company to consider dNFS. For one, I have yet to find any really compelling reason to switch...key word is YET...I am still researching and have just begun testing myself. One big reason to keep our current ASM configuration is that it is what my company is very familiar with, and it is used in every database we have. Not sure I need to "fix what isn't broken". Also, moving to dNFS will most likely mean 2-3 years of running both ASM and dNFS as we migrate our systems from one to the other...which would require us to support two configurations.

Let me give some feedback on some of the replies I got...

Mladen: That depends on how you do ASM. If the drives are iSCSI on a machine without the proper HBA, then dNFS is the clear choice, since it's much easier to administer and will even perform better than iSCSI. For FC connections and iSCSI with the proper HBA, ASM will perform better. Since RAC is ALWAYS about performance, you should choose what performs better. Generating an artificial load similar to your workload with Swingbench or Hammerora should provide a good benchmark.

I will have to go back and review our HBA setup to make sure I understand it and that it is "right". So far, in my testing with Swingbench, I have found throughput to be pretty even between ASM and dNFS. However, I think I need to drive more activity to push the underlying storage and see which one falls off first.

Seth:
You do not need ASM for OCR and voting disks. These can be on a supported cluster file system or over standard NFS.

I did not know this. I thought non-ASM OCR/voting was only supported for upgrades of the clusterware from <11.2 to 11.2. I have never tried launching the GI installer without candidate ASM disks ready; with them in place, the installer goes straight into the CRS disk group setup screen. Even the documentation says block and raw devices are no longer supported, but I guess NFS is not considered a block storage device?

http://docs.oracle.com/cd/E11882_01/install.112/e41961/storage.htm#CWLIN312

If I am interpreting this incorrectly, I am open to a learning moment here!

It is unclear to me why storage snapshots for ASM disk groups required you to use RDMs. Could you not snap multiple VMDKs at the same time? We tried this but had trouble getting it to work. It could be a problem with our setup and a lack of knowledge, but we have some pretty good NetApp and VMware folks. All that being said, dNFS has lots of benefits over ASM as well and, as I assume you were alluding to, is not mutually exclusive with ASM. NFS in general is obviously much more flexible than ASM, including the ability to use CloneDB.

Yes! And this is what attracts us to dNFS, but we want to make sure we understand what, if anything, we are giving up by ditching ASM...especially in terms of performance.

Amir: If you have an I/O-intensive system, you may want to stick with FC. dNFS has been working fine for us on systems that do not push a lot of throughput, like SOA databases. However, for heavy-duty ERP systems, even though we have implemented 10GbE end to end (from hosts to switches to NAS/heads), we are barely meeting our performance requirements. All of our vendors, including Oracle, our storage vendor, and our network vendor, looked at their infrastructure for literally months, but no one was able to pinpoint where the bottleneck was coming from. We ended up moving two of our Oracle ERP systems back to FC and will move the remaining ERP systems in the near future.

This is the sort of thing we are afraid of encountering.

Kyle: NFS is the future. It has more bandwidth than FC, its market is growing faster than FC's, and it is cheaper, easier, more flexible, cloud ready, and improving faster than FC. In my benchmarking, FC and NFS throughput and latency are on par, given similar-speed NICs and HBAs and a properly set up network fabric. Simple issues like having routers on the NFS path can kill performance.

Latency

NFS has a longer code path than FC, and with it comes some extra latency, but usually not that much. In my tests one could push 8K over 10GbE in about 200us with NFS, whereas over FC you can get it to around 50us. Now, that's 4x slower on NFS, but that's without any disk I/O. If disk I/O takes 6 ms, then the extra 0.15 ms of transfer time is lost in the wash. On top of that, FC is often not well tuned, so what could be done in 50us ends up taking 100-150us, which is almost the same as NFS. I've heard that efforts are being made to shorten the NFS code path, but I don't have details.
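Kyle's arithmetic can be checked with a few lines of Python. This is a minimal sketch that uses only the figures quoted above (8K in about 200us over NFS, about 50us over well-tuned FC, 100-150us over poorly tuned FC, and 6 ms of disk I/O). The function name `transfer_overhead` is hypothetical, and the model deliberately ignores queuing and everything else that happens on a real system.

```python
def transfer_overhead(nfs_us: float, fc_us: float, disk_ms: float):
    """Compare the extra NFS transfer time against the disk service time.

    Returns (ratio, extra_ms, pct_of_disk):
      ratio:       how many times slower the NFS transfer is than FC
      extra_ms:    additional milliseconds NFS adds per I/O
      pct_of_disk: that extra time as a percentage of the disk I/O time
    """
    ratio = nfs_us / fc_us
    extra_ms = (nfs_us - fc_us) / 1000.0  # microseconds -> milliseconds
    pct_of_disk = 100.0 * extra_ms / disk_ms
    return ratio, extra_ms, pct_of_disk


if __name__ == "__main__":
    # Figures from the post: NFS ~200us; FC 50us when tuned, 100-150us when not; disk 6 ms.
    for fc_us in (50.0, 100.0, 150.0):
        ratio, extra_ms, pct = transfer_overhead(200.0, fc_us, 6.0)
        print(f"FC {fc_us:.0f}us: NFS is {ratio:.1f}x slower, "
              f"+{extra_ms:.2f} ms per I/O ({pct:.1f}% of disk time)")
```

With well-tuned FC, the 0.15 ms difference is about 2.5% of a 6 ms disk read; against a poorly tuned FC path it shrinks to a fraction of a percent.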

Throughput

NFS is awesome for throughput. It's easy to configure, and on platforms like VMware it is easy to bond multiple NICs. You can even change the configuration dynamically while the VMs are running. NFS already has 100GbE NICs available and is shooting for 200GbE next year. FC, on the other hand, has only just gotten 32G, which doesn't look like it will start being deployed until next year, and even then it will be expensive.

Analyzing Performance on NFS

If you are having performance issues on NFS and can't figure out why, one cool thing to do is take a tcpdump on the receiver side as well as the sender side and compare the timings. The problem is either the sender, the network, or the receiver. Once you know which, the analysis can be dialed in.
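A minimal Python sketch of the comparison Kyle describes. It assumes you have already extracted, from each capture, the timestamp of every NFS call and of its matching reply, keyed by the RPC transaction ID (XID); that extraction step, the `Timing` type, and the `split_latency` and `worst_component` helpers are all hypothetical. Because each host only subtracts timestamps from its own capture, the two hosts' clocks do not need to be synchronized.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Timing:
    call: float   # capture timestamp (seconds) of the NFS call packet
    reply: float  # capture timestamp (seconds) of the matching reply packet


def split_latency(client: Dict[int, Timing],
                  server: Dict[int, Timing]) -> Dict[int, Tuple[float, float, float]]:
    """Split each request's latency into (total, server, network) milliseconds.

    client: timings from the tcpdump taken on the database (NFS client) host
    server: timings from the tcpdump taken on the NFS server side
    Only XIDs present in both captures are reported.
    """
    result = {}
    for xid, c in client.items():
        s = server.get(xid)
        if s is None:
            continue  # not seen on the server: lost, or outside the capture window
        total_ms = (c.reply - c.call) * 1000.0    # round trip as seen by the client
        server_ms = (s.reply - s.call) * 1000.0   # time spent inside the server
        network_ms = total_ms - server_ms         # remainder: wire, switches, routers
        result[xid] = (total_ms, server_ms, network_ms)
    return result


def worst_component(split: Dict[int, Tuple[float, float, float]]) -> str:
    """Name the component (server or network) with the larger average contribution."""
    if not split:
        raise ValueError("no requests matched between the two captures")
    n = len(split)
    avg_server = sum(v[1] for v in split.values()) / n
    avg_network = sum(v[2] for v in split.values()) / n
    return "server" if avg_server >= avg_network else "network"


if __name__ == "__main__":
    client = {1: Timing(0.0, 0.0062)}           # client clock
    server = {1: Timing(100.00005, 100.0061)}   # server clock, deliberately offset
    print(split_latency(client, server))
    print(worst_component(split_latency(client, server)))
```

The sender-side share can then be estimated by comparing the client-side totals with the wait times Oracle itself reports: if Oracle sees much longer waits than the client capture does, the time is being lost on the database host before the packets reach the wire.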

Thanks for the links and the thoughtful insight...again, more of what I am looking for. It does indeed sound like NFS will be the better choice in the future...but is that reason enough for us to consider switching right now? For one, we just got 10GbE...100GbE is not even a twinkle in our infrastructure's eye as far as I know. As long as it performs on par with FC and the flexibility and features available with dNFS pan out, that could be reason enough to switch...tough decisions ahead.

Stefan: My clients are using both ASM with FC and dNFS or kNFS for older Oracle releases.

I recently did an I/O benchmark with SLOB in a client environment (vSphere 6, OEL 6.7 as guest, Oracle 12c, NetApp NFS, 10GE, no jumbo frames, rsize/wsize 64k), and we got close to the maximum of 1GB/s with an average single-block I/O latency of 4 ms (I/Os served from disk took roughly 8-10 ms; the rest came from the storage cache).
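For context on the "maximum of 1GB/s" figure: a 10 Gbit/s link carries at most 1.25 GB/s of raw bits, so roughly 1 GB/s is about 80% of the raw line rate before Ethernet, IP, TCP, and RPC overhead are taken into account. The Python sketch below does that conversion, optionally for several links used together, as in the NIC bonding and per-interface dNFS setups discussed in this thread. The helper names are hypothetical, and the sketch assumes traffic is spread evenly across links while ignoring all protocol overhead, so real ceilings are lower.

```python
def raw_line_rate_gbytes(gbit_per_link: float, links: int = 1) -> float:
    """Raw line rate in GB/s: bits divided by 8, summed over all links.

    Assumes traffic is spread evenly across the links and ignores
    encoding, Ethernet/IP/TCP and RPC overhead, so real ceilings are lower.
    """
    return gbit_per_link * links / 8.0


def utilization(observed_gbytes: float, gbit_per_link: float, links: int = 1) -> float:
    """Observed throughput as a fraction of the raw line rate."""
    return observed_gbytes / raw_line_rate_gbytes(gbit_per_link, links)


if __name__ == "__main__":
    print(raw_line_rate_gbytes(10))            # a single 10GbE link
    print(raw_line_rate_gbytes(10, links=2))   # two 10GbE paths used together
    print(raw_line_rate_gbytes(100))           # a single 100GbE link
    print(f"{utilization(1.0, 10):.0%}")       # ~1 GB/s observed on one 10GbE link
```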

I'll just comment on some of your points.

2a) You can do this with RMAN on either ASM or dNFS. I highly recommend that you do not rely on a storage snapshot / backup mechanism alone, as you will not notice any physical or logical block corruption until it may be too late. Trust me, I have seen more than enough such cases.

4b) When you are using dNFS for Oracle in a VMware environment, you have no VMDKs for the Oracle files (data, temp, control, redo, arch) at all. You map the NFS share directly into the VM and access it via dNFS inside the VM. You only have VMDKs for the OS and, for example, the Oracle software. In addition, to scale with dNFS you should not do NIC teaming at the VMware level; rather, put each interface into the VM and let dNFS do all the load balancing, etc. (e.g. ARP).

In sum, nowadays there is no reason to demonize NFS for Oracle (with dNFS). It works very well, with good (FC-like) performance.

... and I am a kid from the FC decade saying this ;-)

Thanks for your experience and comments. We are not using snaps as our total backup solution...we still use RMAN as a first priority. Snaps are there if we can use them, and for cloning. However, I am glad you reminded me of that, as I have been considering coming up with a snap-only strategy for our larger databases (as long as we can mirror the snaps to a geographically separate site). I see I will have to remember to run RMAN commands (or DBV) to make sure corruption is not an issue.

Chris



Chris Ruel * Oracle Database Administrator * Lincoln Financial Group cruel_at_lfg.com * Desk: 317.759.2172 * Cell 317.523.8482


--
http://www.freelists.org/webpage/oracle-l
Received on Wed Feb 24 2016 - 21:08:47 CET
