
10g RAC: max performance & min cost with miSCSI?

From: Heikki Siltala <abcwebmasterxyz_at_abcheikkisiltalaxyz.abccomxyz>
Date: Wed, 16 Nov 2005 11:07:02 +0200
Message-ID: <dlesrb$3ro$1@phys-news1.kolumbus.fi>

Hello all,

We are currently choosing which path to take in renewing our Oracle database environment. What we want is 1) reliability, 2) minimum downtime, 3) maximum performance and 4) minimum cost. Sounds easy? :-) Because of the reliability and minimum-downtime requirements, we are planning to build a 2-node Linux Oracle 10g RAC. To get maximum performance we have to focus on disk system performance. To minimize cost we are planning to run the nodes with 1 CPU each, since Oracle licenses are per CPU. 2 CPUs per node would require us to purchase Enterprise Edition licenses, RAC licenses, OLAP and partitioning licenses etc. 1 CPU per node requires only a RAC license for 2 CPUs, since we already have everything else for 2 CPUs. A detailed calculation showed that, in our case, the Oracle licensing cost of 1 CPU per node is only 25 percent of that of the 2-CPUs-per-node alternative.

I have started the design from the IO performance. I have never seen an Oracle database environment where the CPU is the bottleneck. The bottleneck almost always seems to be in the disk system, so starting the design from the disk system seems the right way to go. I have set the goal of building a system that offers enough IO bandwidth to saturate the CPUs (1 CPU per node, 2 nodes). Some materials suggest that a modern CPU (3 GHz Xeon) can drive 200 MB/s. Since we have 2 CPUs in the RAC, the disk system should be able to deliver 400 MB/s; let's say 500 MB/s to be safe. Assuming that the physical disks are typically the weakest point in IO performance, I started to build it up from there. Using SAME (stripe and mirror everything), the IO load on the disks will be random, not sequential. I assume a 15k disk can deliver 25 MB/s of random IO, so a 500 MB/s system requires at least 20 disks.
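The sizing rule above can be written out as a short calculation. This is only a sketch of the arithmetic in this post: the function name `disks_needed` and its parameters are made up for illustration, and the 200 MB/s per CPU and 25 MB/s per disk figures are the assumptions stated above, not measured values.

```python
import math

def disks_needed(cpus, mb_per_cpu, target_mb, mb_per_disk):
    """Return (raw CPU-driven demand in MB/s, disks needed for the target)."""
    raw_demand = cpus * mb_per_cpu
    # Round up: a fraction of a disk still means buying a whole disk.
    disks = math.ceil(target_mb / mb_per_disk)
    return raw_demand, disks

# Figures from the post: 2 CPUs, 200 MB/s each, padded target of 500 MB/s,
# 25 MB/s of random IO per 15k disk (all assumptions, not measurements).
raw_demand, disks = disks_needed(cpus=2, mb_per_cpu=200,
                                 target_mb=500, mb_per_disk=25)
print(raw_demand, disks)
```

Changing `mb_per_disk` shows how sensitive the disk count is to the random IO assumption: at 15 MB/s per disk the same target would need 34 disks.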

I have browsed through the storage solutions of different vendors (Sun, Dell/EMC, Adaptec, HP, IBM). To limit the possibilities I focused first on HP's offerings. What we need is a disk system that can hold 20 disks (and maybe some additional 300 GB 10k disks for disk-to-disk-to-tape backups etc.) and deliver 500 MB/s. HP's MSA500 does not have enough disk slots, and the MSA1000 can deliver only 200 MB/s. If we shift to more advanced storage systems, the price tag rises significantly. But wait, there is still an alternative for a 2-node RAC. The HP MSA30 MI is a multi-initiator U320 SCSI array with dual SCSI buses and room for 14 drives (7 drive slots per bus). The list price seems to be about 4000 euros. The issue with the MSA30 MI is that it doesn't support Xeon (HP ProLiant) servers, only Itanium (HP Integrity) servers. I can't understand why on earth they have come up with this limitation! With the MSA30 MI and two 1-CPU Itanium nodes, the disk configuration would be two MSA30 MI arrays with two buses each, so both nodes would need U320 SCSI HBAs for four SCSI channels. The disks could be placed so that each MSA30 bus holds five 15k disks.

Now if we calculate the performance from the ground up, we have 5 disks per array bus, each disk having a 25 MB/s random-access transfer rate, which makes 125 MB/s per bus. We have four buses across the disk arrays, so the arrays can deliver 500 MB/s if the IO is evenly distributed. Each array bus is accessed by both nodes over a multi-initiator U320 SCSI channel, and since the channel can theoretically run at 320 MB/s, 125 MB/s can easily be transported over it. A node has 4 U320 channels to the arrays, and each channel has 125 MB/s of disk behind it. So one node could theoretically get 500 MB/s of IO out of the disks (assuming that the HBAs are 64-bit PCI-X) and can easily saturate the CPU. And if both nodes are accessing the disks at the same time, the IO speed drops to 250 MB/s per node, which is still enough to saturate the CPU. So if this calculation is correct, we need to buy two MSA30 MI units (8000 euros in total), four 2-channel PCI-X U320 SCSI HBAs and twenty 15k disks to get an IO performance of 500 MB/s on a two-node RAC system.
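The same ground-up calculation as a small sketch, with a check that the per-bus load stays under the theoretical U320 channel rate. The function name `array_throughput` is made up for illustration, and all figures are the assumptions from this post (25 MB/s per disk, 320 MB/s per channel, 4000 euros list price per array).

```python
U320_MB_S = 320  # theoretical U320 SCSI channel rate

def array_throughput(disks_per_bus, mb_per_disk, buses, nodes):
    """Return (MB/s per bus, total MB/s, MB/s per node when all nodes are busy)."""
    # A bus cannot carry more than the channel rate, whatever the disks deliver.
    per_bus = min(disks_per_bus * mb_per_disk, U320_MB_S)
    total = per_bus * buses
    per_node = total / nodes  # assumes the load is split evenly between nodes
    return per_bus, total, per_node

per_bus, total, per_node = array_throughput(disks_per_bus=5, mb_per_disk=25,
                                            buses=4, nodes=2)
array_cost_eur = 2 * 4000  # two MSA30 MI units at the approximate list price
print(per_bus, total, per_node, array_cost_eur)
```

With these inputs each bus carries 125 MB/s, well below the 320 MB/s channel limit, so in this model the disks and not the SCSI channels are the limiting factor.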

The question you might now ask is how to distribute the load evenly across the disks and how to handle striping and mirroring, since the MSA30 MI is a JBOD array. Here Oracle 10g ASM comes to the rescue. If we configure all the disks as one ASM diskgroup using normal redundancy (2 copies kept), ASM will do it all: fully automatic IO load balancing, striping and mirroring. No need for LVMs, LUNs, disk array RAIDs etc.
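One thing worth checking with normal redundancy is what the second copy costs. The sketch below is a rough model only, not a description of how ASM actually schedules IO: it assumes every logical write turns into two physical writes, that reads can be served from a single copy, and that caching and rebalancing are ignored. The function name `normal_redundancy_estimate` is made up for illustration.

```python
def normal_redundancy_estimate(disks, mb_per_disk, copies=2):
    """Rough (read MB/s, write MB/s, usable capacity fraction) for a mirrored diskgroup."""
    aggregate = disks * mb_per_disk
    # Assumption: a read needs only one copy, so all disks contribute.
    read_mb_s = aggregate
    # Assumption: each logical write is written `copies` times.
    write_mb_s = aggregate / copies
    # Keeping `copies` copies leaves 1/copies of the raw capacity usable.
    usable_fraction = 1 / copies
    return read_mb_s, write_mb_s, usable_fraction

read_mb_s, write_mb_s, usable_fraction = normal_redundancy_estimate(20, 25)
print(read_mb_s, write_mb_s, usable_fraction)
```

Under these assumptions the 20-disk group still reads at 500 MB/s, but sustained writes would top out around 250 MB/s and half of the raw capacity goes to the mirror copies.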

The point of posting all this to the newsgroup is that I would like to know what you think about this idea of using multi-initiator SCSI for RAC shared disks. This is of course a 2-node system and cannot be scaled to n nodes, since shared SCSI with the MSA30 MI is only for two nodes (and I think that RAC on m-i SCSI has the same limitation). The other issues/questions that still remain open are:

  1. Can we fully rely on 10g ASM? In this solution the data redundancy is managed only by ASM, not by the storage system. What are the risks if we build a huge 20-disk diskgroup (2 failure groups, 10 disks each) to store ALL the database data (tablespaces, redo logs, archive logs, control files etc.)?
  2. What would be the other options for getting 500 MB/s of disk system performance at a reasonable price? If we use two MSA1000 units, each delivering 200 MB/s, we get 400 MB/s, which is quite close. How about the offerings from other vendors?
  3. Can you think of any clues as to why on earth HP has decided not to support the MSA30 MI on Xeon (ProLiant) servers? Why is it only for 64-bit Linux and HP-UX on Itanium (Integrity)? Are there similar m-i SCSI storage systems from other vendors that are supported on Xeon servers? It seems more than a little pointless to build a cheap disk system and then be forced to move from Xeon to Itanium.
--
Heikki
Received on Wed Nov 16 2005 - 03:07:02 CST
