From: Barr, Stephen <>
Date: Wed, 15 Dec 2004 20:34:40 -0000
Hi Amir,

        We also have a DMX 3000 box and have it striped 8 ways.

        We have 83 meta devices; each meta device is ~67Gb in size and is made of eight 8.43Gb volumes. Each volume is RAID 1; however, each meta volume is striped across its eight individual volumes with a stripe size of 0.94Mb.

        The issue we have at present is that we are a data warehouse doing lots of 1Mb direct path reads. Each read will hit 8 physical devices (with 1Mb stripe unit size at OS). I'm assuming this is a bad thing - surely each of our reads should be hitting only a single device? i.e. we're waiting on 8 devices instead of only one.
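How many stripe members a single read touches depends on the stripe unit and on where the read starts. The sketch below is a rough way to reason about that for one level of striping. The function name, the parameters and the 960 KB figure (one possible reading of "0.94Mb") are assumptions for illustration, not values taken from the array configuration, and it ignores the second layer of striping at the host.

```python
def members_touched(offset_kb, size_kb, stripe_unit_kb, width):
    """Count the distinct stripe members one contiguous I/O touches.

    Assumes a simple round-robin stripe: unit 0 on member 0, unit 1 on
    member 1, and so on, wrapping after `width` members.
    """
    first_unit = offset_kb // stripe_unit_kb
    last_unit = (offset_kb + size_kb - 1) // stripe_unit_kb
    units = last_unit - first_unit + 1
    # An I/O spanning more units than there are members wraps around,
    # so it can never touch more than `width` distinct members.
    return min(units, width)


# A 1 MB (1024 KB) read against an 8-wide stripe, for several
# hypothetical stripe unit sizes.
for unit_kb in (128, 512, 960, 1024):
    aligned = members_touched(0, 1024, unit_kb, 8)
    shifted = members_touched(unit_kb // 2, 1024, unit_kb, 8)
    print(f"{unit_kb:>5} KB unit: aligned={aligned}, half-unit offset={shifted}")
```

The same function can be applied once per layer (array meta, then host stripe) to see how the two stripe units interact.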

        I've performed a number of tests with PQ on the current setup, and it looks like the IO subsystem is saturated by a single PQ query (degree 4), to such an extent that two PQ queries running together BOTH take twice as long to complete. Surely this isn't the pattern we should be seeing? It essentially means that the system is 100% non-scalable.

1 query PARALLEL 4 (FTS)

1Mb Stripe unit     5 mins 18 secs
512k Stripe unit    5 mins 18 secs
128k Stripe unit    5 mins 52 secs
CONCAT              5 mins 10 secs

2 queries hitting same table PARALLEL 4 (FTS)

1Mb Stripe unit     8 mins 43 secs (each)
512k Stripe unit    10 mins 12 secs (each)
128k Stripe unit    8 mins 35 secs (each)
CONCAT              8 mins 10 secs (each)
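To make the scaling easier to compare across layouts, the timings above can be reduced to two numbers per layout: the per-query slowdown when two queries run together, and the combined throughput relative to a single query. This is only a sketch of the arithmetic; the dictionary and function names are made up for illustration. A slowdown of 2.0 (relative throughput of 1.0) would mean no scaling at all.

```python
def to_seconds(minutes, seconds):
    """Convert an elapsed time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


# Elapsed times copied from the two test runs above.
single = {
    "1Mb": to_seconds(5, 18),
    "512k": to_seconds(5, 18),
    "128k": to_seconds(5, 52),
    "CONCAT": to_seconds(5, 10),
}
concurrent = {
    "1Mb": to_seconds(8, 43),
    "512k": to_seconds(10, 12),
    "128k": to_seconds(8, 35),
    "CONCAT": to_seconds(8, 10),
}

# Per-query slowdown: how much longer each query takes when run in a pair.
slowdown = {k: concurrent[k] / single[k] for k in single}

# Relative throughput: work done per unit time by the pair, compared
# with one query running alone (2 queries finish in `concurrent` seconds).
throughput = {k: 2 * single[k] / concurrent[k] for k in single}

for layout in single:
    print(f"{layout:>7}: slowdown x{slowdown[layout]:.2f}, "
          f"relative throughput x{throughput[layout]:.2f}")
```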

Does anyone have any experience of setting up this type of storage solution for a data warehouse?



From: [] On Behalf Of Hameed, Amir
Sent: 15 December 2004 19:31
Subject: RE: Storage array advice anyone?

While the discussion is going on about these heavy-duty SAN boxes, I would also like to bounce a question on the disk layout in a SAN. We have recently acquired an EMC DMX 3000 box. Our current production is running on an EMC 8830, four-way striped, which is going out of lease in a few months. So, we will be migrating our mission-critical production system to the newly arrived DMX 3000 box soon. I have gone through a white paper from James Morle, "Sane SAN", which basically suggests that for optimal SAN disk layout, you should assume that there is no cache available, stripe the disks optimally, and consider cache as an added benefit.

In our existing configuration on the 8830 frame, each Meta Volume is created from four hypers and is 20GB in size. The Metas are then presented as volumes to the server and each mount point is based on a 20GB volume. We are not double-striping the volume at the host level. The drives in the 8830 frame are 73 GB in size and do an average of ~ 120 reads/second and ~ 110 writes/second. So, the I/O bandwidth of a Meta would be ~ 480 r/s (4x120) and ~ 440 w/s (4x110).

Having said that, I have done some basic calculations on the IOs that Oracle is issuing (on the 8830 frame) from the v$filestat and v$tempstat views, aggregated on a per-mount-point basis. What I have seen is that on some mount points Oracle is doing up to 800 reads per second. Given that on a highly available system it is not always possible to move hot files around without incurring downtime, I am exploring the idea of striping the new DMX frame 8 ways. This DMX frame has 146 GB drives and, based upon the drive specifications, they can do ~ 130 r/s and ~ 120 w/s. So, an 8-way striped Meta volume would be able to do 1040 r/s and 960 w/s. I was at the HotSos symposium this summer and I asked Steve Adams this question, and he also suggested going with 8-way striping. Is there anyone on this DL who is using a DMX frame striped 8 ways?
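The arithmetic behind these figures can be laid out as a small sketch. It uses only the per-drive rates quoted above and assumes that every drive in a Meta contributes its full rated rate and that the load is spread evenly, with cache and mirroring ignored. The function and variable names are illustrative only.

```python
def meta_iops(per_drive_iops, stripe_width):
    """Aggregate I/O rate of a striped Meta volume.

    Assumes each member drive delivers its full rated rate and the
    load is spread evenly across the stripe (no cache, no hot spots).
    """
    return per_drive_iops * stripe_width


OBSERVED_PEAK_READS = 800  # r/s seen on the busiest mount points

# Read capacity per Meta for the layouts under discussion.
layouts = {
    "8830, 4-way": meta_iops(120, 4),
    "DMX 3000, 4-way": meta_iops(130, 4),
    "DMX 3000, 8-way": meta_iops(130, 8),
}

for name, reads in layouts.items():
    headroom = reads - OBSERVED_PEAK_READS
    print(f"{name}: {reads} r/s, headroom {headroom:+d} r/s")
```

Under these assumptions only the 8-way layout stays above the 800 r/s peak without relying on cache.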

Does anyone have any advice on 4-way versus 8-way striping? EDS is our service provider and they are not buying the idea of 8-way striping, as they and EMC think that the cache on the frame can resolve all the issues. That is not true, because the cache has to de-stage at some point. I have seen high IO waits on the 8830 frame from sar and I don't believe that cache is nirvana.

Thank you

From: [] On Behalf Of Stephen Lee
Sent: Wednesday, December 15, 2004 10:33 AM
Subject: RE: Storage array advice anyone?

I appreciate the discussion on the topic. I think additional considerations on this particular array (Hitachi TagmaStore 9990) are that the "normal" configuration (according to Hitachi) is that the disks are in groups of 8; each group is a stripe with parity; the parity cycles around all drives. When a bad block occurs, the block is NOT replaced by a spare block on the drive, but the drive is failed and replaced by a hot spare, and phone home occurs. Which -- I guess -- is a fairly aggressive drive replacement scheme.

There appears to be agreement that the best performance for most cases (note: most cases) is to stripe everything across all drives. There does appear to be some remaining discussion, from a fault tolerance standpoint, about whether to go strictly with stripe + parity and trust that Hitachi really has worked out the fault tolerance issues, or assume that claims from Hitachi are just a bunch of sales hype and insist on stripe + mirror. Healthy skepticism is useful, but one does not want to be basing that skepticism on outdated ideas. That is what a lot of this comes down to: Which ideas and rules are outdated -- given the capabilities of this new gee whiz hardware -- and which still hold.

The astute reader will note that the stripe + parity is, more or less, RAID 5-ish. But yet again, we have a manufacturer who claims that in their case the I/O speed penalty is no longer an issue. In the case of this array, there appears to be some real world experience to support that claim. Any comments from those who know otherwise are most welcome. Again, this is another one of those "Have some of the ideas about this become outdated?" sorts of things.




