From: Marquez, Chris <>
Date: Thu, 8 Dec 2005 16:19:58 -0500
The resolution...and this is no joke...directly from Dell...I was on the conference call:

Dell (and I assume its hardware partners) could not resolve our/their hardware problem,
nor will they further try to resolve the problem and we should begin working with our
Dell sales rep. to negotiate some form of hardware exchange/rebate.

Yes, I was speechless too.
We had been running on hardware (in a config) that had been tormenting us for a year and was never going to work! Oh it gets better or more crazy...depending how you look at it.

Turns out (I asked directly) that this config;

Remember in my original post I indicated that we did not see the same issues on the same hardware when we ran *WITHOUT* RAID implemented on the PERC controller cards nor with the PowerVault in "cluster mode"...although I believe the RAID and PERC cards to be the problem.

So reading between the lines, what I gather from all of this is either: do NOT use RAID (and Oracle) with this config _or_ run only low IO apps on this config.
What I got out of this painful ordeal is that this hardware works just fine as long as you do not "push" it hard!  :o|

This one will stick with me for a while.  

Chris Marquez
Oracle DBA

PS It has taken me so long to post because I have had to rebuild our 2 db servers and 2 backup db servers off of this config. We were 100% RAID and now we have no RAID at all...but very solid backup and failover operations.
Nothing like moving, or rather "juggling", your production servers on a day's notice.

PPS I should mention that this is for another client...not from where I send this email.
I would not want to misrepresent somebody else's server environment.

We run Dell 2650's with Redhat Linux for our OAS systems. These boxes work very well for commodity boxes but the storage systems are fairly basic with internal drives in a mirrored configuration. We've had many problems with two Dell SANs that we use for non-Oracle applications (email; central file storage). Dell has been quick to get in to replace failed hardware, but we've had problems with the automated support stuff not working and Dell has been just plain terrible at helping troubleshoot performance problems with one of our SANs. I'm about to buy new database servers with shared storage and Dell doesn't make the cut. You will pay (a little/lots) more for HP, IBM, or Sun hardware, but Dell's hardware and support don't compete as far as I'm concerned.


On 11/17/05, Thomas Day <> wrote:
> Just about to replace a hard drive on our PowerVault just as soon as
> the Dell tech gets here. Doesn't sound like a very reliable machine.

> I'm not sure if it's the hard drive or the controller.  But the price 
> is right and the gumment loves them.  Our web farm people report that 
> they have to replace a hard drive once a month on the average (30 web
> In my case I have to play with cards I'm dealt but the only other time
> that I've had hard drives fail was 10 years ago when I had to work 
> with some NCR pieces of junk.  There, with just one machine, the 
> controller would report a hard drive bad about once a week.  Nothing 
> really wrong with the drive, just the controller had decided to make
> life interesting.
> The tech would put in the new drive, test the "bad" drive, and return 
> it to storage.  I really got to practice my recovery techniques.
> Looks like the good old days are back again.

A week ago last Monday, a single failed hard drive in a hardware RAID 10 configuration took down a server running Oracle 10g R1 on a Dell PE 2650. Yes, that RAID volume supported the OS mount points, but isn't the point of (hardware) RAID 10 to handle the fault and not propagate the failure to the OS?

A replacement of the failed drive and a system restart kicked in the auto-rebuild of the volume, but still I was unhappy that the unit didn't take the hit and keep ticking. I haven't had a window yet to upgrade the firmware and drivers and see if that alleviates the problem.

Months ago, a refurbished Dell PE 2800 repeatedly threw errors when running a large import job. The internal RAID 10 vols would simply go offline. Replacing the PERC (PowerEdge RAID Controller) resolved the issue.

At a client site, a Dell PE 6350 under load would occasionally lose all connectivity with its pair of direct attached SCSI RAID PV 220S units, across a pair of PERC cards. Fortunately it was only the test system and not production, and a system restart would remount the volumes.

Across several installations, Dell + EMC CLARiiON units have been stable and solid.


#/etc/init.d/init.cssd stop
-- play a Sony CD, install a rootkit today

Hi Janine.

We have a Dell PE 2800 that threw errors during testing ... importing data into a new database. After multiple go-rounds with Tech Support ... I was lucky enough to get ahold of a competent Tech Support engineer in the server support group. He authorized a replacement of the RAID controller and I haven't had a storage issue on that box since then. That server is running w2k3 svr 32 bit, but had run RHEL 3 update 5 in testing.
That box has 10 drives, 2 PERC cards as the removable drive bay was populated.
When it would hit an error, the 8 internal drives not in the drive bay would go bye-bye.

The same box threw memory errors.
We've gone thru 3 separate iterations of attempting to replace the failed module.
As the memory goes in pairs ... they've sent the wrong parts, then sent one module (unpaired).
We're still awaiting replacement parts and have been limping along on only 2 GB in that box.

Their support is spotty - some great techs, some bad - kind of like Oracle or any other company.


On Nov 16, 2005, at 11:52 AM, mkb wrote:

> Hehehehe...I just finished a call with Dell support - memory issue for
> probably the 3rd time this year.

We are a much smaller shop than the rest of you and we run dinky little servers by comparison, but even I have a Dell horror story. We had one server we bought a few years ago, I think it was a 2550 but I'm not sure about that, that had hardware problems from day one. I was trying to load a multi-GB Oracle export and the system kept restarting itself halfway through. Dell very reluctantly sent replacement parts several times but we were never able to get it fully working. It has exhibited a multitude of symptoms over the years.

My sys admin is both busy and lazy and he didn't follow up very well with Dell, plus they moved as slowly as humanly possible, with the result that the machine finally went out of warranty and still was not working right. We have never been able to use it in production.

The sys admin finally figured out what the root cause is some time ago; he read somewhere that this particular hardware has disk controller problems when you have two CPUs in the box and are running Linux. It would probably work fine if we could take one of the CPUs out, but you can no longer buy the appropriate blanks from Dell. They know about the problem, but have never managed to fix it (not that they are admitting to, anyway). I should say for the record that this info is third-hand or worse and should not be relied upon as the gospel truth, even though I have no reason to doubt it based on our experience.

We have continued to buy servers from Dell (holding our noses each time) because they have been the most cost-effective choice, even with the hassle factor. But I have been reading lately that they are telling analysts they are going to bump up their profit margins and do less discounting. I almost hope that happens, just so I have an excuse to buy from someone else!


Hehehehe...I just finished a call with Dell support - memory issue for probably the 3rd time this year.

We're running 6650s on RHAS 3.0 with EMC/Dell/Clariion CX-200 as our db storage. Don't know what version of PERC or RAID s/w but we do seem to have quite a few hardware errors on our servers.

Usually memory and disk issues. Our SAN is pretty stable with no outages there. But yeah, the server HW seems a bit prone to failures.



We're running RHEL 3 ES update 5 on a single Dell PE 2650 with a single PV220S unit (split backplane). I haven't run clustered anything.

Other than one drive failure, we've had no issues.

This unit did run on RHEL 3 ES (was probably update 2 at the time) without issue.

Let me know if there is anything in particular you're looking for.


Dell 2650
PERC 4/DC (Dual Channel) RAID Controller for *external* storage (on 2 servers)
PERC 3/DC (Dual Channel) RAID Controller for *external* storage (on 2 other servers)
PowerVault 220

Oracle EE 9205
Oracle Cluster Manager 9205 (oracm, version[ ])
Oracle OCFS-Oracle Cluster FileSystem 1.0.13-PROD1 (on 2 servers)
EXT3 (on 2 other servers)

Red Hat Enterprise Linux ES release 3 (Taroon)
kernel 2.4.21-15 (recently upgraded at Dell's request)
Linux SCSI MegaRAID Driver, Version - Release Date: 10/25/2004 - Products Supported: MegaRAID Controllers


For the 4th time in 12 months our hardware has let us down and we are running on a backup db server.
Disk errors, controller (module?) failure.

We have a ticket open with Dell. Today Dell Support tells us that this config is *now* not supported for Oracle (RAC?)!!!

My SA tells me that on some Dell forum he sees lots of pleading for help from those running Dell PowerVault and PCI PERC RAID Controller for *external* storage and the MegaRAID Driver. He says most pleas go unanswered.

We run this in a RAID 1 config and previously ran it in a RAID 5 config...Dell is telling us that only RAID 10 works for the hardware (for Oracle)!?

Here is the really sad part.
We bought one of our 2 Dell PowerVault 220's over two years ago with a PERC 3/DC Controller for *external* storage. We ran this for Oracle 817 on SuSE 7.3 (desktop version, pro?...not server) with the out of the box MegaRAID Driver from SuSE. Also, we ran "naked", with no RAID at all. And guess what, not a single Disk, Controller, or Driver failure that I can remember.

Now that *same* hardware in the config described above, 9i-RHEL3-EXT3-MegaRAID Driver, has failed us over and over. Between the two like hardware systems (one RAC, one NON-RAC) we have had probably 6 total failovers and many, many short crash outages.

Seems to me that this software and RAID just don't work together.

Anyone have experience with this hardware?


Chris Marquez
Oracle DBA

