Oracle-L: RE: Q. To RAC or go vertical

From: Matthew Zito <mzito_at_gridapp.com>
Date: Tue, 05 Aug 2003 13:24:23 -0800
Message-ID: <F001.005C90F9.20030805132423@fatcity.com>
*sigh* Alright, I'll bite. See inline.
--
Matthew Zito
GridApp Systems
Email: mzito_at_gridapp.com
Cell: 646-220-3551
Phone: 212-358-8211 x 359
http://www.gridapp.com



> -----Original Message-----
> From: ml-errors_at_fatcity.com [mailto:ml-errors_at_fatcity.com] On
> Behalf Of Odland, Brad
> Sent: Tuesday, August 05, 2003 3:35 PM
> To: Multiple recipients of list ORACLE-L
> Subject: RE: Q. To RAC or go vertical
>
> When you do TCO analysis do you add in the costs of
> administration?


Yes (in fact, we even say that it costs three times as much to administer a
Linux RAC cluster as a Sun cluster).



> The learning curve? 


Yes.



> The maintenance? 


Yes. 



> The value of reliability and familiar support structures? What
> kind of proof do you have about the claim of RAC's
> reliability compared to a single multiple-processor system?


The value of reliability?  I'm not talking about buying from some random
Intel white-box vendor - I'm talking about a name like IBM, HPQ, etc.  I have
seen far higher reliability from those vendors than from Sun in the last four
years.  Case in point - I once had three E6500s supporting over 100 IBM Intel
servers.  In a six-month period I had one hardware failure on the Intel
boxes and three on the Suns.  That's an impressive reliability ratio from a
hardware perspective.  Familiar support structures should be an oxymoron -
your hardware should fail rarely enough that you should have to look up the
1-800 number you need to call.
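
To put a number on that anecdote, here is a minimal Python sketch of the per-server failure rate over the six-month window. It uses only the counts given above; "over 100" Intel servers is rounded down to 100, which is an assumption, and with samples this small the result is an illustration rather than a statistic.

```python
def failures_per_server(failures: int, servers: int) -> float:
    """Return observed hardware failures per server over the observation window."""
    if servers <= 0:
        raise ValueError("servers must be positive")
    return failures / servers


# Counts taken from the anecdote (six-month window).
# Assumption: "over 100" Intel servers is treated as exactly 100.
INTEL_SERVERS, INTEL_FAILURES = 100, 1
SUN_SERVERS, SUN_FAILURES = 3, 3

intel_rate = failures_per_server(INTEL_FAILURES, INTEL_SERVERS)
sun_rate = failures_per_server(SUN_FAILURES, SUN_SERVERS)
ratio = sun_rate / intel_rate

print(f"Intel: {intel_rate:.2f} failures per server")
print(f"Sun:   {sun_rate:.2f} failures per server")
print(f"Sun/Intel ratio: roughly {ratio:.0f}x")
```

Because the Intel server count is a lower bound, the real ratio would be at least this large.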

As far as proof of reliability, that's hard to quantify.  However, from a
logical perspective, on an SMP system when a processor fails, the entire
system goes down.  When a node fails in a cluster, the others take over for
it.  Yes - software bugs can rear their ugly heads and prevent that from
happening, but that's a constant on either architecture.
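
The logical argument can be sketched as a toy probability model in Python. Everything here is an assumption made for illustration: failures are treated as independent, each processor or node fails with the same made-up probability `p` in some period, a single processor failure downs the SMP box (as stated above), and the cluster counts as "down" only when every node has failed. It ignores software bugs, shared storage, and interconnect failures.

```python
def smp_outage_probability(p: float, processors: int) -> float:
    """Chance the SMP box goes down: any one processor failing takes it out."""
    _check(p, processors)
    return 1.0 - (1.0 - p) ** processors


def cluster_outage_probability(p: float, nodes: int) -> float:
    """Chance the whole cluster is down: every node has to fail."""
    _check(p, nodes)
    return p ** nodes


def _check(p: float, count: int) -> None:
    """Reject inputs that are not a probability and a positive unit count."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if count <= 0:
        raise ValueError("count must be positive")


# Hypothetical inputs, not measured values.
p = 0.01
units = 4
print(f"SMP outage:     {smp_outage_probability(p, units):.6f}")
print(f"Cluster outage: {cluster_outage_probability(p, units):.2e}")
```

Note that under the same assumptions the chance of the cluster losing at least one node equals the SMP outage figure, so the cluster runs degraded about as often as the SMP box would be down outright.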




> What about when a node does fail and suddenly the users and
> batch processing are left with 1/2 or 1/4 of the processing
> power gone? How long is it going to take to get the system
> back to 100%? Lots of admins can be confident in getting a
> huge HP or Sun box up in less than 12 hours. Is 6 hours of
> downtime worse than three days of processing at 50% capacity?


Is it better to have a performance-impacted system or a down system?  Is it
better to buy twice the capacity to compensate for the fact your hideously
expensive UNIX server tends to fall over when there's a two-bit memory error
or cache corruption?  I've never seen an Intel box broken so badly it takes
three days to fix.  On the other hand, I had an E4500 that took Sun 7 months
of replacing every part in the system to figure out what was wrong with it.
We had to decommission it as a production server because it was crashing
every few days.

Hey, if you're concerned about node downtime and want to be crafty - buy an
extra node for your RAC cluster.  Splurge and spend the extra $10k for a
node that sits there idle until it's needed.  It's _still_ better than buying
twice your needed capacity.  In fact - I haven't run the numbers, but I bet
you could buy double the nodes you actually need and leave them idle and
still be vastly cheaper than two big UNIX servers.
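
A rough Python sketch of the spare-node idea, under loud assumptions: all nodes are the same size, load spreads evenly, and the cluster scales linearly. The $10k per node is the figure quoted above; no price for the big UNIX servers is given in this thread, so none is assumed here.

```python
NODE_COST_USD = 10_000  # per-node figure quoted in the message above


def capacity_after_failures(needed_nodes: int, spare_nodes: int,
                            failed_nodes: int) -> float:
    """Fraction of required capacity still available after some nodes fail.

    Assumes equal-sized nodes and linear scaling (an idealization).
    Capped at 1.0 because idle spares add no required capacity.
    """
    total = needed_nodes + spare_nodes
    if needed_nodes <= 0 or spare_nodes < 0:
        raise ValueError("need at least one working node and no negative spares")
    if not 0 <= failed_nodes <= total:
        raise ValueError("failed_nodes must be between 0 and the node count")
    surviving = total - failed_nodes
    return min(1.0, surviving / needed_nodes)


def spare_cost(spare_nodes: int, node_cost: int = NODE_COST_USD) -> int:
    """Extra hardware spend for nodes that sit idle until needed."""
    return spare_nodes * node_cost


# The scenarios from the quoted question, then the same with one idle spare.
for needed, spares in [(2, 0), (4, 0), (4, 1)]:
    pct = capacity_after_failures(needed, spares, failed_nodes=1) * 100
    print(f"{needed} nodes + {spares} spare, one failure: "
          f"{pct:.0f}% capacity, spare cost ${spare_cost(spares):,}")
```

Doubling the node count, as suggested above, would be `spare_nodes == needed_nodes`; whether that is cheaper than two big servers depends on prices not given here.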



> What about the value of KNOWING a solution works, not just
> speculating on how much money it MIGHT save?


Do today's solutions "work"? If you're running an enterprise database today,
you need to buy two servers, pay for clustering software, spend the money
to implement a clustering solution, pay through the nose for platinum
support on these things, and you still need to hire smart people to run
them.  And the end result is a solution where, when a server dies, the other
server that's been sitting there sucking down power and idling now gets to
start up Oracle and begin processing transactions.  Yes, it technically
functions, but it seems counter-intuitive for an organization that is
generally a cost center to spend extra money to compensate for the fact that
when their incredibly expensive server falls over, it takes the entire
system down with it.



> The IT industry has fallen because of lots of "sell them the
> sizzle, get 'em the bacon later" marketing hype like the info
> floating around about RAC and grid. Software and hardware
> vendors have been jumping from one "great idea" to another.
> The result is a lot of products that end up in the bone yard
> and another round of layoffs.


I'm with ya - I'm as amused and skeptical as everyone else at grid computing
and independent clustering initiatives - Sun's N1 being the shining example
of "sizzle sans bacon".  But you make it sound like RAC is this brand-new
creature that was introduced last week by a tiny unknown company.  Totally
ignoring how long OPS was around, RAC was introduced in June of 2001.
That's two years in the wild and it's getting better all the time.



> What is happening is hardware and software vendors are
> feeding the market's desire to have a low cost system with
> unlimited power and scalability. I am sorry to say you STILL
> can't have both. I know what vendors are thinking, they think
> this will be the holy grail of IT that will bring us back to the
> fat days of pre-Y2K.


Hey, I'd love to have a low cost system with unlimited scalability and
power.  If anyone knows what it is, please email me off list.  I've never
said that RAC is appropriate for all environments, and would never even
dream of claiming that it was.  



> "Get the grid going, it's so complex that they will have to use
> our consulting services too...once we're in the door we'll be
> there for years." IT directors made the mistake of trusting
> vendors once. They aren't going to do it again.


Right.  Don't trust your vendors.  I think every IT department should have
an antagonistic relationship with their vendors (I'm serious).  That
includes not putting all your eggs in one basket and always being willing to
investigate new technology that has the possibility of improving your power
stance against your vendors.  Have a healthy skepticism (I see you've got
that down) and take a look.  How can you possibly be looking out for your
organization's best interests if you're not investigating all of your
options?

 


> Frankly I am all for reducing complexity and increasing
> reliability. Right now there is proven technology that may
> cost a bit more but in the long run is going to be the right decision.


"Right now there is proven technology that may cost a bit more but in the
long run is going to be the right decision."

Sounds like you're trusting your vendors a whole bunch. 

How do you know it's the right decision?  Because your current solution
works?  I bet it would run on a mainframe as well - that costs a "bit more"
and is definitely more reliable than a UNIX server and has been around for
years.  I don't mean to sound snarky, but how will anything ever be a
"proven technology" if you don't investigate it?  If breadth of deployment
is a yardstick of a "proven technology", we should all be running Win2k for
everything.



> "The above notes and my company aside, I would be shocked if
> I ever implemented a large single-image Oracle instance ever again."
>
> Yeah right when monkeys fly out my butt.


Now now, play nice.  If I was ordered to build a single-image Oracle
instance, I would - doesn't mean I'd recommend it.  And with the databases I
have come across, I feel comfortable that most of them could be implemented
successfully and reliably with a RAC cluster. No, RAC is not a silver
bullet.  Yes, I wish it was.  Still - for many many environments, I posit
that it is a viable alternative to the traditional paradigm of
active-passive clustered UNIX servers.

Thanks for your thoughts,
Matt

-- 
Author: Matthew Zito
  INET: mzito_at_gridapp.com
Received on Tue Aug 05 2003 - 16:24:23 CDT