
Re: OCFS for Windows

From: koert54 <nospam_at_nospam.com>
Date: Thu, 06 Mar 2003 10:33:57 GMT
Message-ID: <puF9a.455$Ma.32@afrodite.telenet-ops.be>


Well - I'm glad you have it running without too much trouble - may I ask whether you're
a consultant for that site, a project manager, or the day-to-day DBA for
those databases?

Don't get me wrong - I wish I could say the same of our RAC - but I just can't.
We are experiencing problems like:
- split brains - we have a Gigabit private interconnect + a 100Mbit public interconnect. We have tried
crossover cables and switches. When a split brain occurs, the heartbeats through the controlfiles are
supposed to decide which node goes down - no such thing... they time out on it and both instances crash.
Lots of IPC errors for no apparent reason, resulting in the problems described above :-)
- we now have an 800MB shared pool on both instances, and the instances still keep crashing due to memory
shortage in the shared pool. My guess is that because PCM locks are now automated - so you have no control
over the number of lock elements in the lock database - the lock database keeps growing, trying to get as
close to 1:1 LE:block granularity as possible. A flush of the shared pool won't help, so instead we have
to bounce. Oracle's reply: increase the shared pool... yeah right - until I reach 2GB and I have no buffer
cache left and no process space for spawning threads? :-) (see the sketch after this list)
- timing problems: if the system clocks on the two nodes start to drift apart you're in for a big surprise :-)
- RAC on NT/2000 does not load balance its inter-instance messaging over multiple NICs, and fail-over of
NICs doesn't work either... if I pull the private interconnect you would expect it to keep running over
the public interconnect - no such thing. Instead one node goes to 100% CPU and needs a cold boot, while
the other just sits there and finally crashes as well (see the init.ora sketch after this list)
- if we bounce one instance, there's a 20% chance we have to bounce the whole cluster
- if we add new raw devices and symbolic links, sometimes the object service does not replicate them
across the nodes - a reboot is necessary :-)
- we started a year ago with 9.0.1 - we're now at 9.2.0.2 - did lots of firmware upgrades on the
hardware (IBM) and still no solid cluster.

These are just some of the problems we've had. We have spent so many hours on this thing that we could
easily have bought 2 decent Tru64 nodes, and maybe, just maybe, it would have worked. I have talked to
some colleagues who run OPS on 16-node SP2s and RAC on Tru64 - they all had their share of sleepless
nights. So consider yourself very lucky - or if you're not the DBA, ask your DBAs what they do late at night :-)
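
For anyone wanting to poke at the same knobs, this is roughly the init.ora fragment we have been
experimenting with. The values are illustrative only, not recommendations - the IP is a placeholder
for the private Gigabit NIC, the file list and lock counts are made up, and I haven't verified that
CLUSTER_INTERCONNECTS is even honoured on the Windows port, so treat it as a sketch to test, not gospel:

    # init.ora fragment - illustrative values, not recommendations
    shared_pool_size      = 838860800       # the 800MB we currently run
    # pin GCS/GES traffic to the private Gigabit NIC (platform support varies)
    cluster_interconnects = 10.0.0.1
    # pre-allocate a fixed number of lock elements for these datafiles instead
    # of letting the lock database chase 1:1 LE:block granularity
    gc_files_to_locks     = "1-10=2000EACH"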
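
And this is the sort of SQL*Plus query we keep running (as SYSDBA on either instance) to watch the shared
pool get eaten by the GCS/GES structures - the component names come straight from v$sgastat, though they
may differ slightly between releases:

    REM compare shared pool free memory with the GCS/GES memory consumers
    SELECT name, ROUND(bytes/1024/1024) mb
      FROM v$sgastat
     WHERE pool = 'shared pool'
       AND (name = 'free memory' OR name LIKE 'gcs%' OR name LIKE 'ges%')
     ORDER BY bytes DESC;

    REM a flush buys a little time at best - usually we end up bouncing anyway
    ALTER SYSTEM FLUSH SHARED_POOL;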

Again - a bitter and tired RAC DBA :-)

"David Fitzjarrell" <oratune_at_msn.com> wrote in message news:32d39fb1.0303051112.74d1f551_at_posting.google.com...
> "koert54" <nospam_at_spam.com> wrote in message
news:<3e65f784$0$2184$4d4efb8e_at_news.be.uu.net>...
> (snip)
> > My personal opinion is - if you're bold enough to run a parallel server on a
> > windows platform don't be surprised to be shuffling a lot of shit with
> > your back against the wall !
> >
> (snip)
>
> We regularly run OPS on 2000 and have no problems with it. In fact
> these systems are located in Central America, South America and the
> Caribbean, running pre-paid cellular systems, so you can imagine the
> traffic they get. We rarely have downtime (with the exceptions of
> hardware issues and, sometimes, the local 'nut behind the wheel'), so
> I WILL be surprised if I end up 'shuffling a lot of shit with your
> (read 'my') back against the wall !'
>
> David Fitzjarrell
Received on Thu Mar 06 2003 - 04:33:57 CST

