Re: OCFS for Windows (c.d.o.server)
Well, I'm glad you have it running without too much trouble. May I ask: are
you a consultant for that site, a project manager, or the day-to-day DBA for
those databases?
Don't get me wrong, I wish I could say the same of our RAC, but I just can't.
We are experiencing problems such as:
- split brains: we have a Gigabit private interconnect plus a 100 Mbit public
  interconnect, and we have tried both crossover cables and switches
- when a split brain occurs, the nodes should normally exchange heartbeats
  through the control files to decide which node goes down; no such thing...
  they time out on it and both crash
- lots of IPC errors for no apparent reason, resulting in the problems
  described above :-)
- we now have an 800 MB shared pool on both instances, and the instances
  still keep crashing due to memory shortage in the shared pool. My guess is
  that because PCM locks are now managed automatically, you no longer have
  control over the number of lock elements in the lock database, so the lock
  database keeps growing, trying to get as close to 1:1 LE:block granularity
  as possible. Flushing the shared pool doesn't help, so instead we have to
  bounce the instances. Oracle's reply: increase the shared pool... yeah,
  right, until I reach 2 GB and have no buffer cache left, or no process
  space for spawning threads :-)
- timing problems: if the system clocks on the two nodes start to drift
  apart, you're in for a big surprise :-)
- RAC on NT/2000 does not load-balance its inter-instance messaging over
  multiple NICs, and NIC failover doesn't work either. If I pull the private
  interconnect, you would expect it to keep running over the public
  interconnect; no such thing. Instead, one node goes to 100% CPU and needs a
  cold boot, while the other just sits there and eventually crashes too...
- if we bounce one instance, there's about a 20% chance we have to bounce the
  whole cluster
- if we add new raw devices and symbolic links, sometimes the object service
  does not replicate them across the nodes, and a reboot is necessary :-)
- we started a year ago with 9.0.1 and are now at 9.2.0.2, we did lots of
  firmware upgrades on the hardware (IBM), and still no solid cluster
- these are just some of the problems we've had. We have spent so many hours
  on this thing that we could easily have bought two decent Tru64 nodes, and
  maybe, just maybe, it would have worked. I have talked to some colleagues
  who run OPS on 16-node SP2s and RAC on Tru64, and they have all had their
  share of sleepless nights. So consider yourself very lucky, or, if you're
  not the DBA, ask your DBAs what they do late at night :-)
Again, a bitter and tired RAC DBA :-)
"David Fitzjarrell" <oratune_at_msn.com> wrote in message
news:32d39fb1.0303051112.74d1f551_at_posting.google.com...
> "koert54" <nospam_at_spam.com> wrote in message news:<3e65f784$0$2184$4d4efb8e_at_news.be.uu.net>...
> (snip)
> > My personal opinion is - if you're bold enough to run a parallel server on a
> > windows platform don't be surprised to be shuffling a lot of shit with
> > your back against the wall !
> >
> (snip)
>
> We regularly run OPS on 2000 and have no problems with it. In fact
> these systems are located in Central America, South America and the
> Caribbean, running pre-paid cellular systems, so you can imagine the
> traffic they get. We rarely have downtime (with the exceptions of
> hardware issues and, sometimes, the local 'nut behind the wheel'), so
> I WILL be surprised if I end up 'shuffling a lot of shit with your
> (read 'my') back against the wall !'
>
> David Fitzjarrell
Received on Thu Mar 06 2003 - 04:33:57 CST