Re: Why doesn't Oracle care about Linux as IBM does?

From: Nuno Souto <nsouto_at_optushome.com.au.nospam>
Date: Wed, 15 Aug 2001 11:41:14 GMT
Message-ID: <3b7a4ecf.1468069@news>

On Tue, 14 Aug 2001 14:29:46 -0400, Serge Rielau <srielau_at_ca.ibm.com> wrote:

>Hmm, maybe my English is worse than I thought...

No, not at all. It's just a question of terminology. Please follow me on a twilight tour. It's long, but I think it will help both of us understand what we are talking about.

>
>Oracle requires "Certified vendor-supplied operating system-dependent layer for
>UNIX, or Oracle operating system-dependent layers for Windows NT and Windows
>2000"
>(http://download-west.oracle.com/otndoc/oracle9i/901_doc/rac.901/a89868/intro.htm#1022312)
>On AIX (just as an example that is):
>snip of all visited links.

Ah, OK! What you mean by a clustered database environment bears no resemblance whatsoever to what I mean. I don't care one bit about HACMP h/w resource routing. Now I understand why we couldn't communicate. Sorry, I should have asked first for definitions. My fault.

>
>DB2 requires none of that (not on AIX, not anywhere)
>I didn't plan to rub that in, just wanted to point out that DB2 can very well
>work clustered without certified software on W2k or XP which is what you
>doubted.
>

OK, this is where it gets long. Let's get something clear here. When I say "clustered database environment", I mean the following:

<clustered database environment>
1 - NO special version of the database whatsoever: EXACTLY the same version I can buy, at the same price, for a single node. Of course, I may have to pay for two clustered node licenses. I don't contend this is good or bad, expensive or cheap; I'm just stating my requirement.

2 - Assume the following h/w config:

2 independent system units. They may even be different models (say, one is an SMP, the other a single-CPU box), though from the same maker. I'll call them node A and node B for ease of reference.
+
1 mass storage unit using redundant disk arrays and redundant power supplies.
+
1 or more cluster controllers WITH multi-node locking facilities (which may be embedded in the storage unit above).
+
1 SINGLE database (one set of database files) using space from ALL disks of the storage unit above.

3 - Now the operation of this is:

Database software runs on both node A and node B, simultaneously accessing the single database and sharing the SAME dictionary, recovery spaces, temp spaces, data spaces, etc.

Database transactions take place independently within each node. E.g.: transaction 4567 starts and completes on node A, while transaction 7654 executes on node B for its whole duration. Both transactions are changing the SAME data block, but different rows.

Still with me? Thanks.

4 - Problem occurs: node A does a M$ and goes BSOD.

Result:

When node B next tries to rewrite the shared block, it detects through the cluster controller that node A is no longer there. Node B asks its own local database engine to roll back any pending transactions from node A, then continues operating just as before.

The sysadmin re-routes the end-user units (terminals, PCs, whatever) to node B. Users previously connected to node A now get an error message saying "Bill G. cancelled your last operation, please repeat it". They repeat it and carry on with their work.

The sysadmin gives node A the three-finger salute, fixes whatever the problem was, then reboots node A and restarts the database s/w. Node A's database engine now wakes up, talks to the cluster controller and says: "Hi, I'm back, let me get to the disks". From then on, node A is back in action, waiting for work. Node B doesn't even need to know this has happened and is blissfully unaware that node A is back.

At some convenient stage, the sysadmin re-routes node A's end-user units back to it. One or two of the users (the ones who are awake) may get another "Bill G." message. Everything is back to normal operation, as in 3 above.
</clustered database environment>
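The failover sequence in steps 3 and 4 can be modelled as a toy, single-process Python sketch. Everything in it is a hypothetical illustration, not how ORACLE Parallel Server or DB2 actually work: the `ClusterController` and `SharedDatabase` classes are made up, row-level locks are assumed to live in the controller, and each transaction keeps before-images of the rows it changed. The point it shows is that a write to the shared block first asks the controller which nodes are alive, rolls back in-flight work of dead nodes, and that a node rejoining needs no message to the surviving node.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of a shared-disk cluster; not any vendor's real design.


@dataclass
class Transaction:
    txn_id: int
    node: str
    # Before-images for rollback: (block, row) -> value before this txn touched it
    undo: dict = field(default_factory=dict)


class ClusterController:
    """Hypothetical controller: tracks live nodes and row-level locks."""

    def __init__(self):
        self.live_nodes = set()
        self.row_locks = {}  # (block, row) -> txn_id holding the lock

    def join(self, node):
        self.live_nodes.add(node)  # rejoining sends no message to other nodes

    def fail(self, node):
        self.live_nodes.discard(node)

    def is_alive(self, node):
        return node in self.live_nodes


class SharedDatabase:
    """One set of shared blocks, accessed by every node in the cluster."""

    def __init__(self, controller):
        self.controller = controller
        self.blocks = {}  # block_id -> {row: value}
        self.active = {}  # txn_id -> Transaction still in flight

    def begin(self, txn_id, node):
        if not self.controller.is_alive(node):
            raise RuntimeError(f"node {node} is not in the cluster")
        self.active[txn_id] = Transaction(txn_id, node)

    def write(self, txn_id, block, row, value):
        # Before touching a shared block, ask the controller who is still alive
        # and roll back in-flight work belonging to dead nodes.
        self.recover_dead_nodes()
        txn = self.active[txn_id]
        key = (block, row)
        holder = self.controller.row_locks.get(key)
        if holder is not None and holder != txn_id:
            raise RuntimeError(f"row {key} is locked by txn {holder}")
        self.controller.row_locks[key] = txn_id
        rows = self.blocks.setdefault(block, {})
        txn.undo.setdefault(key, rows.get(row))  # keep only the first before-image
        rows[row] = value

    def commit(self, txn_id):
        self._release(self.active.pop(txn_id))

    def rollback(self, txn_id):
        txn = self.active.pop(txn_id)
        for (block, row), before in txn.undo.items():
            if before is None:
                self.blocks[block].pop(row, None)  # row did not exist before
            else:
                self.blocks[block][row] = before
        self._release(txn)

    def _release(self, txn):
        for key in txn.undo:
            self.controller.row_locks.pop(key, None)

    def recover_dead_nodes(self):
        dead = [t.txn_id for t in self.active.values()
                if not self.controller.is_alive(t.node)]
        for txn_id in dead:
            self.rollback(txn_id)
        return dead


# The scenario from the post: two nodes, same block, different rows.
ctl = ClusterController()
ctl.join("A")
ctl.join("B")
db = SharedDatabase(ctl)
db.blocks["blk1"] = {1: "old1", 2: "old2"}

db.begin(4567, "A")
db.write(4567, "blk1", 1, "changed by A")
db.begin(7654, "B")
db.write(7654, "blk1", 2, "changed by B")        # same block, different row

ctl.fail("A")                                     # node A goes BSOD
db.write(7654, "blk1", 2, "changed again by B")   # B's next write rolls back txn 4567
db.commit(7654)

ctl.join("A")                                     # node A rejoins; B is never told
db.begin(4568, "A")
```

Because the controller holds the row locks, both nodes can change the same block as long as they touch different rows. A real engine would also have to coordinate the block image itself between the nodes' caches, which this sketch ignores.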

Conclusions:

Are we talking about UDB/DB2 standard edition here? Or EEE? With or without HACMP? In BOTH Unix and Windows environments? Because what I just described is what ORACLE does in a cluster environment, be it Unix, NT, or even DEC if you can find one.

As a matter of fact, I was doing exactly this in 1996 on a three-node DEC VAX installation running VMS and ORACLE V7, at Westmead Hospital in Sydney, Australia. The "Bill G." message wasn't my suggestion. IIRC, it was called the ORACLE Parallel Server option back then. I believe it's now native in 9i, as Real Application Clusters (RAC).

I purposely omitted a few details for simplicity. For example, it might not be a BSOD at all but scheduled maintenance on a 24x7 system, and there are additional 9i specifics such as distributed caches (Cache Fusion). Whatever.

What I'm interested in knowing is: what grade (version, option, whatever) and h/w platform of UDB/DB2 are we talking about for this to happen, *exactly* as described? Particularly the bits about the SAME data block being accessed by both nodes, node B not needing ANY system-specific intervention WHATSOEVER to roll back node A's work, and node B NOT needing to be notified that node A is back. And no need to re-route anything other than the end-user units.

This is what I call a clustered database environment. That is: scalable, resilient, and needing minimal admin intervention to resolve problems.

>Maybe all those vendors are just weeks away from supporting limitless
>scalability as promised by Oracle, maybe it will take years for Oracle to be
>able to scale to 300 nodes because the OS/hardware just can't do it today.
>

Dunno. As a matter of fact, I think Compaq is going for it with their OEM version of NT. Not 300 nodes, but then again, why would anyone want to?

>Either way if you want to use 9i RAC instead of DB2 today you should use AIX
>which is fine by me ;-)
>

Hey! I like AIX, so there! ;-)

Cheers
Nuno Souto
nsouto_at_optushome.com.au.nospam

Received on Wed Aug 15 2001 - 06:41:14 CDT