Re: Distributed, Parallel, and ConText

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: Fri, 22 Jan 1999 16:10:30 +0200
Message-ID: <78gvlt$25q$1@hermes.is.co.za>

Richard Murphy wrote in message <369FC879.AD20EF7B_at_lbpc.com>...
>Currently trying to weigh the pros and cons of a parallel versus
>distributed setup.

What exactly do you mean by a distributed setup? Running various warehouse datasets independent of one another on various machines? Or using Oracle's Distributed Database feature? Distributed joins are -not- a good idea!

>I am leaning towards a distributed setup since we
>will be running under NT, and I do not believe that NT supports
>non-shared disks.

Sorry, you lost me again. Maybe it's because the brain tends to get fuzzy on Friday afternoons. :-)

NT supports both shareable and non-shareable disks. Shareable as in SAMBA style or NFS style (you have to buy an add-on for NFS, though). Next, NT also supports "shareable disks" in a cluster configuration. I'm not sure exactly how it works, but the same disk is treated as if it were a local disk by more than one NT machine in an NT cluster.

The first two disk-sharing methods are a Bad Idea (tm) for distributing a database. One of the most critical factors in any database is the speed of data access. I/O performance will be severely affected using the normal method of disk sharing across a network. Massively Parallel Processors (MPP) and SMP clusters work around this problem with special hardware and special software. For example, on an MPP box, if one node (machine) fails, the "shareable" disks (i.e. the disks belonging to the cluster/mesh) on that node are automatically taken over at hardware level by the node that serves as backup.

>Since we will be doing large nightly loads of data,
>I/O bottlenecks could result. The need for a distributed or parallel
>system arises for performance and load reasons.

Yes. My first inclination is to say go for Oracle Parallel Server (OPS). OPS works well (can work well, I should rather say) in these types of warehouse situations with the data volumes you're talking about. However, as I do not know your requirements, please take this recommendation under advisement. :-)

You can run OPS on NT using NT's clustering software. How well it works, how much you have to fork out for hardware and software, and how much time and effort it is to get it to work - I am totally clueless. Once again I will show a bit of bias and rather suggest either a UNIX SMP cluster or a UNIX MPP box.

The fact is that OPS does work, and work well, in the situation you have described. The disclaimers are that you need OPS experience and properly configured operating system and hardware to make it work.

I've done loads of 5 million records (using the direct and parallel loading options) in less than 2 hours into a very large table. But then we had the horsepower to do it.
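
To put that figure in perspective, here is a back-of-the-envelope calculation in Python. It takes the full 2 hours as an upper bound, so the rate is a lower bound. It also assumes, purely for illustration, that your nightly 1 - 2 million updates would go in at the same rate as a direct load. That is optimistic, and on different hardware the numbers may look nothing like this. The function names are mine.

```python
def rows_per_second(rows, hours):
    """Average load rate implied by loading `rows` rows in `hours` hours."""
    return rows / (hours * 3600.0)


def minutes_needed(rows, rate):
    """Minutes to load `rows` rows at `rate` rows per second."""
    return rows / rate / 60.0


# 5 million records in (at most) 2 hours, the figure quoted above.
rate = rows_per_second(5_000_000, 2)

# Assumption: nightly volumes of 1 and 2 million rows at the same rate.
for nightly in (1_000_000, 2_000_000):
    print(f"{nightly:>9,} rows: about {minutes_needed(nightly, rate):.0f} minutes "
          f"at roughly {rate:.0f} rows/s")
```

In other words, roughly 700 rows per second, which would put the nightly volumes in the range of half an hour to an hour if (and only if) the same rate held.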

>The tables can easily be divided into smaller symmetrical
>subsets. The tables need to have nightly updates (up to 1 - 2 million)
>and on not to infrequent basis (the data is not ours to be responsible
>for) full loads and indexing must be performed. Currently we are
>implementing a single db on VMS, and it doesn't look like it will be
>able to keep up.

I would be hesitant to put my warehouse eggs into the NT basket. I believe NT is a good operating system, but its lack of hardware scalability and the relative immaturity of its cluster technology are huge warning signs IMHO. If you can actually split the warehouse up into (hate to use the word) "datamarts", each totally independent of the other, then NT is definitely a possible solution. However, if you simply want to split the data across several Oracle databases, each running on a separate NT machine and communicating with one another via the distributed database option... Not a choice I would make.

A distributed join has a much larger performance overhead than a local join. Sure, you have two or more CPUs at work, each doing a "part" of the join, but they need to sync with one another, and they totally trash the network as a result. Which is exactly why Oracle developed Oracle Parallel Server - it enables you to have data physically spread across several machines, and to have a database instance running on each of these machines. Trying to do the same as OPS using stock standard Oracle and its distributed database functionality is a recipe for disaster IMHO.
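
A toy illustration of the network cost, in Python. This is not how Oracle executes a distributed join. It only counts how many rows have to cross the wire when the two tables live on different machines and the simplest strategy is used, namely pulling the whole remote table across before joining. All table names, sizes and function names are made up.

```python
def local_join(orders, customers):
    """Hash join with both tables on the same machine: nothing crosses the network."""
    by_id = {c["id"]: c for c in customers}
    joined = [(o, by_id[o["cust_id"]]) for o in orders if o["cust_id"] in by_id]
    return joined, 0  # 0 rows shipped


def distributed_join(orders, remote_customers):
    """Same join, but the customers table lives on another node.

    Naive strategy (an assumption for this sketch): fetch every remote row first.
    """
    shipped = list(remote_customers)  # stands in for the network transfer
    joined, _ = local_join(orders, shipped)
    return joined, len(shipped)


# Hypothetical data: a big remote table, a small local one.
customers = [{"id": i} for i in range(100_000)]
orders = [{"cust_id": i % 1000} for i in range(5_000)]

local_rows, local_shipped = local_join(orders, customers)
dist_rows, dist_shipped = distributed_join(orders, customers)

print(f"local:       {len(local_rows)} rows joined, {local_shipped} rows shipped")
print(f"distributed: {len(dist_rows)} rows joined, {dist_shipped} rows shipped")
```

Both versions produce the same 5,000 joined rows, but the second one moved 100,000 rows over the network to get there. A real optimizer can be smarter than this, yet the data still has to travel.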

No comments on ConText as I've smoked it once and did not inhale. ;-)

As a side note, it will be interesting to see how well Linux clusters (I believe this is being developed?) compare to NT's "wolfpack" clusters. Especially when running OPS.

regards,
Billy

Received on Fri Jan 22 1999 - 08:10:30 CST