Distributed Queries

From: Michael Kline <mkline_at_richmond.infi.net>
Date: 1995/07/28
Message-ID: <3vblbm$5em_at_allnews.infi.net>


I asked this question on <ORACLE-L%SBCCVM.BITNET_at_VTBIT.CC.VT.EDU>, but got no response. I realize that many don't have a "global" system of this size, but if you do, I could really help the agency, and probably make myself look pretty good, if I could gather some actual case histories on this subject.

This being a "business" question, you may also comment to me at
"mkline_at_vdh.bitnet"

ANY help appreciated. I've used Oracle for nearly 9-10 years, and I know it can do a lot of things, but when you start doing things with 2-8 GIGS of data that the database wasn't designed for, I can only envision that making the whole operation distributed will just compound the problem.

I'm actually looking to see if anyone has tried this, and looking for input to planning.

Let me give a brief description of what's going on, and then ask my question.

We have a large database on 41 unix boxes running a "Patient Care Management System" all over the state. Right now, "Before Network", we get export files from these 41 sites to create our statewide data reports of all sorts of things. (You know the government. :-) ) These sites can be 2-200 MEG of data each, and the typical site's data takes about 2-4 hours to massage into the format we need to combine it with all the other sites. So we have something like this:

  1. Get site 1 data.
  2. Import into "site1"
  3. Massage "site1" data, 2-3 hours.
  4. Bring into "pcms" consolidation.
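
In case it helps picture step 4: once imp has loaded a site's export into its own schema, the consolidation itself is just plain SQL, roughly like the sketch below. The table and column names are made up for illustration, not our real PCMS layout.

  -- Roll one site's already-massaged data into the statewide tables.
  -- "site1" is the schema the site's export was imported into;
  -- "patient_visits" and its columns are invented for this example.
  INSERT INTO pcms.patient_visits (site_code, patient_id, visit_date, clinic_code)
    SELECT 'SITE1', patient_id, visit_date, clinic_code
      FROM site1.patient_visits;
  COMMIT;
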
NOW WE HAVE A NETWORK!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

We are being told that soon we'll be able to run one query against all 41 sites, and that since all 41 of these computers will be doing the work, it will be almost 41 times faster, and the weeks we spend collecting data from the 41 sites will be over because we can just run a distributed query...
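
For anyone who hasn't seen what's being pitched to us, a distributed query over SQL*Net database links looks roughly like the sketch below. The link names, password, connect strings, and table are all invented for the example, not our real setup.

  -- One database link per remote site (41 of them in our case).
  CREATE DATABASE LINK site1 CONNECT TO pcms IDENTIFIED BY pcms_pw USING 'site1';
  CREATE DATABASE LINK site2 CONNECT TO pcms IDENTIFIED BY pcms_pw USING 'site2';

  -- One statement, each remote box scanning its own data.
  SELECT 'SITE1' site_code, COUNT(*) visits
    FROM patient_visits@site1
   WHERE visit_date >= TO_DATE('01-JAN-1995', 'DD-MON-YYYY')
  UNION ALL
  SELECT 'SITE2', COUNT(*)
    FROM patient_visits@site2
   WHERE visit_date >= TO_DATE('01-JAN-1995', 'DD-MON-YYYY');
  -- ...and so on for the remaining sites.

As far as I understand it, though, every row each site returns still has to come back across the wire to the one box that issued the query.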

Yes, I know it's POSSIBLE, but has anyone tried this kind of thing, and did you find you **STILL** had to do the "data warehouse" concept and bring the data LOCAL to really be able to work with it? If ONE site took 2-3 hours to "massage", can we expect that a distributed query against ALL 41 SITES will only take the same 2-3 hours?
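
If it turns out we DO still need the warehouse, I assume the network at least lets us replace the export-file shuffle with pulls over the same links, something like the sketch below (again, all names are invented):

  -- Pull one site's raw data into a local staging table to massage here.
  CREATE TABLE stage_site1 AS
    SELECT * FROM patient_visits@site1;

  -- Or have Oracle keep a local copy refreshed on a schedule (7.x snapshots).
  CREATE SNAPSHOT pcms_site1
    REFRESH COMPLETE START WITH SYSDATE NEXT SYSDATE + 7
    AS SELECT * FROM patient_visits@site1;

That still moves all the data, of course, which is kind of the point of my question.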

Oh, I almost forgot. To make things interesting, about 3 of the sites will still have to send their data in, since they are running COBOL and other systems and can only give us "approximate" data, approximate to our format that is...

I've also noticed that when we run a program on one box, pointing to a database on ANOTHER box, and something goes wrong, many times we find ourselves with a "runaway process" that can darn near bring the "other" machine to its knees. We're at 7.1.xxx, but quite current as far as Oracle goes. We're running on NCR/AT&T boxes, 34xx and 35xx.

Anyone got any insight into this? It seems this "distributed query" is being hyped as the thing that will solve all our problems, but I'd like to know just how wonderful it is when the combined data of all 41 sites may well be close to 8 gig.....

Virtually,

CONCHR-L, BSTUDY-L Co-Owner

Michael A. Kline, Sr. (Maks)                                 "Sanctuary"
13308 Thornridge Ct.                                        "Mr. Teflon"
Midlothian, VA  23112
804-744-9126  (Home)                  Internet: mkline_at_richmond.infi.net