Re: Distributed Queries

From: Jim Smith <jim_at_jimsmith.demon.co.uk>
Date: 1995/07/30
Message-ID: <807092574.8589_at_jimsmith.demon.co.uk>


mkline_at_richmond.infi.net (Michael Kline) wrote:

<<Summarised.

Wants to replace manual importing into a central database from 41 sites with a distributed system querying the 41 sites from the central database. >>

I don't have any direct experience on this scale, but here goes...

I think you probably want replication rather than distribution. Whether you implement replication "manually" using database links or use the new replication options doesn't matter.

What is wrong with Distribution?

Very dependent on network and node reliability.

If you have a query which does something like

    select x, y, z
    from   table1@site1,
           table2@site2

and either machine, or the network link to a machine, goes down, then the entire statement fails.

Performance.

While in theory each component of a distributed query could execute on its own machine, the Oracle optimizer may not always get it right. (At Version 7.0.x distributed selects are handled by copying the entire remote table into a temporary segment on the local database. There are supposed to be improvements in later versions but I don't know how effective these are.)

Complexity

The complexity of the queries used to collect results from 41 sites simultaneously will make them very difficult to maintain. (Think about it. Each query will involve 41 tables, or 41 similar queries linked by the UNION operator.)
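As a hedged sketch (table, column and site names all invented here), just three sites' worth of the 41-branch UNION version would already look like this:

    select 'SITE1' site, x, y from results@site1
    union all
    select 'SITE2' site, x, y from results@site2
    union all
    select 'SITE3' site, x, y from results@site3
    -- ...and 38 more branches like these

and every change to the table layout has to be repeated in all 41 branches.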

A true distributed database is only necessary if all sites need instant access to the most up-to-date information. It doesn't sound as if you need this.

Why is replication better?

By replication I simply mean copying the required data into the right place, and not necessarily what Oracle will try to sell you with their Symmetric Replication Option.

It is essentially a more automated version of what you are doing now, replacing the import and massage with a remote query that copies data into the central database.

The advantage is that you can run the copies one site at a time or all at once, depending on which sites are ready to send, which sites are accessible and what processing power is available. If one or more sites end up missing, you can still run your reports on the information you do have (and add footnotes).

There are a number of options with this method.

Use remote queries from the central site.

This doesn't even need the distributed option, but most of the processing will be done on the central machine (depending on the optimiser).
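As a hedged sketch (all table and link names invented), the central site could pull from each site with a statement like:

    insert into central_results (site, x, y)
    select 'SITE1', x, y
    from   results@site1;
    commit;

run once per site, in any order, skipping any site that is down.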

Remote updates at local sites to push data into the central database.

This does need the distributed option, but processing will be more evenly shared.
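A hedged sketch of the push variant (names invented again), run at each local site whenever its data is ready:

    insert into central_results@central (site, x, y)
    select 'SITE1', x, y
    from   results;
    commit;

Each site only needs a link to the central database, not to the other 40 sites.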

New-fangled Replication Options.

This would create a snapshot of each remote table on the central database. I think each snapshot would be a separate "table" and would have to be combined by some process at the central site.

This needs the distributed option and I think most of the processing is at the central site.
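For illustration (names invented, and assuming the Version 7 snapshot syntax), one snapshot per site refreshed nightly might look like:

    create snapshot site1_results
        refresh complete
        start with sysdate next sysdate + 1
        as select * from results@site1;

with a view union-ing the 41 snapshots to do the combining at the central site.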

What about Centralisation?

I know it is unfashionable nowadays, but why not have a single central database accessed remotely by the individual sites? 8 Gb is not really that big any more.

Received on Sun Jul 30 1995 - 00:00:00 CEST
