Oracle FAQ | Your Portal to the Oracle Knowledge Grid
c.d.o.server -> Re: Looking for a scalable solution for many DB links
Paul Moore <pf_moore_at_yahoo.co.uk> wrote in message news:<k755n3l1.fsf_at_yahoo.co.uk>...
> I'm setting up a large data collection process, which will contact a
> significant number (currently nearly 100, expected to grow) of
> databases, and collect data from them.
>
> I have to "pull" data from the remote databases (firewall issues - I
> can see the remote DBs, but they can't see me...) and it seems to me
> that there are 2 possible ways of handling the processing:
>
> 1. 3GL code of some sort. Maybe a multi-threaded Python script, which
> connects to the source and target DBs, and does "select * from
> source_table" into an array, followed by "insert into target_table
> values (...)" via array inserts.
> 2. Use DB links - "insert into target_table select * from
> source_table@db_link".
>
> Solution (1) is, in many ways, pretty straightforward. But the
> boilerplate code around the SQL is fairly complex, and relies on a 3rd
> party language interpreter. On the other hand, (2) is all in the
> database, and is trivially simple to do.
>
> The problem with (2) is that I'm not sure it scales. I need DB links
> for every database I'm collecting from, as well as being limited on
> the number of open links per session. What's the best way round that?
> Collect from each database in a separate DBMS_JOB? How scalable is
> running 100+ DBMS_JOBS at regular intervals?
>
> The whole thing is in theory pretty straightforward, but it feels like
> it may not scale well. I want to get it right, now, as this system is
> replacing an existing one which scales very badly (it broke going from
> 20 databases to 100...)
>
> At the moment, I'm expecting to go with (1), but I'd love to convince
> myself that (2) would work.
>
> Thanks,
> Paul
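For reference, the array-fetch/array-insert pattern described in option (1) might look like the following minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for an Oracle driver (which the post does not name); the table layout, batch size, and function name are all illustrative assumptions:

```python
import sqlite3

BATCH_SIZE = 500  # rows per round trip; tune for network latency

def copy_table(src_conn, dst_conn, table):
    """Fetch rows from the source in batches and array-insert them
    into the target, so each batch is a single round trip."""
    src = src_conn.cursor()
    dst = dst_conn.cursor()
    src.execute(f"SELECT * FROM {table}")
    placeholders = ",".join("?" * len(src.description))
    total = 0
    while True:
        rows = src.fetchmany(BATCH_SIZE)
        if not rows:
            break
        dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        total += len(rows)
    dst_conn.commit()
    return total

# Demo: two in-memory databases stand in for a remote and a local DB.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER, val TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, f"row{i}") for i in range(1200)])
src.commit()
dst.execute("CREATE TABLE t (id INTEGER, val TEXT)")
copied = copy_table(src, dst, "t")
print(copied)  # 1200
```

With a real Oracle driver the same loop applies; only the connect calls and the `?` placeholder style would change.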
Option (2) won't scale, and it is also inflexible to maintain. We do a lot of data loads here, and I've found that transportable tablespaces are a more flexible way to handle something like this. The scaling problem with copying row by row is network traffic and all the copying to and from disk; it will kill your I/O.
I'm in a situation similar to this, where we pull data from multiple groups in our company. The company is large and isn't organized in a way where we can just say, "OK, everyone handle what we need." We're not at that level. Scaling this is difficult. We automate it, but it's VERY hard to keep all the bugs out of the automation, and maintenance is difficult as well.
You might want to consider transportable tablespaces. You can use a simple "consumer/producer" automation model: on your target database, have a cron job that polls a directory for a file. That file contains the information needed to load your tablespace and kick off your job. So when your datafile has been copied over, copy your control file as well.
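The polling side of that consumer/producer model can be sketched in a few lines of Python. This is a hypothetical outline, not anything from the post: the directory names, the `.ctl`/`.dbf` naming convention, and the `load` callback are all assumptions, and in production the `load` step would invoke the actual Oracle import of the transported tablespace:

```python
import os
import shutil
import tempfile

def poll_once(inbox, processed, load):
    """One polling pass: for each control file whose datafile has also
    arrived, run the load, then archive both files out of the inbox.
    The producer copies the control file *last*, so its presence
    signals that the datafile copy is complete."""
    loaded = []
    for name in sorted(os.listdir(inbox)):
        if not name.endswith(".ctl"):
            continue
        ctlfile = os.path.join(inbox, name)
        datafile = os.path.join(inbox, name[:-4] + ".dbf")
        if not os.path.exists(datafile):
            continue  # datafile not there yet; pick it up next pass
        load(ctlfile, datafile)  # e.g. run the tablespace import here
        shutil.move(ctlfile, os.path.join(processed, name))
        shutil.move(datafile,
                    os.path.join(processed, os.path.basename(datafile)))
        loaded.append(name)
    return loaded

# Demo: temporary directories stand in for the real drop/archive areas.
inbox = tempfile.mkdtemp()
processed = tempfile.mkdtemp()
open(os.path.join(inbox, "sales.dbf"), "w").close()
open(os.path.join(inbox, "sales.ctl"), "w").close()
runs = []
done = poll_once(inbox, processed, lambda ctl, dbf: runs.append((ctl, dbf)))
print(done)  # ['sales.ctl']
```

Running `poll_once` from cron at a regular interval gives the consumer; each source database is a producer that simply drops its datafile and then its control file into the inbox.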
This is not simple. I'd be more concerned about it not breaking, and about it being maintainable, than about "scaling". Your scaling problems will mainly be non-Oracle related: network traffic and copying to and from your hard drives.

Received on Mon Dec 15 2003 - 07:41:44 CST