Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Re: Looking for a scalable solution for many DB links
> > I'm setting up a large data collection process, which will contact a
> > significant number (currently nearly 100, expected to grow) of
> > databases, and collect data from them.
> >
> > I have to "pull" data from the remote databases (firewall issues - I
> > can see the remote DBs, but they can't see me...) and it seems to me
> > that there are 2 possible ways of handling the processing:
> >
> > 1. 3GL code of some sort. Maybe a multi-threaded Python script, which
> > connects to the source and target DBs, and does "select * from
> > source_table" into an array, followed by "insert into target_table
> > values (...)" via array inserts.
> > 2. Use DB links - "insert into target_table select * from
> > source_table_at_db_link".
> >
> > Solution (1) is, in many ways, pretty straightforward. But the
> > boilerplate code around the SQL is fairly complex, and relies on a 3rd
> > party language interpreter. On the other hand, (2) is all in the
> > database, and is trivially simple to do.
> >
> > The problem with (2) is that I'm not sure it scales. I need DB links
> > for every database I'm collecting from, as well as being limited on
> > the number of open links per session. What's the best way round that?
> > Collect from each database in a separate DBMS_JOB? How scalable is
> > running 100+ DBMS_JOBS at regular intervals?
> >
> > The whole thing is in theory pretty straightforward, but it feels like
> > it may not scale well. I want to get it right, now, as this system is
> > replacing an existing one which scales very badly (it broke going from
> > 20 databases to 100...)
> >
> > At the moment, I'm expecting to go with (1), but I'd love to convince
> > myself that (2) would work.
> >
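The 3GL approach sketched in option (1) can be illustrated with the Python DB-API: select from the source, then array-insert into the target with `executemany`. This is a minimal sketch, not code from the thread; `copy_table` and `batch_size` are illustrative names, and an Oracle driver would use `:1`-style bind placeholders rather than `?`.

```python
# Sketch of option (1): pull rows from a source database and array-insert
# them into a target via the Python DB-API. Works with any DB-API driver
# that supports executemany(); shown here with "?" placeholders (sqlite
# style) -- Oracle drivers use numbered binds like :1, :2 instead.

def copy_table(src_conn, dst_conn, table, batch_size=1000):
    """Copy all rows of `table` from src_conn to dst_conn in batches.

    Returns the number of rows copied. Batching keeps memory bounded and
    turns each batch into a single round trip on the insert side.
    """
    src = src_conn.cursor()
    dst = dst_conn.cursor()
    src.execute(f"SELECT * FROM {table}")
    ncols = len(src.description)                     # column count from cursor metadata
    placeholders = ", ".join(["?"] * ncols)
    insert_sql = f"INSERT INTO {table} VALUES ({placeholders})"
    total = 0
    while True:
        rows = src.fetchmany(batch_size)             # pull one batch from the source
        if not rows:
            break
        dst.executemany(insert_sql, rows)            # array insert into the target
        total += len(rows)
    dst_conn.commit()
    return total
```

Because every mainstream driver follows the same DB-API shape, the same function body works whether the connections come from an Oracle driver or anything else.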
Here's a late reply; I hope it still helps. I think I'm doing something very similar to what you're planning. I have various jobs that connect to all of our instances, collect statistics, and store them in one central database. I also have jobs that run DBA administration scripts in all of our instances while referencing data from one central administration database. These jobs manage about 340 database instances.
In the past, I used to code these scripts with db links. That was simpler to code, but over time it caused me big problems: our databases were (and always will be) at different versions, for various reasons, and some database versions can't talk to other database versions over db links. One can argue that I should keep all instances at the same current RDBMS version, but in the real world that's not always possible. You often end up with a few things left at unsupported versions.
So now I prefer to code this type of job in Perl, using its DBI module for database access. With this approach I handle communication between instances ranging from 7.2 (don't say it!) to 9.2 without running into any of the problems I had with db links. As a side benefit, Perl is a much better programming language than SQL or PL/SQL for some tasks, so being able to mix Perl, SQL and PL/SQL in the same script lets me do some things much more easily than if I were trying to do it all in PL/SQL. I'm not familiar with Python, but I expect the same would hold true.
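The pattern in the reply — one client-side script that connects to each monitored instance in turn and stores results in a central repository, so version skew is handled by the client driver rather than by db links — might be sketched like this. The reply uses Perl/DBI; this sketch uses the equivalent Python DB-API shape, and the names (`collect_stats`, the `stats` table, the `instances` list) are illustrative assumptions, not from the post.

```python
# Sketch of a central collector: one script, one client connection per
# monitored instance, results stored in one central database. The
# `connect` argument is any DB-API connect function, so each instance is
# reached through the client driver rather than a server-side db link.

def collect_stats(connect, instances, central_conn, query):
    """Run `query` on every instance and store (name, value) centrally.

    `instances` is a list of (name, dsn) pairs; `query` must return a
    single scalar (e.g. a count pulled from a v$ view in real use).
    """
    cur = central_conn.cursor()
    for name, dsn in instances:
        conn = connect(dsn)                          # fresh connection per instance
        try:
            value = conn.cursor().execute(query).fetchone()[0]
            cur.execute(
                "INSERT INTO stats (instance, value) VALUES (?, ?)",
                (name, value),
            )
        finally:
            conn.close()                             # never leak a remote connection
    central_conn.commit()
```

A failure on one instance could be caught and logged inside the loop so the remaining instances are still collected — one advantage of driving the fan-out from a script instead of a single `INSERT ... SELECT` over a db link.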
--
Joe
http://www.joekaz.net
http://www.cafeshops.com/joekaz

Received on Tue Jan 13 2004 - 07:18:14 CST