Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: suggestions for poor-man's datawarehouse performance problem

Re: suggestions for poor-man's datawarehouse performance problem

From: Stu Charlton <stuartc_at_mac.com>
Date: 19 Jul 2004 21:36:40 -0700
Message-ID: <21398ab6.0407192036.5dbc85cb@posting.google.com>


jhking <jhking_at_airmail.net> wrote in message news:<cd7ald$gk2_at_library1.airnews.net>...

> My objective is to gather new/updated data from each remote site and
> aggregate it on the host. The first issue is that for political reasons
> I can't change anything on the remote sites (which kills replication or
> triggers + queues).

Well... depends on how you're grabbing the new/updated data... is this time-series data ? If not, do you have to run a full comparison of old values (like a select * minus .. ) ? If you can't create tables on the source system you would have to grab the whole table over if it isn't update-timestamped...

Here's a rough solution that comes to mind:

  1. submit a job that grabs data with a select * over a db link using either an index range or rownum range of a particular amount that seems to be "reliable" in the face of your frame relay (like 100,000 or 500,000 records or whatever). This job would have to note in a "control table" how many rows it has updated thus far.
  2. have another job run periodically to remove & resbumit the job if it has hung (this would need to check & update the control table).

Unfortunately DB Links tend to be pretty crappy for bulk loading unless you have a dedicated highspeed LAN between servers. One variant you may want to try is to have a SQL*Plus script or PL/SQL or C stored procedure that generates a flat file for each job , which can be concatenated after completion, and then loaded in parallel with SQL*Loader or external tables. This is more OS-level and arguably a bigger PITA, but might be faster than just a DBLink.

In etiher case, you could run the multiple sources in parallel to speed up total load time, assuming they're on different frame relay circuits (if not, not much point...)

Anyway just some suggestions. My main non-technical suggestion would be to get management to figure out how badly they want this data warehouse. You're making technical workarounds for political problems (which is very common , but not exactly desirable or fun). They might be able to bust enough heads to get you the ability to add objects or at least get the ability to do flat file exports on these source systems.

Stu Received on Mon Jul 19 2004 - 23:36:40 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US