Re: query across 3 Oracle DBs and MS-SQL

From: Michael Austin <maustin_at_firstdbasource.com>
Date: Sun, 15 Mar 2009 21:04:28 -0500
Message-ID: <xtivl.23132$yr3.10138_at_nlpi068.nbdc.sbc.com>



Ed Prochak wrote:
> On Mar 13, 4:56 am, sydneypue..._at_yahoo.com wrote:

>> On Mar 12, 9:55 pm, Michael Austin <maus..._at_firstdbasource.com> wrote:
>>
>>> sydneypue..._at_yahoo.com wrote:
>>>> Hello Guys,
>>>> I will shortly be tasked with running some complex queries across 3 X
>>>> Oracle DBs and MS-SQL. Each of these db has millions of lines. I have
>>>> run queries again a single Oracle DB with millions of lines with some
>>>> success - that take several hours to complete.
>>>> However I have little idea of what is best practice to query databases
>>>> on different servers and a vendor other than Oracle in the mix too.
>>>> hints and tips and URLs gratefully received!!
>>>> Syd
>>> While there is HS and gateways and database links, you need to
>>> understand how it functions before you pull that trigger. In my current
>>> job I helped fix an app issue where the app did a 5-way join via dblink
>>> to a mainframe (all tables were on the mainframe). At the point I was
>>> called in, their web-based app had almost fatally crippled a mainframe
>>> with many billions of I/O's sec in just an hour or so... It was 8 hrs
>>> before it recovered - after I stopped the "gateway".
>>> In the Oracle db the app did a 5-way join to 5 remotely defined tables.
>>> What the mainframe saw for EACH query was:
>>> select * from a;
>>> select * from b;
>>> select * from c;
>>> select * from d;
>>> select * from e where a=xyz;
>>> So Oracle pulled the 4 tables plus the one or two rows from table 5
>>> totaling more than 450MB for EACH query and then having Oracle perform
>>> the JOIN - and the front end app server was doing thousands of these
>>> per/hr.
>>> Due to semi-static nature of the data, the solution was to create MVIEWS
>>> on the Oracle DB and have them populated once a day. That way the JOIN
>>> occurred locally and NOT on the mainframe/network.
>> Ah - I think the crippling of the dataabase host is something I want
>> to avoid at all costs.
>> However I dont quite undertand this:> What the mainframe saw for EACH query was:
>>
>>> select * from a;
>>> select * from b;
>>> select * from c;
>>> select * from d;
>>> select * from e where a=xyz;
>> What sort of query does this? Why all the columns and all the rows for
>> the first 4 tables?
>> And also the data I need to query is semi-static, ie not often
>> updated.
>> Can you give me just a bit more info on what the view would actually
>> consist of?
>>
>> TIA
>>
>> Syd
> 
> Others please correct me where I am wrong.
> 
> say the first database has a link to a second and table remoteT is on
> the second database
> then given a query like
> 
> select <whatever>
> from localT, remoteT
> where localT.columnx = remoteT.columny ;


That is exactly correct, except in the case I cited it was more like:

remotet{1-5} were individually define database links via heterogeneous services using DB2Connect as the ODBC gateway.

select {bunch of cols... }
from remotet1 a

join remotet2 b on a.id=b.id
join remotet3 c on b.id=c.id
join remotet4 d on c.id=d.id
join remotet5 e on d.id=e.id

where e.col='zyx'

Again, I didn't create the mess, but I did fix it :) and as stated before, be very careful to know and understand how HS works. It can be very good, and as the above case demonstrated, can be VERY bad!

> 
> The local database optimizer knows nothing about remoteT (it assumes
> no stats, no indices). so it sends the equivalent of
>               select * from remoteT;
>  to the remote DB server and performs the join on the local DB. This
> can kill performance if remoteT is a large table.
> 
> maybe newer versions of Oracle have improved on this. I haven't had to
> use links in a long time.
>   HTH,
>   ed
Received on Sun Mar 15 2009 - 21:04:28 CDT

Original text of this message