Re: Data Warehouse. Massive Join. HELP!
I'd only add to the good replies you've already had by asking: why does this
table require a 7-way join in the first place? Maybe a little more dimensional
analysis is needed to come up with an alternative database design that still
meets the requirement you seem to have set yourself. I accept that this might
be your only solution - N.B. I'm not saying it *is*, just that it *may be*.
Have you tried breaking the summary into smaller parts, i.e. performing a pre-aggregation to reduce the number of joins any one SQL statement has to perform, and then running the parts individually to see how it goes? This will cost you some 'scratch' space, but it may prove an interim solution short of the redesign suggested in the first paragraph. Likewise, have you actually monitored the memory, disks, CPUs and Oracle datafiles to see where the time is being spent?
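For illustration only (the table and column names below are invented, not from the original post), a two-stage pre-aggregation along these lines cuts one wide join into two narrower ones:

    -- Stage 1: collapse the fact table against one dimension first,
    -- aggregating as early as possible to shrink the row count.
    CREATE TABLE stage1 AS
      SELECT f.prod_key, f.time_key, g.region,
             SUM(f.amount) AS amount
      FROM   fact f, dim_geog g
      WHERE  f.geog_key = g.geog_key
      GROUP  BY f.prod_key, f.time_key, g.region;

    -- Stage 2: join the much smaller staging table to the remaining
    -- dimensions instead of doing all seven joins in one statement.
    CREATE TABLE summary AS
      SELECT p.prod_group, t.period, s.region,
             SUM(s.amount) AS amount
      FROM   stage1 s, dim_prod p, dim_time t
      WHERE  s.prod_key = p.prod_key
      AND    s.time_key = t.time_key
      GROUP  BY p.prod_group, t.period, s.region;

Each statement can then be timed on its own, which also makes the monitoring question above much easier to answer.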
We build weekly summary tables after loading our main fact tables, which contain hundreds of millions of rows, without hitting the 5 hours you quote. Obviously no two systems are the same, but even our smaller systems with 130,000,000-row fact tables are only on an IBM F50 with 4 processors and 512MB of RAM, and they run summary builds in less than half the time you quote. We do minimal table joining, always run summary table builds in parallel, drop bitmap indexes before the build and rebuild them in parallel afterwards, run off striped raw disks, kill redo generation, use Oracle 8.0.5 and use Oracle partitioning on the main fact tables. We dedicate a lot of the server's memory to the Oracle SGA and sort space, and have very large temporary segments on dedicated striped disk sets (NB. sort areas and temp can be hammered by summary table builds). Our IBM S70 runs this lot about 2-3 times as fast, and is probably more representative of your SP2, although without knowing what sort of node you are running on it's impossible to tell.
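To make that recipe concrete - again with invented object names, and assuming the summary is rebuilt from scratch each week - the Oracle 8 pattern is roughly:

    -- DEGREE 8 here matches the 8 processors mentioned in the post.
    -- Drop the old copy and its bitmap index so the load doesn't
    -- have to maintain them.
    DROP INDEX weekly_summary_region_bix;
    DROP TABLE weekly_summary;

    -- NOLOGGING suppresses redo generation for the build; PARALLEL
    -- spreads the scan and the sort/group across the processors.
    CREATE TABLE weekly_summary
      PARALLEL (DEGREE 8) NOLOGGING AS
      SELECT /*+ FULL(f) PARALLEL(f,8) */
             g.region, f.prod_key, SUM(f.amount) AS amount
      FROM   fact f, dim_geog g
      WHERE  f.geog_key = g.geog_key
      GROUP  BY g.region, f.prod_key;

    -- Rebuild the bitmap index in parallel once the data is in place.
    CREATE BITMAP INDEX weekly_summary_region_bix
      ON weekly_summary (region)
      PARALLEL (DEGREE 8) NOLOGGING;

Note the trade-off: with redo suppressed the table isn't recoverable from the logs, which is fine for a summary you can always rebuild from the fact tables.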
Still, my 'gut' instinct is that the roughly 4:1 fact-to-dimension ratio in the row counts you quote just seems plain 'wrong' for a data warehouse design. Our fact tables are around 100 times larger than even our most heavily collapsed dimensions, and that seems pretty standard from most of the data warehousing books I've read.
Regards,
Steve Phelan
(Oracle 7 & 8 OCP)
stuco_at_mailcity.com wrote:
> Here's my situation:
>
> Data Warehouse - building a summary table based on a 7-table join (driving
> table is 2 million rows, other tables are about 500,000 rows each).
>
> To load this table takes 5 hours with a nice Rule-based query that has been
> tuned to use unique and range index scans. Thing is...IT TAKES 5 BLOODY
> HOURS!!
>
> So I analyzed all the tables and indexes, ran it through the Explain Plan and
> all my lovely index scans in the Cost-based mode have turned into full table
> scans with sort-merge joins.
>
> So I think, 'well maybe this might be faster'. ---Nope.
>
> This is the first of many tables being built from our data store, and it
> alone is taking half a day to generate. We have an IBM SP/2 running
> AIX 4.2 with 8 processors and 2Gig of memory.
>
> What can I do to tune this thing??
>
> Thanks for your help.
>
> Stuart Cowen
> Paladin Consulting - Dallas
>
Received on Thu Feb 25 1999 - 16:59:05 CST