Large Datamart Crosstab Table

From: Nedry <bretternst_at_speakeasy.net>
Date: 23 Apr 2004 19:24:06 -0700
Message-ID: <b4f505d0.0404231824.435fb2cf@posting.google.com>


I've got a rather frustrating problem that someone may have encountered before. Here's the background:

We're running Oracle 9i on Red Hat Linux. It's a quad-processor box with 4.5GB of RAM, connected to a RAID-1/0 disk array.

We have a very large table (363 million rows) that is partitioned by a fixed list of "as of" dates (i.e. a series of month-ends). A particular record will usually appear across many month-ends, with the same ID number but different data in the rest of the columns.
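
For concreteness, the source table is shaped roughly like this. The names are placeholders, most of the data columns are omitted, and the partitioning details are simplified:

    CREATE TABLE fact_monthly (
      id_number   NUMBER        NOT NULL,
      as_of_date  DATE          NOT NULL,  -- the month-end "as of" date
      bal         NUMBER(15,2)             -- one of several data columns
    )
    PARTITION BY LIST (as_of_date) (
      PARTITION p_2004_01 VALUES (TO_DATE('2004-01-31','YYYY-MM-DD')),
      PARTITION p_2004_02 VALUES (TO_DATE('2004-02-29','YYYY-MM-DD'))
      -- ... one partition per month-end ...
    );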

Management has decided that they want a "time-series" table that is horizontally oriented (that is, a typical pre-processed crosstab table with one column per month, for 60 months).

I have tried several different methods to produce this table, but I always run into major resource constraints. We had previously run a similar job in SQL Server successfully, and relatively quickly, so I have to assume that either we are not writing the query efficiently in Oracle or we don't have the database tuned correctly.

The two different methods I have tried so far:

  1. Using the standard DECODE / MAX / GROUP BY ID_NUMBER query that is often posted in the newsgroups (a sketch appears after this list). The problem with this query is that it materializes an intermediate result of 60 * 363 million rows before the GROUP BY is performed. Even with an 84GB temp tablespace, we run out of room and the query fails. I can't seem to find any way to order the id_numbers so that they are grouped "on-the-fly" rather than building the entire 60 * 363m row set first.
  2. Iterating through each month-end partition, using a MERGE INTO statement to either update or insert into the target table, depending on whether the ID_NUMBER already exists (a sketch of one iteration also appears below). The UPDATE changes only the column for the month being processed, and the INSERT puts NULLs in all the other month columns. Working forward through time, this should end up with the same result as method #1. Problem is, this runs -extremely- slowly: the MERGE does a hash join between the source partition and the target table, which takes about 3 hours for a single month-end of 5m rows. At that rate, the full 60 months would take about a week.
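
For reference, method #1 looks roughly like this. It's a simplified sketch rather than our production code: FACT_MONTHLY, ID_NUMBER, AS_OF_DATE and BAL are the placeholder names from above, and only 3 of the 60 month columns are shown:

    CREATE TABLE ts_crosstab AS
    SELECT id_number,
           -- DECODE returns BAL when AS_OF_DATE matches the given
           -- month-end and NULL otherwise; MAX then collapses the
           -- (up to) 60 rows per ID into a single row.
           MAX(DECODE(as_of_date,
                      TO_DATE('2004-01-31','YYYY-MM-DD'), bal)) m_2004_01,
           MAX(DECODE(as_of_date,
                      TO_DATE('2004-02-29','YYYY-MM-DD'), bal)) m_2004_02,
           MAX(DECODE(as_of_date,
                      TO_DATE('2004-03-31','YYYY-MM-DD'), bal)) m_2004_03
           -- ... and so on for the remaining 57 month-ends ...
      FROM fact_monthly
     GROUP BY id_number;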
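
And here is roughly one iteration of method #2, again simplified with the same placeholder names (the partition name P_2004_01 is likewise made up):

    MERGE INTO ts_crosstab t
    USING (SELECT id_number, bal
             FROM fact_monthly PARTITION (p_2004_01)) s
    ON (t.id_number = s.id_number)
    WHEN MATCHED THEN
      -- existing ID: fill in only this month's column
      UPDATE SET t.m_2004_01 = s.bal
    WHEN NOT MATCHED THEN
      -- new ID: insert a row; the other 59 month columns stay NULL
      INSERT (t.id_number, t.m_2004_01)
      VALUES (s.id_number, s.bal);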

I have tried many different combinations on the target table: unique indexes, index-organized tables, no indexes at all, hash partitioning, etc. I would post the exact code, but there are so many iterations of it that I'd rather take a fresh approach to the problem; the sketches above should give the general idea.

So, if anybody else has had experience with pivoting extremely large tables like this, I would very much appreciate some tips on how you approached it. I'm just looking for a push in the right direction, not necessarily exact code.

Once again, I very much appreciate any advice any of you data-warehousing people might have!

Received on Fri Apr 23 2004 - 21:24:06 CDT
