
Re: Slow Load Performance

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: 22 Jul 2003 01:52:03 -0700
Message-ID: <1a75df45.0307220052.6868ae1c@posting.google.com>


barryh_at_quadrus.com (Barry Hensch) wrote

> We are experiencing very slow insert performance in one of our DW fact
> tables. The fact table is currently just under 100 million rows, is
> partitioned by month, and has 6 bitmap and 3 B-tree indexes. (The
> Oracle database is v8.1.7.3.)
>
> The basic process that we are using to load the table is:
>
> 1. Fetch 100,000 rows using a BULK COLLECT (i.e. using PL/SQL Tables
> for the array)
> 2. Assign the dimension keys through PL/SQL functions
> 3. Insert the 100,000 rows into the fact table
> 4. Delete the arrays
> 5. Start again
>
> With this we can't seem to get over 60 rows/second on a 16-CPU
> SUNFIRE!! Yikes.

The number of CPUs has _nothing_ to do with CPU speed. If you are running a SINGLE insert process, you can have 10,000 CPUs; it will still run on a single CPU, at the raw speed that one CPU can provide.

To use CPUs (even a single CPU) effectively, you need to multithread and parallelise. That means firing off 32 or more insert processes at the same time, where each process does x number of rows.
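As a rough sketch, on 8i you can also let Oracle do the fan-out for you with parallel DML (the table names below are made up for illustration, not from your schema):

  ALTER SESSION ENABLE PARALLEL DML;

  -- direct-path, parallel insert: Oracle fans the work out across
  -- parallel slaves instead of one serial process on one CPU
  INSERT /*+ APPEND PARALLEL(f, 16) */ INTO fact_tab f
  SELECT /*+ PARALLEL(s, 16) */ *
  FROM   stage_tab s;

  COMMIT;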

> We have also tried a simple INSERT AS SELECT where we have preassigned
> the dimension keys during a staging step and the INSERT took 2.5 hours
> for 450,000 rows.
>
> Any ideas on what could be slowing these inserts down . . . I am
> currently suspicious of the indexes.

Yes. That is the #2 cause IMO. I would think that the #1 cause of the slow performance is using a PL/SQL function to assign dimension keys. If that function has an execution time of 5ms, work out the execution time for 100,000 rows: 100,000 x 5ms = 500 seconds, i.e. over 8 minutes per batch before a single row is inserted. Then take into consideration the context switches and other per-row overheads caused by PL/SQL processing...

Barry, you said the table is partitioned. If the load process adds a new month, then do not use insert at all.

Create a brand new table containing that month's data. SQL*Load the new month's OLTP/raw data into a staging table. Use a single SQL SELECT, joining to the dimension tables to get the dimension keys. Add NOLOGGING. Create a fact table for that month only.
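Something along these lines (all table and column names here are invented for illustration, and the join conditions depend on your dimension design):

  CREATE TABLE fact_200307      -- one month of fact data
  NOLOGGING PARALLEL
  AS
  SELECT d1.product_key,        -- dimension keys via plain joins,
         d2.customer_key,       -- no PL/SQL function call per row
         s.sale_date,
         s.quantity,
         s.amount
  FROM   stage_200307  s,
         product_dim   d1,
         customer_dim  d2
  WHERE  s.product_code = d1.product_code
  AND    s.customer_no  = d2.customer_no;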

After the table is created, add the indexes to it. Then exchange the empty fact partition with that fact table (a sub-second SQL whether it holds 1GB or 100GB, because it is a data dictionary operation, not a data move).
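Roughly (again, the names are illustrative, and the indexes you build must match the local indexes on the partitioned fact table):

  CREATE BITMAP INDEX fact_200307_bx1
    ON fact_200307 (product_key) NOLOGGING;
  -- ...and so on for the remaining bitmap and B-tree indexes...

  ALTER TABLE fact
    EXCHANGE PARTITION p_200307 WITH TABLE fact_200307
    INCLUDING INDEXES WITHOUT VALIDATION;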

And, IMO, for heaven's sake stay away from PL/SQL when dealing with large volumes of data. By nature you will be doing row-based processing in PL/SQL, and that is a performance killer when dealing with very large tables (VLTs). VLTs must be processed as data sets, and not as rows: e.g. doing MINUS, INTERSECT, plain SQL, and using intermediate work/temp tables where needed.
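For example, loading only the rows not already in the fact table as one set operation (a sketch with made-up names, not your schema):

  -- one set-based statement instead of a row-by-row existence check
  INSERT /*+ APPEND */ INTO fact_tab
  SELECT * FROM stage_tab
  MINUS
  SELECT * FROM fact_tab;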

--
Billy
Received on Tue Jul 22 2003 - 03:52:03 CDT

