
Re: Best solution: What would you do?

From: L120bj <l120bj_at_aol.com>
Date: 1997/03/23
Message-ID: <19970323104801.FAA15352@ladder01.news.aol.com>

This sounds very similar to a process that we undertake every 8 weeks for the data warehouse where I work - was Oracle involved in the design process for your database?
Every 8 weeks we unload data from 8 individual weekly tables and load them into an 8 week chunk table. This involves loading between 15 and 20 million rows (quite long rows) and building the relevant indexes, all within one overnight batch run. The major differences I see between our jobs and yours: we use direct path load; we use a utility called syncsort for Unix to sort the input records into primary key order; and we build the indexes (including the primary key), using the parallel option, once the load has completed.
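
In outline, our load step looks something like this (user, file and table names are made up):

    $ sqlldr userid=scott/tiger control=chunk8wk.ctl direct=true

    -- Indexes (including the one backing the primary key) are built
    -- after the load completes, using the parallel option:
    CREATE INDEX chunk8wk_pk ON chunk8wk (period, cust_id)
        PARALLEL (DEGREE 4);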
I can't help thinking that it is the database trigger which is having the most effect on your load times: it prevents you from using direct path, it fires for every row, and I assume you need to leave the indexes on the table to support it, which will also slow down the loading process. From what you have said in paragraph 2, the batch processing only loads unmatched information, so unless your users enter data on-line into these tables I can't see the need for the trigger.

However, even if you do not wish to remove the check for duplicates, a better solution would almost certainly be to disable the trigger prior to the load and sort the file to be re-loaded into an order such that a 3GL program (or PL/SQL from 7.3 onwards) can process it to identify the duplicates. If the check reports no errors you could then load the data in direct path (unrecoverable is another option worth considering, as long as you back up the database immediately after the loads). Once the data has been loaded you could build your indexes using the unrecoverable option, and finally re-enable your trigger.
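
Putting that together (object names are made up):

    -- 1. Disable the trigger before the load.
    ALTER TRIGGER hist_nodup_trg DISABLE;

    -- 2. Sort the input file and run the 3GL (or PL/SQL) duplicate
    --    check against it; proceed only if it reports no errors.

    -- 3. Direct path load from the shell (UNRECOVERABLE goes in the
    --    control file; back up the database immediately afterwards):
    --      $ sqlldr userid=scott/tiger control=hist.ctl direct=true

    -- 4. Rebuild the indexes without generating redo, then
    --    re-enable the trigger.
    CREATE INDEX hist_pk ON hist (period, cust_id) UNRECOVERABLE;
    ALTER TRIGGER hist_nodup_trg ENABLE;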
As I understand partition views, they rely on a constraint which ensures that any particular value in one column (e.g. period) can only appear in the rows of one partition. Is this true for the 'chunks' you are thinking of implementing? If so then this would be a very good solution.
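
In case it helps, a partition view over two hypothetical quarterly chunks would look something like this:

    -- Each chunk carries a CHECK constraint on the partitioning column,
    -- so the optimizer can skip chunks that cannot match the predicate.
    CREATE TABLE hist_q1 (
        period  NUMBER(2) NOT NULL CHECK (period BETWEEN 1 AND 3),
        cust_id NUMBER(10),
        amount  NUMBER(12,2)
    );

    CREATE TABLE hist_q2 (
        period  NUMBER(2) NOT NULL CHECK (period BETWEEN 4 AND 6),
        cust_id NUMBER(10),
        amount  NUMBER(12,2)
    );

    -- With partition views enabled, a query such as
    --   SELECT * FROM hist_all WHERE period = 5
    -- should only touch hist_q2.
    CREATE VIEW hist_all AS
        SELECT period, cust_id, amount FROM hist_q1
        UNION ALL
        SELECT period, cust_id, amount FROM hist_q2;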

Hope this helps. If you want to discuss my response further, email me.

Rob




Subject: Best solution: What would you do?
From: Bruce Bristol <bbristol_at_ix.netcom.com>
Date: Sun, 23 Mar 1997 00:20:49 -0800
Message-ID: <3334E7E1.7CC5_at_ix.netcom.com>

Hello,

I would like to get feedback from people on the following situation and my solution...

My employer has a history database which consists of 8 base tables each containing 18 months of data and 8 corresponding incremental tables each containing the latest 3 months of data. There are views for each of the base and incremental tables.

The tables are used for batch processing where input data is matched against the tables to make a flat file for further processing. At the end of processing we use sqlldr to insert the unmatched data into the tables.
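
To give a concrete picture, each view is along these lines (simplified, with made-up names):

    CREATE VIEW hist1 AS
        SELECT period, cust_id, amount FROM hist1_base   -- 18 months
        UNION ALL
        SELECT period, cust_id, amount FROM hist1_incr;  -- latest 3 months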

Here's the brutal part...

Every 3 months we 'purge' the history database. We read in the 21 months of base+incremental data, write out the oldest 3 months to a flat file to be archived to tape and the most recent 18 months to a flat file to be re-loaded into the base table. Once done, the incremental table is then truncated.

This process takes approximately 2 weeks to complete. This is due to the
85Gb of data housed in the tables, ranging from 17 million to 60 million rows per base table.

We are currently on Oracle 7.1.4 on a Sun Solaris environment. We will be
upgrading to the latest 7.3.x.x release within the next 2 months.

I want to decrease the amount of time it takes to do the 'purge' significantly. While we are purging we are NOT in archivelog mode, and we
are NOT loading current data into the database. This methodology has been
put in place by my predecessors.

Anyway, here's my 'solution'. I'd like to know what you think and if you
have any better ideas before I implement...

I want to take each of the 8 base tables and break them into 6 pieces, each containing 3 months of data (6*3 = 18 months). When it's time to do the 'purge' again, we can simply unload the oldest table and archive it to tape, truncate it, unload the incremental table and load that data into the table we just truncated, and then truncate the incremental table. We can do this for each of the 8 base tables.
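
Per base table, the new purge boils down to something like this (made-up names, and INSERT ... SELECT shown in place of the unload/reload for brevity):

    -- hist1_p1 holds the oldest 3 months; archive it first.
    -- (unload hist1_p1 to a flat file, write it to tape)
    TRUNCATE TABLE hist1_p1;

    -- Reuse the emptied table for the incremental data.
    INSERT INTO hist1_p1 SELECT * FROM hist1_incr;
    TRUNCATE TABLE hist1_incr;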

Another thought would be to unload and archive the oldest table to tape and then rename the remaining tables using ALTER...RENAME...
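
Something like this (made-up, quarter-stamped names; the standalone RENAME statement is the Oracle 7 form I know of):

    -- Unload and archive hist1_9501 (the oldest chunk), then:
    DROP TABLE hist1_9501;

    -- The incremental table becomes the newest base chunk,
    -- and a fresh incremental table is created in its place.
    RENAME hist1_incr TO hist1_9701;
    CREATE TABLE hist1_incr (
        period  NUMBER(6) NOT NULL,
        cust_id NUMBER(10),
        amount  NUMBER(12,2)
    );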

My greatest concern with breaking the tables out from 8 base + 8 incremental is that it'll now be 8*6 base + 8 incremental, and what that will do to Oracle's performance.

Look at it this way, per base+incremental table:

     Today:  Unload 21 months, load 18 months
Solution 1:  Unload 6 months, load 3 months
Solution 2:  Unload 3 months, load 0 months

With Oracle 7.3.x.x we can use partition views to access the data. As I understand it, there will be little or no performance degradation querying 7 tables vs. 2 tables in 7.1.4.

The other concern is a trigger that was built to make sure duplicate data isn't added to the view (base+incremental). Every time we do a load using sqlldr, the trigger checks the base table to make sure the row doesn't already exist. I have to wonder what the best way is to handle that check using Oracle 7.3.x.x and the 6 'little bases' our loads would need to check against. Do partition views work the opposite way as well, meaning will they check the 6 'little base' tables at the same time vs. 1 at a time, just as they'll query the 6 tables at the same time when doing a query?
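
For reference, the trigger is along these lines (simplified; the real names and key columns differ):

    CREATE OR REPLACE TRIGGER hist1_incr_nodup
    BEFORE INSERT ON hist1_incr
    FOR EACH ROW
    DECLARE
        n NUMBER;
    BEGIN
        -- Check the base table for an existing row with the same key.
        SELECT COUNT(*) INTO n
          FROM hist1_base
         WHERE period  = :new.period
           AND cust_id = :new.cust_id;
        IF n > 0 THEN
            RAISE_APPLICATION_ERROR(-20001, 'duplicate history row');
        END IF;
    END;
    /

With the 6 'little bases' that SELECT would presumably have to run against a partition view over all 6 chunks instead of a single base table.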

Sorry for the long msg, but I feel a full explanation will help to limit the number of 'You didn't specify this or that' responses.

Thank you!

-Bruce Bristol

Received on Sun Mar 23 1997 - 00:00:00 CST
