
Re: What's FASTEST way to IMPORT Massive amounts of TRANSACTIONAL DATA? (Non-SQL*Loader)

From: Equipe BDM <e-bdm@societe-generale.com>
Date: Wed, 01 Dec 1999 16:30:02 +0100
Message-ID: <38453EFA.7AC75E1F@societe-generale.com>


Billy Verreynne wrote:

> [cross postings removed]
>
> Papaya Paradise wrote in message
> <821tgu$kpe$1@ash.prod.itd.earthlink.net>...
>
> >SQL*Loader is fastest for raw pumping of data straight to tables, but we
> >have to do queries as we import to see if an Entity is already in the
> >database. If it is, then update its ENTITY and ENTITY_ATTRIBUTES
> >information in the database. If not, then insert into the below type of
> >tables.
>
> Interesting problem. No matter how you cut and slice this problem, you need
> to check the keys of the existing data and new data, to determine what needs
> to be inserted (new keys) and what needs to be updated (existing keys).
>
> A dirty method that may work, but which I will not recommend, is to use
> SQL*Loader. Raise the maximum number of allowed errors so that the
> complete input file will be processed by SQL*Loader. Run the file through
> SQL*Loader and load the data. All new keys will be inserted. All errors
> (i.e. existing keys) will be written to the bad file. Now update the table
> with the bad file data. Very dirty method IMO. But worth mentioning anyway.
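>
> A minimal sketch of that dirty load (the user and file names are made up
> for illustration; ERRORS and BAD are standard SQL*Loader command-line
> parameters):
>
>   sqlldr scott/tiger control=entity.ctl bad=entity.bad errors=999999
>
> Every row that fails on the unique key ends up in entity.bad, and that
> file then drives the UPDATE pass.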
>
> Based on similar monthly loading problems in a data warehouse environment,
> one of the things I would try first is the following:
>
> 1. Direct and parallel load the data using SQL*Loader into a temp table.
> This is the fastest way to get data into Oracle, especially if large
> volumes need to be loaded. (A combined sketch of steps 1 to 5 follows
> after step 5.)
>
> 2. Run a CREATE INDEX on the temp table to index the key.
>
> 3. Run a CREATE TABLE ... UNRECOVERABLE AS SELECT, doing a MINUS between
> the temp table keys and the source table keys. This gives you a temp
> table with the new keys to be inserted.
>
> OK, now we have identified the new keys. We can join this table to the
> temp table to identify either the to-be-inserted data or the
> to-be-updated data. As this join needs to be made twice (maybe more?), it
> may make sense to simply split the temp data table now into two data
> sets, doing these joins up front, using two CREATE TABLE ...
> UNRECOVERABLE AS SELECT statements.
>
> 4. Run an INSERT INTO ... SELECT, inserting the new data - either by
> joining the original temp table to the temp keys table, or from the
> insert data set.
>
> 5. Run an UPDATE with a SELECT using the data from the temp table outer
> joined to the temp keys table. Or simply do the SELECT from the second
> (update) data set. Two ideas here to increase performance: increase the
> rollback segment size to make the transactions as large as possible, and
> do the UPDATE in parallel by giving each UPDATE ... SELECT a range of
> keys to process.
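>
> A minimal sketch of steps 1 to 5 (all table and column names are made up
> for illustration - assume STG is the SQL*Loader temp table and ENTITY
> the source table, both keyed on ENTITY_ID):
>
>   -- step 1, at the OS prompt: direct, parallel load into the temp table
>   --   sqlldr scott/tiger control=stg.ctl direct=true parallel=true
>
>   -- step 2: index the key of the temp table
>   CREATE INDEX stg_key ON stg (entity_id) NOLOGGING PARALLEL 4;
>
>   -- step 3: new keys = staged keys MINUS existing keys
>   CREATE TABLE new_keys UNRECOVERABLE AS
>     SELECT entity_id FROM stg
>     MINUS
>     SELECT entity_id FROM entity;
>
>   -- step 4: insert the rows with new keys
>   INSERT INTO entity
>     SELECT s.* FROM stg s, new_keys n
>     WHERE s.entity_id = n.entity_id;
>
>   -- step 5: update the rows whose keys already exist
>   UPDATE entity e
>     SET attr = (SELECT s.attr FROM stg s
>                 WHERE s.entity_id = e.entity_id)
>     WHERE e.entity_id IN (SELECT entity_id FROM stg
>                           MINUS
>                           SELECT entity_id FROM new_keys);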
>
> One of the problems with this method is that you require space for the temp
> tables and indexes.
>
> Other ideas are to maybe do the UPDATE first, then drop the indexes from
> the source table and then run the INSERT. After you have inserted the new
> data, re-create the indexes. In fact, I would recommend that you try this
> irrespective of the method you use, as an INSERT can spend up to 99% of
> its time scanning huge unique indexes before inserting a row to ensure
> that it does not create duplicates. You can also create indexes in
> parallel, which can make the index re-create option much more viable.
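>
> For example (the index name is made up; if the key is enforced by a
> declared PRIMARY KEY constraint rather than a plain unique index, you
> would disable the constraint instead of dropping the index):
>
>   DROP INDEX entity_key_idx;
>   -- ... run the big INSERT ... SELECT here ...
>   CREATE UNIQUE INDEX entity_key_idx ON entity (entity_id)
>     NOLOGGING PARALLEL 4;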
>
> The main thing to do is to identify where the bottlenecks are likely to
> occur. And then focus on those areas and try to either work around them or
> optimise them. In my case I had lots of CPU power at my disposal as we were
> building the warehouse in OPS on 12 nodes. So once I had the data into
> Oracle, I could make use of massive parallel processes to address these
> problems. My bottleneck was getting the data down from the mainframe. Your
> bottlenecks may just be reversed. In some cases it's a good idea to
> perform as much pre-processing (even on another platform) as possible -
> like supplying the source data system with a list of existing keys and
> having it generate two data sets for you: an insert-key data set and an
> update-key data set.
>
> Many ways of skinning this cat. :-)
>
> regards,
> Billy

You could use the direct-load insert:

  INSERT /*+ APPEND */ INTO <table> SELECT ...

It is exactly the equivalent of SQL*Loader direct path: with the APPEND
hint, Oracle doesn't use the buffer cache and inserts directly into the
datafiles, above the high water mark.

You could also use the NOLOGGING attribute (set with an ALTER TABLE).
With this you generate no redo log for the inserted data.
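
A minimal sketch of the two combined (table names are made up for
illustration; note that after a direct-load insert you must COMMIT before
the same session can read the table again):

  ALTER TABLE entity NOLOGGING;

  INSERT /*+ APPEND */ INTO entity
    SELECT * FROM stg;
  COMMIT;

  ALTER TABLE entity LOGGING;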

Vincent PARIS
