Re: Statistics during a complex load

From: Tom Anderson <twic_at_urchin.earth.li>
Date: Fri, 25 Feb 2011 16:57:28 +0000
Message-ID: <alpine.DEB.1.10.1102251631200.16341_at_urchin.earth.li>



On Sun, 20 Feb 2011, onedbguru wrote:

> On Feb 20, 5:00 pm, John Hurley <hurleyjo..._at_yahoo.com> wrote:
>> Gerard:
>>
>> # I've probably missed something, otherwise there would be a very
>> simple solution: drop all constraints (eg. foreign keys) and indexes
>> before the load, and, if necessary, create them afterwards.  That's
>> always been the fastest way to load data.
>>
>> Works well for non partitioned tables and you have the option of
>> dropping or just disabling the foreign keys depending on exactly how
>> they were setup initially.
>>
>> Of course if you need concurrent access to the table(s) while loading
>> it gets more involved.  Typically I would consider getting all the
>> stuff into a staging table ...
>>
>> The volume of data and the timing of handling changes to it is
>> critical to coming up with a complete understanding of what might
>> potentially work.
>
> What is the volume of data? If you are on 11gR2, look at exchange
> table and external tables and "insert into realtable select * from
> externaltable". This actually uses SQLLOADER under the sheets and can
> be blazingly fast. Bottlenecks are typically the "mappers" you
> described.

The use of the mapper is definitely the bottleneck here. But the use of the mapper is largely non-negotiable, i'm afraid. I know exactly what reaction i'd get if i walked up to my project manager and suggested ditching it and doing our dataload in pure SQL.

> Just curious, but when doing a data "merge" how do you determine which
> column get updated and which columns stay the same? You would almost
> need a timestamp column for each data column... and that would be a real
> pain.

We don't ever do a merge like that. But the mapper has to check to make sure that we're not. If we did, it would punt to its merge-handling process, which in the interactive system is a fancy web application, and in the load is an error!

> As for the previous reply to drop constraints etc, that can be very
> dangerous if you depend on that referential integrity. Indexes (except
> PK) maybe, but the thought of having to potentially clean up data so you
> can re-add those constraints is not a very pleasant thought. You would
> need to **guarantee** that the referential integrity was completed
> somewhere upstream - which may negate the need for it at the db level.

The problem is that the mapper does a lot of queries during the load. Going without the indexes is a complete non-starter.

tom

-- 
Who would you help in a fight, Peter van der Linden or Bill Gates?
Received on Fri Feb 25 2011 - 10:57:28 CST

Original text of this message