Re: de-dup process
OK, I've created a test case on our development box with 3 striped SATA
disks, 2 CPUs and 2G of RAM, of which only 500M are devoted to this 9.2
Oracle instance.
base_table was created from dba_source, with one unique column and a text
column just to fill up some space. Then I multiplied base_table again and
again until the count reached 119,258,304 rows; it took about 30GB of space.
Then I created temp_table with 20M rows to simulate your loaded table: half of them, i.e. 10M rows, were different from those in base_table, the other 10M were the same.
Then I created a unique index on the base_table varchar2 column, simulating your unique key, and altered this column to not null.
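A rough sketch of that setup, assuming sys_guid()-based key generation
(the real script is in the spool file linked at the end of this message):

create table base_table as
select rawtohex(sys_guid()) as ukrow,  -- unique varchar2 key
       text                            -- filler column
from dba_source;

-- double the row count repeatedly (commit between rounds)
-- until ~119M rows are reached
insert /*+ append */ into base_table
select rawtohex(sys_guid()), text from base_table;
commit;

-- 10M rows that already exist in base_table plus 10M new ones
create table temp_table as
select ukrow, text from base_table where rownum <= 10000000
union all
select rawtohex(sys_guid()), text from base_table where rownum <= 10000000;

create unique index base_table_uk on base_table (ukrow);
alter table base_table modify (ukrow not null);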
OK. Now I was ready to do the insert.
First, I used the following insert:
insert /*+ append */ into base_table
select * from temp_table where ukrow in (
select ukrow from temp_table
minus
select ukrow from base_table);
with the plan you can find in the link at the very end of this message.
It took 26 minutes and 11 seconds to complete, and it consisted of the following steps shown in v$session_longops:

Seconds  Opname                Target
    281  Sort Output
     18  Sort/Merge
    316  Hash Join
    246  Sort Output
    261  Index Fast Full Scan  GINTS.BASE_TABLE
    525  Sort Output
     37  Sort/Merge
    177  Table Scan            GINTS.TEMP_TABLE
    217  Table Scan            GINTS.TEMP_TABLE
The thing I didn't like was that GINTS.TEMP_TABLE was scanned twice. Writing SQL I tend to follow the principle: scan each necessary row as few times as possible, at best only once. After the first scan of GINTS.TEMP_TABLE I actually knew all the data for the insert, so the second scan was a simple waste.
Then I rewrote my insert as follows:
insert /*+ append */ into base_table
select ukrow, text from (
select t.ukrow, t.text,
case when b.ukrow = t.ukrow then 0 else 1 end flag
from temp_table t, base_table b
where t.ukrow = b.ukrow(+)
)
where flag = 1;
with the plan you can find in the link at the very end of this message.
It took 16 minutes and 35 seconds to complete, and it consisted of the following steps shown in v$session_longops:

Seconds  Opname                Target
    196  Sort Output
     39  Sort/Merge
    363  Hash Join
    244  Index Fast Full Scan  GINTS.BASE_TABLE
    147  Table Scan            GINTS.TEMP_TABLE
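As a side note, the same "keep only the non-matching rows" logic can be
written as an explicit anti-join. A sketch of two equivalent formulations
(untested against this data set, and the plans may differ from the one above):

-- 1. outer join with an IS NULL filter instead of the CASE flag
insert /*+ append */ into base_table
select t.ukrow, t.text
from temp_table t, base_table b
where t.ukrow = b.ukrow(+)
  and b.ukrow is null;

-- 2. NOT EXISTS, which the optimizer can unnest into a hash anti-join
insert /*+ append */ into base_table
select t.ukrow, t.text
from temp_table t
where not exists (select null from base_table b where b.ukrow = t.ukrow);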
So the summary is: inserting 10M rows out of 20M into a ~120M-row table took
16.5 minutes, and that included one UK index update.
Looking at the insert we can see that it depends directly on the temp_table size and the base_table UK index size, and involves one hash join and one sort. To my mind it should scale roughly linearly, or at least at the same rate as the worse of the sort and the hash join.
One more thing - consider using workarea_size_policy = manual with big sort_area_size and hash_area_size instead of workarea_size_policy = auto and a [possibly] miserable pga_aggregate_target.
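For example, something like this in the session before running the insert
(the sizes are illustrative only; fit them to your available memory):

alter session set workarea_size_policy = manual;
alter session set sort_area_size = 104857600;  -- bytes, ~100M per sort
alter session set hash_area_size = 104857600;  -- bytes, ~100M per hash join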
Gints Plivna
http://www.gplivna.eu
P.S. I resent this message because the body was too big. You can find the full spool file at http://www.gplivna.eu/TabInsert.txt
2006/12/14, A Ebadi <ebadi01_at_yahoo.com>:
> Biggest problem we've faced in coming up with a solution is none of the
> solutions so far scale.
--
http://www.freelists.org/webpage/oracle-l

Received on Thu Dec 14 2006 - 08:19:29 CST