Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: Bulk DML question ...

From: <sergey_s_at_my-deja.com>
Date: Sat, 11 Nov 2000 14:20:49 GMT
Message-ID: <8ujkjt$sb$1@nnrp1.deja.com>

Ok, I ran the two statements (using the smallest partition of TABLE1: ~45 million rows) and they both finished in a little over 50 minutes (Jonathan's SQL was about a minute faster). However, the result sets were different. Jonathan's example produced a larger data set (I can't remember what the row difference was). I looked at the data and discovered that I was perhaps wrong when I replied to the following question from Jonathan:

>Is the drop in row count
> largely due to repeated group_codes or repeated names ?

I originally replied that it was due to repeated names. In reality, there are bunches of different names that belong to certain group codes. Meaning, there may be 100 different names in TABLE1 that all belong to group code 1 (found in TABLE2), another 50 different names in TABLE1 that belong to group code 5 (found in TABLE2), and so on.

More importantly, I have oversimplified my example here in the NG. I didn't want to post everything because it would be too much, so I tried to present the essence of the problem. Now I think that I've missed some of the important details. I will try to repost very soon, making sure that I don't miss anything important. Everything that's been talked about in this thread was still valuable to me, and I went back and revised some of the settings as people here suggested. Thank you very much for all your help, and I will work on my new post.

Sergey

In article <8uelf8$2hr$1_at_nnrp1.deja.com>,   sergey_s_at_my-deja.com wrote:
>
> > sum(b.count) sum_count -- how did you get
> >COUNT
> > as a column name ?
>
> That's my mistake, sorry. The column name is CNT.
> I will try recoding the SQL the way you suggested and see what happens.
> Will post results back to NG.
>
> Thank you!
> Sergey
>
> In article <973628946.28797.3.nnrp-12.9e984b29_at_news.demon.co.uk>,
> "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:
> >
> > I should also have asked:
> > If two rows have the same name,
> > do they have the same ID ?
> >
> > Still, assuming they do, this should
> > give you the same result, and may
> > be quicker because the sort/group
> > is on a narrower set, and the join
> > is between fewer rows:
> >
> > (e and o.e. in the text)
> >
> > CREATE TABLE TEMP_TABLE PARALLEL (DEGREE 8 INSTANCES 1) NOLOGGING
> > AS
> > select
> > a.group_code,
> > il.name,
> > il.sum_amount,
> > il.sum_count
> > from
> > (
> > select /*+ no_merge */
> > b.id,
> > b.name,
> > sum(b.amount) sum_amount,
> > sum(b.count) sum_count -- how did you get COUNT as a column name ?
> > from
> > table1 b
> > group by b.id, b.name
> > ) il,
> > table2 a
> > where
> > a.id = il.id
> > ;
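[Editorial aside: the effect of Jonathan's rewrite, and the pitfall Sergey later ran into, can be sketched outside the database. The Python below uses made-up toy data, not anything from the thread; it shows that pre-aggregating table1 by (id, name) before the join gives the same totals as join-then-group, provided a (group_code, name) pair never arises from two different ids. When names do repeat across ids that share a group code, the rewrite keeps those rows separate, which would explain the larger result set Sergey reported.]

```python
from collections import defaultdict

# Toy rows: table1(id, name, amount, cnt); table2 maps id -> group_code
# (id is unique in table2, as confirmed in the thread).
table1 = [(1, "a", 10.0, 2), (1, "a", 5.0, 1), (2, "b", 7.0, 3), (2, "b", 1.0, 1)]
table2 = {1: 100, 2: 200}

# Original plan: join first, then group by (group_code, name).
joined = defaultdict(lambda: [0.0, 0])
for tid, name, amount, cnt in table1:
    key = (table2[tid], name)
    joined[key][0] += amount
    joined[key][1] += cnt

# Jonathan's rewrite: pre-aggregate table1 by (id, name) first, so the
# join runs against a much smaller intermediate set.
pre = defaultdict(lambda: [0.0, 0])
for tid, name, amount, cnt in table1:
    pre[(tid, name)][0] += amount
    pre[(tid, name)][1] += cnt
rewritten = {(table2[tid], name): sums for (tid, name), sums in pre.items()}

# Identical here because no (group_code, name) pair comes from two ids.
assert joined == rewritten
```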
> >
> > --
> >
> > Jonathan Lewis
> > Yet another Oracle-related web site: http://www.jlcomp.demon.co.uk
> >
> > Practical Oracle 8i: Building Efficient Databases
> > Publishers: Addison Wesley Longman
> > Book bound date now 8th Dec 2000
> >
> > sergey_s_at_my-deja.com wrote in message <8u7u1j$igr$1_at_nnrp1.deja.com>...
> > >
> > >> A couple of quick questions for this query though, is ID
> > >> unique on the smaller table ?
> > >
> > >Yes, it is.
> > >
> > >>Is the drop in row count
> > >> largely due to repeated group_codes or repeated names ?
> > >
> > >Due to repeated names.
> > >
> > >
> > >Thank you!
> > >Sergey
> > >
> > >
> > >In article <973373984.28182.0.nnrp-02.9e984b29_at_news.demon.co.uk>,
> > > "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:
> > >>
> > >> Aggregating 3Billion rows down to 1.2B rows
> > >> is bound to be very slow. It is possible that
> > >> your strategy of breaking the aggregation into
> > >> steps needs to be reviewed so that you optimize
> > >> a step which has a higher density of aggregation.
> > >>
> > >> A couple of quick questions for this query though, is ID
> > >> unique on the smaller table ? Is the drop in row count
> > >> largely due to repeated group_codes or repeated names ?
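[Editorial aside: Jonathan's point about optimizing the step with the highest "density of aggregation" can be illustrated with a small sketch. The Python below uses hypothetical numbers, not the thread's data: when a rollup is staged, running the high-reduction grouping first means every later stage touches far fewer rows.]

```python
from collections import defaultdict

# Rows are (fine_key, coarse_key, value); the fine key is highly repetitive,
# so grouping on it first collapses the row count dramatically.
rows = [(i % 10, (i % 10) // 5, 1) for i in range(1000)]

# Stage 1: group on the fine key -- 1000 rows collapse to 10.
stage1 = defaultdict(int)
for fine, coarse, val in rows:
    stage1[(fine, coarse)] += val
assert len(stage1) == 10  # the high-density step runs first

# Stage 2: roll the 10 intermediate rows up to the coarse key --
# this step now processes 10 rows instead of 1000.
stage2 = defaultdict(int)
for (fine, coarse), val in stage1.items():
    stage2[coarse] += val
assert sum(stage2.values()) == 1000  # nothing lost in the staging
```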
> > >>
> > >> --
> > >>
> > >> Jonathan Lewis
> > >> Yet another Oracle-related web site: http://www.jlcomp.demon.co.uk
> > >>
> > >> Practical Oracle 8i: Building Efficient Databases
> > >> Publishers: Addison Wesley Longman
> > >> Book bound date now 1st Dec 2000
> > >>
> > >> sergey_s_at_my-deja.com wrote in message <8tv6in$ta4$1_at_nnrp1.deja.com>...
> > >> >Yes, the explain plan shows hash join and every step is
> > >> >parallel-to-parallel (except for the collection at the end, of course -
> > >> >that one's parallel-to-serial). The query creates a "processing table"
> > >> >(approx. 1.2 billion rows less than the base table). This processing
> > >> >table is later queried to create more temporary tables which in turn
> > >> >are used to create the final aggregate tables. Basically, the
> > >> >aggregation process was broken down into smaller steps for performance
> > >> >reasons (we achieved a 30% reduction in time).
> > >> >
> > >> >Thank you!
> > >> >Sergey
> > >> >
> > >> >
> > >> >In article <8tuua4$lf8$1_at_nnrp1.deja.com>,
> > >> > jdarrah_co_at_my-deja.com wrote:
> > >> >> Does the explain plan show that it's using hash and that the
> > >> >> statement is running in parallel? Also, what's the next step after
> > >> >> you create the table?
> > >> >>
> > >> >> In article <8tqg61$15b$1_at_nnrp1.deja.com>,
> > >> >> sergey_s_at_my-deja.com wrote:
> > >> >> > Let's say I have two tables:
> > >> >> >
> > >> >> > CREATE TABLE TABLE1
> > >> >> > (
> > >> >> > TIME_KEY NUMBER,
> > >> >> > ID NUMBER,
> > >> >> > NAME VARCHAR2(20),
> > >> >> > AMOUNT NUMBER(8,2),
> > >> >> > COUNT NUMBER,
> > >> >> > ...
> > >> >> > );
> > >> >> >
> > >> >> > CREATE TABLE TABLE2
> > >> >> > (
> > >> >> > ID NUMBER,
> > >> >> > GROUP_CODE NUMBER,
> > >> >> > ...
> > >> >> > );
> > >> >> >
> > >> >> > TABLE1 has approx. 3 billion rows (almost 200 Gig.). TABLE2 has
> > >> >> > approx. 3 million rows. TABLE1 is partitioned by TIME_KEY
> > >> >> > (partitions have anywhere from 50 to 100 million rows each). I
> > >> >> > looked into making partitions smaller based on additional criteria,
> > >> >> > but the next candidate column has only about 500 unique values.
> > >> >> >
> > >> >> > Every row in TABLE1 has a match in TABLE2. In other words, I have
> > >> >> > no choice but to do full table scans. I already looked into the
> > >> >> > business process, and there is no time or money to change it at
> > >> >> > this point.
> > >> >> >
> > >> >> > So, here's the query I need to run:
> > >> >> >
> > >> >> > CREATE TABLE TEMP_TABLE PARALLEL (DEGREE 8 INSTANCES 1) NOLOGGING AS
> > >> >> > SELECT /*+ ORDERED FULL(A) FULL(B) USE_HASH(B) */
> > >> >> > A.GROUP_CODE, B.NAME, SUM(B.AMOUNT), SUM(B.COUNT)
> > >> >> > FROM TABLE2 A, TABLE1 B
> > >> >> > WHERE A.ID = B.ID
> > >> >> > GROUP BY A.GROUP_CODE, B.NAME;
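[Editorial aside: as a rough illustration of what the ORDERED/FULL/USE_HASH plan asks Oracle to do, here is a hand-rolled sketch in Python. The data is made up; the point is the shape of the work: build a hash table on the small table (TABLE2), stream the big table (TABLE1) once, probe, and aggregate by (group_code, name).]

```python
from collections import defaultdict

# Build side: hash the small table on the join key, as USE_HASH(B) with
# TABLE2 leading (ORDERED) would.
table2 = [(1, 100), (2, 200)]                                  # (id, group_code)
table1 = [(1, "a", 10.0, 2), (2, "b", 7.0, 3), (1, "a", 5.0, 1)]  # (id, name, amount, cnt)

build = {tid: gc for tid, gc in table2}

# Probe side: one full scan of the big table, probing the hash table and
# aggregating by (group_code, name) on the way, as the GROUP BY requires.
agg = defaultdict(lambda: [0.0, 0])
for tid, name, amount, cnt in table1:
    gc = build[tid]  # every TABLE1 row has a match in TABLE2, per the post
    agg[(gc, name)][0] += amount
    agg[(gc, name)][1] += cnt
```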
> > >> >> >
> > >> >> > This query finishes in about 30 hours when the machine is pretty
> > >> >> > much dedicated to my process (needless to say, it's too long for
> > >> >> > just the first step). I am looking at PL/SQL and bulk DML, but I am
> > >> >> > not sure yet how I could apply it in my case. Maybe there are ways
> > >> >> > besides bulk DML that I could try. If you have any
> > >> >> > suggestions/thoughts etc., please let me know.
> > >> >> >
> > >> >> > I have Oracle 8.0.6.0.0 with Partitioning and Objects Options;
> > >> >> > SunOS 5.7 on 8 processors (400 MHz each), 80 Gig memory, RAID 5
> > >> >> > (66 disks in total I think, but not sure).
> > >> >> >
> > >> >> > Here are some db settings:
> > >> >> >
> > >> >> > parallel_transaction_resource_timeout 300
> > >> >> > shared_pool_size 209715200
> > >> >> > shared_pool_reserved_size 37847040
> > >> >> > shared_pool_reserved_min_alloc 5K
> > >> >> > large_pool_size 0
> > >> >> > large_pool_min_alloc 16K
> > >> >> > disk_asynch_io TRUE
> > >> >> > dbwr_io_slaves 0
> > >> >> > lgwr_io_slaves 4
> > >> >> > arch_io_slaves 0
> > >> >> > db_file_direct_io_count 64
> > >> >> > db_block_buffers 48000
> > >> >> > db_block_checksum FALSE
> > >> >> > db_block_size 16384
> > >> >> > db_block_checkpoint_batch 8
> > >> >> > db_block_lru_statistics FALSE
> > >> >> > db_block_lru_extended_statistics 0
> > >> >> > db_writer_processes 4
> > >> >> > db_block_lru_latches 8
> > >> >> > db_block_max_dirty_target 4294967294
> > >> >> > db_file_simultaneous_writes 4
> > >> >> > db_file_multiblock_read_count 64
> > >> >> > mts_servers 1
> > >> >> > mts_max_servers 10
> > >> >> > mts_max_dispatchers 10
> > >> >> > local_listener
> > >> >> > mts_listener_address (ADDRESS=(PROTOCOL=ipc)(KEY=PNPKEY))
> > >> >> > mts_multiple_listeners FALSE
> > >> >> > sort_area_size 10485760
> > >> >> > sort_area_retained_size 65536
> > >> >> > sort_direct_writes TRUE
> > >> >> > sort_write_buffers 4
> > >> >> > sort_write_buffer_size 32768
> > >> >> > sort_spacemap_size 512
> > >> >> > sort_read_fac 5
> > >> >> > parallel_min_percent 0
> > >> >> > parallel_default_max_instances 0
> > >> >> > parallel_min_servers 7
> > >> >> > parallel_max_servers 32
> > >> >> > parallel_server_idle_time 3
> > >> >> > parallel_execution_message_size 2148
> > >> >> > parallel_min_message_pool 225540
> > >> >> > hash_join_enabled TRUE
> > >> >> > hash_area_size 13107200
> > >> >> > hash_multiblock_io_count 4
> > >> >> >
> > >> >> > Thank you very much!
> > >> >> > Sergey
> > >> >> >
> > >> >> > Sent via Deja.com http://www.deja.com/
> > >> >> > Before you buy.
> > >> >> >
> > >> >>
> > >> >>
> > >> >
> > >> >
> > >>
> > >>
> > >
> > >
> >
> >
>
>

Received on Sat Nov 11 2000 - 08:20:49 CST

