Re: Bulk DML question ...

From: <sergey_s_at_my-deja.com>
Date: Thu, 09 Nov 2000 17:04:47 GMT
Message-ID: <8uelf8$2hr$1@nnrp1.deja.com>

> sum(b.count) sum_count -- how did you get
>COUNT
> as a column name ?

That's my mistake, sorry. The column name is CNT. I will try recoding the SQL the way you suggested and see what happens. Will post results back to NG.

Thank you!
Sergey

In article <973628946.28797.3.nnrp-12.9e984b29_at_news.demon.co.uk>, "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:
>
> I should also have asked:
> If two rows have the same name
> do they have the same ID.
>
> Still, assuming they do, this should
> give you the same result, and may
> be quicker because the sort/group
> is on an narrower set, and the join
> is between fewer rows:
>
> (e and o.e. in the text)
>
> CREATE TABLE TEMP_TABLE PARALLEL (DEGREE 8 INSTANCES 1) NOLOGGING
> AS
> select
> a.group_code,
> il.name
> il.sum_amount,
> il.sum_count
> (
> select /*+ no_merge */
> b.id,
> b.name,
> sum(b.amount) sum_amount,
> sum(b.count) sum_count -- how did you get
COUNT
> as a column name ?
> from
> table1 b
> group by b.id, b.name
> ) il,
> table2 a
> where
> a.id = il.id
> ;
>
> --
>
> Jonathan Lewis
> Yet another Oracle-related web site: http://www.jlcomp.demon.co.uk
>
> Practical Oracle 8i: Building Efficient Databases
> Publishers: Addison Wesley Longman
> Book bound date now 8th Dec 2000
>
> sergey_s_at_my-deja.com wrote in message <8u7u1j$igr$1_at_nnrp1.deja.com>...
> >
> >> A coupld of quick question for this query though, is ID
> >> unique on the smaller table ?
> >
> >Yes, it is.
> >
> >>Is the drop in row count
> >> largely due to repeated group_codes or repeated names ?
> >
> >Due to repeated names.
> >
> >
> >Thank you!
> >Sergey
> >
> >
> >In article <973373984.28182.0.nnrp-02.9e984b29_at_news.demon.co.uk>,
> > "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:
> >>
> >> Aggregating 3Billion rows down to 1.2B rows
> >> is bound to be very slow. It is possible that
> >> your strategy of breaking the aggregation into
> >> steps needs to be reviewed so that you optimize
> >> a step which has a higher density of aggregation.
> >>
> >> A coupld of quick question for this query though, is ID
> >> unique on the smaller table ? Is the drop in row count
> >> largely due to repeated group_codes or repeated names ?
> >>
> >> --
> >>
> >> Jonathan Lewis
> >> Yet another Oracle-related web site: http://www.jlcomp.demon.co.uk
> >>
> >> Practical Oracle 8i: Building Efficient Databases
> >> Publishers: Addison Wesley Longman
> >> Book bound date now 1st Dec 2000
> >>
> >> sergey_s_at_my-deja.com wrote in message <8tv6in$ta4
$1_at_nnrp1.deja.com>...
> >> >Yes, the explain plan shows hash join and every step is
> >> >parallel-to-parallel (except for the collection at the end, of
course
-
> >> >that one's parallel-to-serial). The query creates a "processing
table"
> >> >(approx. 1.2 billion rows less than the base table). This
processing
> >> >table is later queried to create more temporary tables which in
turn
are
> >> >used to create the final aggregate tables. Basically, the
aggregation
> >> >process was broken down into smaller steps for performance reasons
(we
> >> >achieved 30% reduction in time).
> >> >
> >> >Thank you!
> >> >Sergey
> >> >
> >> >
> >> >In article <8tuua4$lf8$1_at_nnrp1.deja.com>,
> >> > jdarrah_co_at_my-deja.com wrote:
> >> >> Does the explain plan show that its using hash and that the
statement
> >> >> is running in parrallel? Also, what's the next step after you
create
> >> >> the table?
> >> >>
> >> >> In article <8tqg61$15b$1_at_nnrp1.deja.com>,
> >> >> sergey_s_at_my-deja.com wrote:
> >> >> > Let's say I have two tables:
> >> >> >
> >> >> > CREATE TABLE1
> >> >> > (
> >> >> > TIME_KEY NUMBER,
> >> >> > ID NUMBER,
> >> >> > NAME VARCHAR2(20),
> >> >> > AMOUNT NUMBER(8,2),
> >> >> > COUNT NUMBER,
> >> >> > ...
> >> >> > );
> >> >> >
> >> >> > CREATE TABLE2
> >> >> > (
> >> >> > ID NUMBER,
> >> >> > GROUP_CODE NUMBER,
> >> >> > ...
> >> >> > );
> >> >> >
> >> >> > TABLE1 has approx. 3 billion rows (almost 200 Gig.). TABLE2
has
approx.
> >> >> > 3 million rows. TABLE1 is partitioned by TIME_KEY (partitions
have
> >> >> > anywhere from 50 to 100 million rows each). I looked into
making
> >> >> > partitions smaller based on additional criteria, but the next
candidate
> >> >> > column has only about 500 unique values.
> >> >> >
> >> >> > Every row in TABLE1 has a match in TABLE2. In other words, I
have
no
> >> >> > choice but to do full table scans. I already looked into
business
> >> >> > process, and there is no time or money to change it at this
point.
> >> >> >
> >> >> > So, here's the query I need to run:
> >> >> >
> >> >> > CREATE TABLE TEMP_TABLE PARALLEL (DEGREE 8 INSTANCES 1)
NOLOGGING
AS
> >> >> > SELECT /*+ ORDERED FULL(A) FULL(B) USE_HASH(B) */
> >> >> > A.GROUP_CODE, B.NAME, SUM(B.AMOUNT), SUM(B.COUNT)
> >> >> > FROM TABLE2 A, TABLE1 B
> >> >> > WHERE A.ID = B.ID
> >> >> > GROUP BY A.GROUP_CODE, B.NAME;
> >> >> >
> >> >> > This query finishes in about 30 hours when the machine is
pretty
much
> >> >> > dedicated to my process (needless to say, it's too long for
just
the
> >> >> > first step). I am looking at PL/SQL and bulk DML, but I am not
sure
yet
> >> >> > how I could apply it in my case. Maybe, there are ways besides
bulk
DML
> >> >> > that I could try. If you have any suggestions/thoughts etc.,
please
let
> >> >> > me know.
> >> >> >
> >> >> > I have Oracle 8.0.6.0.0 With Partitioning and Objects Options;
> >> >> > SunOS 5.7 on 8 processors (400 Mz each), 80 Gig Memory, RAID 5
(66
> >> >> > disks in total I think, but not sure).
> >> >> >
> >> >> > Here's some db settings:
> >> >> >
> >> >> > parallel_transaction_resource_timeout 300
> >> >> > shared_pool_size 209715200
> >> >> > shared_pool_reserved_size 37847040
> >> >> > shared_pool_reserved_min_alloc 5K
> >> >> > large_pool_size 0
> >> >> > large_pool_min_alloc 16K
> >> >> > disk_asynch_io TRUE
> >> >> > dbwr_io_slaves 0
> >> >> > lgwr_io_slaves 4
> >> >> > arch_io_slaves 0
> >> >> > db_file_direct_io_count 64
> >> >> > db_block_buffers 48000
> >> >> > db_block_checksum FALSE
> >> >> > db_block_size 16384
> >> >> > db_block_checkpoint_batch 8
> >> >> > db_block_lru_statistics FALSE
> >> >> > db_block_lru_extended_statistics 0
> >> >> > db_writer_processes 4
> >> >> > db_block_lru_latches 8
> >> >> > db_block_max_dirty_target 4294967294
> >> >> > db_file_simultaneous_writes 4
> >> >> > db_file_multiblock_read_count 64
> >> >> > mts_servers 1
> >> >> > mts_max_servers 10
> >> >> > mts_max_dispatchers 10
> >> >> > local_listener
> >> >> > mts_listener_address (ADDRESS=(PROTOCOL=ipc)(KEY=PNPKEY))
> >> >> > mts_multiple_listeners FALSE
> >> >> > sort_area_size 10485760
> >> >> > sort_area_retained_size 65536
> >> >> > sort_direct_writes TRUE
> >> >> > sort_write_buffers 4
> >> >> > sort_write_buffer_size 32768
> >> >> > sort_spacemap_size 512
> >> >> > sort_read_fac 5
> >> >> > parallel_min_percent 0
> >> >> > parallel_default_max_instances 0
> >> >> > parallel_min_servers 7
> >> >> > parallel_max_servers 32
> >> >> > parallel_server_idle_time 3
> >> >> > parallel_execution_message_size 2148
> >> >> > parallel_min_message_pool 225540
> >> >> > hash_join_enabled TRUE
> >> >> > hash_area_size 13107200
> >> >> > hash_multiblock_io_count 4
> >> >> >
> >> >> > Thank you very much!
> >> >> > Sergey
> >> >> >
> >> >> > Sent via Deja.com http://www.deja.com/
> >> >> > Before you buy.
> >> >> >
> >> >>
> >> >> Sent via Deja.com http://www.deja.com/
> >> >> Before you buy.
> >> >>
> >> >
> >> >
> >> >Sent via Deja.com http://www.deja.com/
> >> >Before you buy.
> >>
> >>
> >
> >
> >Sent via Deja.com http://www.deja.com/
> >Before you buy.
>
>

Sent via Deja.com http://www.deja.com/
Before you buy. Received on Thu Nov 09 2000 - 11:04:47 CST