Re: Bulk DML question ...

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Tue, 7 Nov 2000 21:31:27 -0000
Message-ID: <973628946.28797.3.nnrp-12.9e984b29@news.demon.co.uk>

I should also have asked:

If two rows have the same name
do they have the same ID.

Still, assuming they do, this should
give you the same result, and may
be quicker because the sort/group
is on an narrower set, and the join
is between fewer rows:

(e and o.e. in the text)

CREATE TABLE TEMP_TABLE PARALLEL (DEGREE 8 INSTANCES 1) NOLOGGING AS
select

a.group_code,

    il.name
    il.sum_amount,
    il.sum_count

(
select /*+ no_merge */

        b.id,
        b.name,
        sum(b.amount)    sum_amount,
        sum(b.count)        sum_count            --  how did you get COUNT

as a column name ?

from

        table1 b
    group by b.id, b.name
    ) il,
    table2 a
where

a.id = il.id
;

--

Jonathan Lewis
Yet another Oracle-related web site:  http://www.jlcomp.demon.co.uk

Practical Oracle 8i:  Building Efficient Databases
Publishers:  Addison Wesley Longman
Book bound date now 8th Dec 2000

sergey_s_at_my-deja.com wrote in message <8u7u1j$igr$1_at_nnrp1.deja.com>...


>

>> A coupld of quick question for this query though, is ID

>> unique on the smaller table ?

>

>Yes, it is.

>

>>Is the drop in row count

>> largely due to repeated group_codes or repeated names ?

>

>Due to repeated names.

>

>

>Thank you!

>Sergey

>

>

>In article <973373984.28182.0.nnrp-02.9e984b29_at_news.demon.co.uk>,

>  "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:

>>

>> Aggregating 3Billion rows down to 1.2B rows

>> is bound to be very slow.  It is possible that

>> your strategy of breaking the aggregation into

>> steps needs to be reviewed so that you optimize

>> a step which has a higher density of aggregation.

>>

>> A coupld of quick question for this query though, is ID

>> unique on the smaller table ? Is the drop in row count

>> largely due to repeated group_codes or repeated names ?

>>

>> --

>>

>> Jonathan Lewis

>> Yet another Oracle-related web site:  http://www.jlcomp.demon.co.uk

>>

>> Practical Oracle 8i:  Building Efficient Databases

>> Publishers:  Addison Wesley Longman

>> Book bound date now 1st Dec 2000

>>

>> sergey_s_at_my-deja.com wrote in message <8tv6in$ta4$1_at_nnrp1.deja.com>...

>> >Yes, the explain plan shows hash join and every step is

>> >parallel-to-parallel (except for the collection at the end, of course

 -


>> >that one's parallel-to-serial). The query creates a "processing

 table"


>> >(approx. 1.2 billion rows less than the base table). This processing

>> >table is later queried to create more temporary tables which in turn

 are


>> >used to create the final aggregate tables. Basically, the aggregation

>> >process was broken down into smaller steps for performance reasons

 (we


>> >achieved 30% reduction in time).

>> >

>> >Thank you!

>> >Sergey

>> >

>> >

>> >In article <8tuua4$lf8$1_at_nnrp1.deja.com>,

>> >  jdarrah_co_at_my-deja.com wrote:

>> >> Does the explain plan show that its using hash and that the

 statement


>> >> is running in parrallel?  Also, what's the next step after you

 create


>> >> the table?

>> >>

>> >> In article <8tqg61$15b$1_at_nnrp1.deja.com>,

>> >>   sergey_s_at_my-deja.com wrote:

>> >> > Let's say I have two tables:

>> >> >

>> >> > CREATE TABLE1

>> >> > (

>> >> >   TIME_KEY NUMBER,

>> >> >   ID NUMBER,

>> >> >   NAME VARCHAR2(20),

>> >> >   AMOUNT NUMBER(8,2),

>> >> >   COUNT NUMBER,

>> >> >   ...

>> >> > );

>> >> >

>> >> > CREATE TABLE2

>> >> > (

>> >> >   ID NUMBER,

>> >> >   GROUP_CODE NUMBER,

>> >> >   ...

>> >> > );

>> >> >

>> >> > TABLE1 has approx. 3 billion rows (almost 200 Gig.). TABLE2 has

 approx.


>> >> > 3 million rows. TABLE1 is partitioned by TIME_KEY (partitions

 have


>> >> > anywhere from 50 to 100 million rows each). I looked into making

>> >> > partitions smaller based on additional criteria, but the next

 candidate


>> >> > column has only about 500 unique values.

>> >> >

>> >> > Every row in TABLE1 has a match in TABLE2. In other words, I have

 no


>> >> > choice but to do full table scans. I already looked into business

>> >> > process, and there is no time or money to change it at this

 point.


>> >> >

>> >> > So, here's the query I need to run:

>> >> >

>> >> > CREATE TABLE TEMP_TABLE PARALLEL (DEGREE 8 INSTANCES 1) NOLOGGING

 AS


>> >> > SELECT /*+ ORDERED FULL(A) FULL(B) USE_HASH(B) */

>> >> > A.GROUP_CODE, B.NAME, SUM(B.AMOUNT), SUM(B.COUNT)

>> >> > FROM TABLE2 A, TABLE1 B

>> >> > WHERE A.ID = B.ID

>> >> > GROUP BY A.GROUP_CODE, B.NAME;

>> >> >

>> >> > This query finishes in about 30 hours when the machine is pretty

 much


>> >> > dedicated to my process (needless to say, it's too long for just

 the


>> >> > first step). I am looking at PL/SQL and bulk DML, but I am not

 sure
 yet


>> >> > how I could apply it in my case. Maybe, there are ways besides

 bulk
 DML


>> >> > that I could try. If you have any suggestions/thoughts etc.,

 please
 let


>> >> > me know.

>> >> >

>> >> > I have Oracle 8.0.6.0.0 With Partitioning and Objects Options;

>> >> > SunOS 5.7 on 8 processors (400 Mz each), 80 Gig Memory, RAID 5

 (66


>> >> > disks in total I think, but not sure).

>> >> >

>> >> > Here's some db settings:

>> >> >

>> >> > parallel_transaction_resource_timeout  300

>> >> > shared_pool_size  209715200

>> >> > shared_pool_reserved_size  37847040

>> >> > shared_pool_reserved_min_alloc  5K

>> >> > large_pool_size  0

>> >> > large_pool_min_alloc  16K

>> >> > disk_asynch_io  TRUE

>> >> > dbwr_io_slaves  0

>> >> > lgwr_io_slaves  4

>> >> > arch_io_slaves  0

>> >> > db_file_direct_io_count  64

>> >> > db_block_buffers  48000

>> >> > db_block_checksum  FALSE

>> >> > db_block_size  16384

>> >> > db_block_checkpoint_batch  8

>> >> > db_block_lru_statistics  FALSE

>> >> > db_block_lru_extended_statistics  0

>> >> > db_writer_processes  4

>> >> > db_block_lru_latches  8

>> >> > db_block_max_dirty_target  4294967294

>> >> > db_file_simultaneous_writes  4

>> >> > db_file_multiblock_read_count  64

>> >> > mts_servers  1

>> >> > mts_max_servers  10

>> >> > mts_max_dispatchers  10

>> >> > local_listener

>> >> > mts_listener_address  (ADDRESS=(PROTOCOL=ipc)(KEY=PNPKEY))

>> >> > mts_multiple_listeners  FALSE

>> >> > sort_area_size  10485760

>> >> > sort_area_retained_size  65536

>> >> > sort_direct_writes  TRUE

>> >> > sort_write_buffers  4

>> >> > sort_write_buffer_size  32768

>> >> > sort_spacemap_size  512

>> >> > sort_read_fac  5

>> >> > parallel_min_percent  0

>> >> > parallel_default_max_instances  0

>> >> > parallel_min_servers  7

>> >> > parallel_max_servers  32

>> >> > parallel_server_idle_time  3

>> >> > parallel_execution_message_size  2148

>> >> > parallel_min_message_pool  225540

>> >> > hash_join_enabled  TRUE

>> >> > hash_area_size  13107200

>> >> > hash_multiblock_io_count  4

>> >> >

>> >> > Thank you very much!

>> >> > Sergey

>> >> >

>> >> > Sent via Deja.com http://www.deja.com/

>> >> > Before you buy.

>> >> >

>> >>

>> >> Sent via Deja.com http://www.deja.com/

>> >> Before you buy.

>> >>

>> >

>> >

>> >Sent via Deja.com http://www.deja.com/

>> >Before you buy.

>>

>>

>

>

>Sent via Deja.com http://www.deja.com/

>Before you buy.

Received on Tue Nov 07 2000 - 15:31:27 CST