RE: Cost of hash join

From: Christian Antognini <Christian.Antognini_at_trivadis.com>
Date: Thu, 9 Jun 2005 10:39:37 +0200
Message-ID: <7F0C000A3ABA6241A10C9ABF37EEB46D0F0165@MSXVS01.trivadis.com>

Hi Jared

Some observations show that the cost is not proportional with the number of partitions.

Below you find a simple test. Of course any comment is welcome!!!

HTH Chris

The setup.

CREATE TABLE t1 (id NUMBER, v1 CHAR(1000));

CREATE TABLE t2 (id NUMBER, v1 CHAR(1000));

INSERT INTO t1 SELECT rownum, object_name FROM all_objects WHERE rownum <= 10;

COMMIT; INSERT INTO t2 SELECT rownum, object_name FROM all_objects WHERE rownum <= 10000;

INSERT INTO t2 SELECT rownum, v1 FROM t2;

COMMIT; exec dbms_stats.gather_table_stats(user,'t1')

exec dbms_stats.gather_table_stats(user,'t2')

2) Two test statements.

SELECT t1.v1 FROM t1, t2 WHERE t1.id = t2.id;

SELECT t2.v1 FROM t1, t2 WHERE t1.id = t2.id;

3) The execution plan reported by AUTOTRACE. Notice the difference between the two hash joins.

SQL> SELECT t1.v1 FROM t1, t2 WHERE t1.id = t2.id;

Execution Plan

SELECT STATEMENT Optimizer=CHOOSE (Cost=1105 Card=20 Bytes=20180)

HASH JOIN (Cost=1105 Card=20 Bytes=20180)

TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=10 Bytes=10040)

TABLE ACCESS (FULL) OF 'T2' (Cost=1101 Card=80000 Bytes=400000)

SQL> SELECT t2.v1 FROM t1, t2 WHERE t1.id = t2.id;

Execution Plan

SELECT STATEMENT Optimizer=CHOOSE (Cost=1139 Card=20 Bytes=20160)

HASH JOIN (Cost=1139 Card=20 Bytes=20160)

TABLE ACCESS (FULL) OF 'T1' (Cost=2 Card=10 Bytes=30)

TABLE ACCESS (FULL) OF 'T2' (Cost=1101 Card=80000 Bytes=80400000)

4) What does event 10053 say? The cost for the join is different: 2 for the first one, 36 for the second one. The different comes, IMO, from the size of the inner table set (rcz*cdn): 5*80000=400000 for the first one, 1005*80000=80400000 for the second one.

Outer table:

resc: 2 cdn: 10 rcz: 1004 deg: 1 resp: 2

Inner table: T2

resc: 1101 cdn: 80000 rcz: 5 deg: 1 resp: 1101

using join:8 distribution:2 #groups:1

Hash join one ptn Resc: 2 Deg: 1

hash_area: 128 (max=128) buildfrag: 129 probefrag: 167 ppasses: 2

Hash join Resc: 1105 Resp: 1105

Outer table:

resc: 2 cdn: 10 rcz: 3 deg: 1 resp: 2

Inner table: T2

resc: 1101 cdn: 80000 rcz: 1005 deg: 1 resp: 1101

using join:8 distribution:2 #groups:1

Hash join one ptn Resc: 36 Deg: 1

hash_area: 128 (max=128) buildfrag: 129 probefrag: 9932 ppasses: 2

Hash join Resc: 1139 Resp: 1139

5) What does event 10104 say? The number of partitions is 8 in both cases. Therefore this cannot lead to a different cost! What is important, IMO, is the size and the number of slots that can be placed in memory. For the first statement there's enough place for the whole set, therefore the cost of the hash join is very low (2). In the second case we need about 116 slots but only 17 are available. Therefore lot of them will be written to disk, this lead to the cost of 36.

Original memory: 1048576

Memory after all overhead: 1024307

Memory for slots: 983040

Calculated overhead for partitions and row/slot managers: 41267

Hash-join fanout: 8

Number of partitions: 8

Number of slots: 15

Multiblock IO: 8

Block size(KB): 8

Cluster (slot) size(KB): 64

Original memory: 1048576

Memory after all overhead: 1002495

Memory for slots: 696320

Calculated overhead for partitions and row/slot managers: 306175

Hash-join fanout: 8

Number of partitions: 8

Number of slots: 17

Multiblock IO: 5

Block size(KB): 8

Cluster (slot) size(KB): 40

>-----Original Message-----

>From: Jared Still [mailto:jkstill_at_gmail.com]

>Sent: 08 June 2005 19:34

>To: Christian Antognini

>Subject: Re: Cost of hash join

>Hi Chris,

>Are you sure about that?

>The formula presented in the manual would seem correct.

>That is, the cost is that of scanning A, scanning B, and the cost

>of joining each partition of B (which may be just 1) to A.

>Jared

>On 6/7/05, Christian Antognini <Christian.Antognini_at_trivadis.com> wrote:

>Hi Naveen

>>cost= (access cost of A * number of hash partitions of

>>B) + access cost of B.

>>Why is the access cost of A multiplied by no. of hash

>>partitions?

>In my opinion this is not a good one (perhaps for this reason it's no more in the 10g

>documentation...).

>In fact the cost is something like:

>cost = access cost A + access cost B + cost join.

>If the hash table is "small", the cost of the join itself will be close to 0.

>If the hash table is "large", the cost is strongly dependent on the amount of "available" PGA

>memory.

--
http://www.freelists.org/webpage/oracle-l

Received on Thu Jun 09 2005 - 04:44:49 CDT