Re: Question about hash jion

From: Franck Pachot <franck_at_pachot.net>
Date: Wed, 20 Mar 2019 09:27:57 +0100
Message-ID: <CAK6ito1k+KSBnm2eh1Nu_yhAoP0khX-OA8-h2h_CD31Mmaw0zg_at_mail.gmail.com>



Hi Orlando,
You can verify by fetching only few rows, canceling the query, and looking at execution statistics with dbms_xplan.

Example on SQL Developer which fetches only the first 5000 rows (but you can also do that with sqlplus pause and ^c):

SQL> set echo on pagesize 1000
SQL> create table T1(n) as select rownum from xmltable('1 to 10000');

Table T1 created.

SQL> create table T2(n) as select rownum from xmltable('1 to 10000');

Table T2 created.

SQL> create table T3(n) as select rownum from xmltable('1 to 10000');

Table T3 created.

SQL> select /*+ leading(T1 T2 T3) use_hash(T2) use_hash(T3) no_swap_join_inputs(T2) swap_join_inputs(T3) gather_plan_statistics */ *   2 from T1 inner join T2 using(n) inner join T3 using(n);

         N


         1
         2
         3

...

Only 5,000 rows currently supported in a script results *5,000 rows selected*.

SQL> select * from dbms_xplan.display_cursor(format=>'+alias +hint_report allstats');

PLAN_TABLE_OUTPUT



SQL_ID 4ffqukv0g0r0s, child number 0

select /*+ leading(T1 T2 T3) use_hash(T2) use_hash(T3) no_swap_join_inputs(T2) swap_join_inputs(T3) gather_plan_statistics */ * from T1 inner join T2 using(n) inner join T3 using(n)

Plan hash value: 1573120526



| Id | Operation | Name | Starts | E-Rows | *A-Rows* | A-Time  | Buffers | OMem | 1Mem | Used-Mem |
|   0 | SELECT STATEMENT    |      |      1 |        |   5050 |00:00:00.01
|      61 |       |       |          |
|*  1 |  HASH JOIN          |      |      1 |  10000 |   5050 |00:00:00.01
|      61 |  2546K|  2546K| 2012K (0)|
|   2 |   TABLE ACCESS FULL | T3   |      1 |  10000 |  10000 |00:00:00.01
|      19 |       |       |          |
|*  3 |   HASH JOIN         |      |      1 |  10000 |   5050 |00:00:00.01
|      41 |  2546K|  2546K| 2184K (0)|
|   4 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01
|      19 |       |       |          |
|   5 |    TABLE ACCESS FULL| T2   |      1 |  10000 |   *5050 *|00:00:00.01
|      21 |       |       |          |

-----------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):


   1 - SEL$9E43CB6E
   2 - SEL$9E43CB6E / T3_at_SEL$2
   4 - SEL$9E43CB6E / T1_at_SEL$1
   5 - SEL$9E43CB6E / T2_at_SEL$1

Predicate Information (identified by operation id):


   1 - access("T2"."N"="T3"."N")
   3 - access("T1"."N"="T2"."N")

Hint Report (identified by operation id / Query Block Name / Object Alias): Total hints for statement: 5


   1 - SEL$9E43CB6E

  • leading(T1 T2 T3)

   2 - SEL$9E43CB6E / T3_at_SEL$2

  • swap_join_inputs(T3)
  • use_hash(T3)

   5 - SEL$9E43CB6E / T2_at_SEL$1

  • no_swap_join_inputs(T2)
  • use_hash(T2)

Only 5050 rows have been read from T2 - the 5000 I fetched and the next 50 which is the fetch size.

Try the same after 'alter session force parallel query parallel 2' you will see a different behavior as the join to T3 is a HASH JOIN BUFFERED.

Regards,
Franck.

On Wed, Mar 20, 2019 at 3:53 AM Sayan Malakshinov <xt.and.r_at_gmail.com> wrote:

> Orlando,
>
> > If it is case 1, I would think the more hash joins there are in the
> plan, the more the PGA requirements will be to hash multiple (big) tables
> at the same time if it all hash joins nested to multiple levels.
>
> It's hard to say more or less, because in case 2 you would have to store
> whole resultsets of hash joins, which can be much bigger than probe tables.
>
> > Is there a way to prove the answer, say by trace data or something?
>
> I forgot to mention that you can also monitor it using
> v$sql_workarea_active.
>
>
> ср, 20 мар. 2019 г., 5:33 Sayan Malakshinov <xt.and.r_at_gmail.com>:
>
>> Hi Orlando,
>>
>> >  1) can I assume that as soon as oracle finds a matching row between
>> t1 and t2, oracle *immediately* compares that matching row to t3?
>> Yes, of course. You can read Jonathan's articles about how to read
>> execution plans. He suggests “first child first, recursive descent”
>> mnemonic rule for interpreting plans. In short and with some simplification
>> we can say that each plan operation is a function which starts and stops
>> own descendent plan operations(from top to bottom), processes their output
>> and pipes own results to parent operation.
>>
>> > Is there a way to prove the answer, say by trace data or something?
>>
>> Of course, IMHO the easiest way to do this is to use real-time sql
>> monitor in "active" mode, but you can also analyze raw sql trace 10046.
>> BTW, Franck Pachot has a very good demonstration for that: he uses gdb
>> breakpoints and shows each step.
>>
>> ср, 20 мар. 2019 г., 2:37 Orlando L <oralrnr_at_gmail.com>:
>>
>>>
>>> I am reading the book on CBO fundamentals by Jonathan and I have a
>>> question on hash join. He mentions (P289) that in the following hash join
>>> scenario, the first join is T1->T2, with T1 the probed table, and if that
>>> probe is successful, Oracle probes t3 (already hashed) for a match. Since
>>> t3 and t1 are hashed in memory, 1) can I assume that as soon as oracle
>>> finds a matching row between t1 and t2, oracle *immediately* compares that
>>> matching row to t3? OR 2) does it wait to build a list of matching (t1,t2)
>>> rows and THEN compare each row in that list to t3?
>>>
>>> I have been thinking all along that it was case 2. If it is case 1, I
>>> would think the more hash joins there are in the plan, the more the PGA
>>> requirements will be to hash multiple (big) tables at the same time if it
>>> all hash joins nested to multiple levels.
>>>
>>> Execution PlanS (10.1.0.3 autotrace)
>>> ----------------------------------------------------------
>>>    0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=212 Card=9551
>>> Bytes=573060)
>>>    1    0   HASH JOIN (Cost=212 Card=9551 Bytes=573060)
>>>    2    1     TABLE ACCESS (FULL) OF 'T3' (TABLE) (Cost=53 Card=10000
>>> Bytes=200000)
>>>    3    1     HASH JOIN (Cost=120 Card=62500 Bytes=2500000)
>>>    4    3       TABLE ACCESS (FULL) OF 'T1' (TABLE) (Cost=54 Card=10000
>>> Bytes=200000)
>>>    5    3       TABLE ACCESS (FULL) OF 'T2' (TABLE) (Cost=53 Card=10000
>>> Bytes=200000)
>>>
>>>
>>> Is there a way to prove the answer, say by trace data or something?
>>>
>>> PS. I can finally appreciate what a great book CBOF is. Hope he
>>> publishes a new book for the newer versions of Oracle.
>>>
>>> Orlando.
>>>
>>>
>>>

--

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Mar 20 2019 - 09:27:57 CET

Original text of this message