Re: Hint for self-join connect by

From: Alberto Dell'Era <alberto.dellera_at_gmail.com>
Date: Tue, 1 May 2007 23:13:30 +0200
Message-ID: <4ef2fbf50705011413n13f9c3c0r4798a9e425bc5f5f@mail.gmail.com>

Amit,

I'm researching as well, the algorithm is not 100% clear to me, and I have yet to RTFM in full and look around for "prior art" :)

I have created as well a minitable this afternoon, and I have an interesting variant to share (9.2.0.6):

insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('_MERGED_',2,1);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('',3,2);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('',4,3);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('',5,4);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('_MERGED_',6,99);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('',7,6);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('',8,7);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('',9,8);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('',10,9);
insert into dch_work_surr_mgmt_ordered(ACCT_ID,FORWARD_SURROGATE_ID,SURROGATE_ID) values ('',11,10);

using your DDL:

create index test_index on dch_work_surr_mgmt_ordered (acct_id);
create index test_index2 on dch_work_surr_mgmt_ordered (surrogate_id,
forward_surrogate_id);

exec dbms_stats.gather_table_stats (user, 'dch_work_surr_mgmt_ordered', cascade=>true, method_opt=>'for all columns size 1', estimate_percent=>100);

explain plan for
SELECT SUBSTR(sys_connect_by_path(surrogate_id, '|'), 2) tree

                        , LEVEL AS lev
                          FROM dch_work_surr_mgmt_ordered
                         START WITH acct_id = '_MERGED_'
                       CONNECT BY PRIOR forward_surrogate_id = surrogate_id;

select * from table (dbms_xplan.display);

| Id  | Operation                     |  Name                       |

Rows | Bytes | Cost |

| 0 | SELECT STATEMENT | |

   1 |     8 |     2 |
|*  1 |  CONNECT BY WITH FILTERING    |                             |
     |       |       |


|   2 |   NESTED LOOPS                |                             |

     |       |       |
|*  3 |    INDEX RANGE SCAN           | TEST_INDEX                  |
   2 |     6 |     1 |


|   4 |    TABLE ACCESS BY USER ROWID | DCH_WORK_SURR_MGMT_ORDERED  |

     |       |       |


|   5 |   NESTED LOOPS                |                             |

     |       |       |


|   6 |    BUFFER SORT                |                             |

   1 |     8 |       |


|   7 |     CONNECT BY PUMP           |                             |

     |       |       |


|   8 |    TABLE ACCESS BY INDEX ROWID| DCH_WORK_SURR_MGMT_ORDERED  |

   1 |     8 |     2 |
|*  9 |     INDEX RANGE SCAN          | TEST_INDEX2                 |
   1 |       |     1 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

   1 - filter("DCH_WORK_SURR_MGMT_ORDERED"."ACCT_ID"='_MERGED_')
   3 - access("DCH_WORK_SURR_MGMT_ORDERED"."ACCT_ID"='_MERGED_')
   9 - access("DCH_WORK_SURR_MGMT_ORDERED"."SURROGATE_ID"=NULL)

This shows the algorithm as I expect it to be: (a) 3+4 gets the starting rows, feed them to the row sources 5-9. (b) 7 gets the rows, 6 orders them (probably to minimize the number

of blocks to read), sends to 8+9 that retrieves the first level of the hierarchy
(c) rows retrieved are sent back to (b) until no rows found.

I cannot understand row source 8 (FTS) in your plan, why it should FTS ... unless (wild shot in the dark) it keeps in "memory" the rowids only while doing (b)+(c) and then FTS to retrieve the remaining columns.

Alberto

On 5/1/07, amit poddar <amit.poddar_at_yale.edu> wrote:

> Alberto from your reply it seems that you understand the connect by
> runtime algorithm quite well.
> Can you please clarify my confusion ?
>
> I have create the table dch_work_surr_mgmt_ordered and created two indexes
> test_index (acct_id) and test_index2(surrogate_id, forward_surrogate_id)
>
> My question is:
>
> 1. Step 2 and 3 in the explain plan are for getting the rows for the
> start with clause (first selection)
>
> which step is the hierarchy visit you mention (probably step 7) ?
>
> amit
>
> SQL> explain plan for
>   2  SELECT SUBSTR(sys_connect_by_path(surrogate_id, '|'), 2) tree
>                              , LEVEL AS lev
>                           FROM dch_work_surr_mgmt_ordered
>                          START WITH acct_id = '&ACCT_MERGE'
>                        CONNECT BY PRIOR forward_surrogate_id =
> surrogate_id  3    4    5    6
>   7  /
> Enter value for acct_merge: aa
> old   5:                          START WITH acct_id = '&ACCT_MERGE'
> new   5:                          START WITH acct_id = 'aa'
>
> Explained.
>
> SQL> @?/rdbms/admin/utlxpls.sql
>
> PLAN_TABLE_OUTPUT
> ------------------------------------------------------------------------------------------------------------------------------------------------------
> Plan hash value: 3651564746
>
> -----------------------------------------------------------------------------------------------------------
> | Id  | Operation                    | Name                       |
> Rows  | Bytes | Cost (%CPU)| Time     |
> -----------------------------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT             |                            |
> 10M|   247M|     1   (0)| 00:00:01 |
> |*  1 |  CONNECT BY WITH FILTERING   |
> |       |       |            |          |
> |   2 |   TABLE ACCESS BY INDEX ROWID| DCH_WORK_SURR_MGMT_ORDERED |
> 10M|   495M|     2   (0)| 00:00:01 |
> |*  3 |    INDEX RANGE SCAN          | TEST_INDEX                 |
> 4000K|       |     1   (0)| 00:00:01 |
> |   4 |   NESTED LOOPS               |
> |       |       |            |          |
> |   5 |    BUFFER SORT               |
> |       |       |            |          |
> |   6 |     CONNECT BY PUMP          |
> |       |       |            |          |
> |*  7 |    INDEX RANGE SCAN          | TEST_INDEX2                |
> 10M|   247M|     1   (0)| 00:00:01 |
> |   8 |   TABLE ACCESS FULL          | DCH_WORK_SURR_MGMT_ORDERED |
> 1000M|    36G| 93987 (100)| 00:05:49 |
> -----------------------------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
>    1 - access("SURROGATE_ID"=PRIOR "FORWARD_SURROGATE_ID")
>    3 - access("ACCT_ID"=TO_NUMBER('aa'))
>    7 - access("SURROGATE_ID"=PRIOR "FORWARD_SURROGATE_ID")
>
> 22 rows selected.
>
>
>
> Alberto Dell'Era wrote:
> > That doesn't seem to tally with any of your plans - neither
> > has both id 5 and 6 with an asterisk. May you please
> > check it out and repost both plans with the predicate infos ?
> >
> > It would also interesting to know num_distinct, num_null and density
> > (from dba_tab_columns) for the columns
> > acct_id, forward_surrogate_id and surrogate_id.
> >
> > 450k out of 70M - that's 0.6%. Maybe an index on
> > acct_id, forward_surrogate_id
> > may help the first selection, and another on
> > surrogate_id, forward_surrogate_id
> > *might* help the hierarchy visit (or at least turn the FTS
> > into an index FFS). Why not giving it a shot.
>
>

-- 
Alberto Dell'Era
"dulce bellum inexpertis"
--
http://www.freelists.org/webpage/oracle-l

Received on Tue May 01 2007 - 16:13:30 CDT