Home » SQL & PL/SQL » SQL & PL/SQL » connect by prior performance issue
connect by prior performance issue [message #669861] Fri, 18 May 2018 12:01 Go to next message
uman2631
Messages: 13
Registered: November 2011
Junior Member
I have a table that contains record details. A simplified version is below. Over time, a record can be adjusted, and a new record with a new record ID is created.
The "parent" record id is recorded in the REC_ID_ADJ_FROM field. An adjusted record can be adjusted again by having another new record created with a new REC_ID, and the immediate
parent's rec_id in the REC_ID_ADJ_FROM field. (See examples below).
A record that has never been adjusted has a space in the REC_ID_ADJ_FROM field.

What I need to do is, given a record ID, find its original (root) record. For clarity, the rec_ids below are all easy to read, but real record IDs are not and have no pattern.

CREATE TABLE MYSCHEMA.REC_DTL
(
  REC_ID                          VARCHAR2(12 BYTE) NOT NULL,
  REC_ID_ADJ_FROM                 VARCHAR2(12 BYTE),
  DESCRIPTION                     VARCHAR2(500)
)  
TABLESPACE MY_TABLESPACE;
 
CREATE UNIQUE INDEX ONEFACET_O.PK_REC_ID ON MYSCHEMA.REC_DTL
(REC_ID)
TABLESPACE MY_TABLESPACE;

CREATE INDEX ONEFACET_O.IX_REC_ID_FROM ON MYSCHEMA.REC_DTL
(REC_ID_ADJ_FROM, REC_ID)
TABLESPACE MY_TABLESPACE;
 
 
INSERT INTO MYSCEMA.REC_DTL VALUES ('11111', ' ');
INSERT INTO MYSCEMA.REC_DTL VALUES ('11112', '11111');
INSERT INTO MYSCEMA.REC_DTL VALUES ('11113', '11112');
INSERT INTO MYSCEMA.REC_DTL VALUES ('11114', '11113');
INSERT INTO MYSCEMA.REC_DTL VALUES ('22221', ' ');
INSERT INTO MYSCEMA.REC_DTL VALUES ('22222', '22221');
INSERT INTO MYSCEMA.REC_DTL VALUES ('22223', '22222');
INSERT INTO MYSCEMA.REC_DTL VALUES ('22224', '22223');
INSERT INTO MYSCEMA.REC_DTL VALUES ('33331', ' ');
INSERT INTO MYSCEMA.REC_DTL VALUES ('44441', ' ');
INSERT INTO MYSCEMA.REC_DTL VALUES ('44442', '44441');

In the above example, record IDs 11111, 22221, 33331 and 44441 are all original records. The parent of record 11113 is 11112, whose parent is 11111, which is the "root" record.
The parent of record 22223 is 2222 whose parent is 22221 which is the root record. (As mentioned above, in these examples, it's sequential for clarity, but in real life, they are not).

The following query was written to find the original record:

select t1.rec_id as cur_rec, connect_by_root(t1.rec_id) as orig_rec
from myschema.rec_dtl t1
where t1.rec_id = :input_rec_id
start with t1.rec_id_adj_from = ' ' /* original records have a space in adj_from */
connect by prior t1.rec_id = t1.rec_id_adj_from  /* child -> parent */
;

This query works. However, it is EXTREMELY slow. The table itself has millions of records, and executing this query just 1 time takes 8 minutes or more, which is about 7.5 minutes too long. Any suggestions for improving? We've tried making the IX_REC_ID_ADJ_FROM index be just REC_ID_ADJ_FROM, both REC_ID_ADJ_FROM+REC_ID and REC_ID+REC_ID_ADJ_FROM. Having an index is better than not having an index, but no combination of indexing is helping. We've also tried parallel hints. Small improvement, but not nearly enough.

Below is the explain plan.


-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name                    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                         |  2865M|    42G|       |    15M  (1)| 00:10:00 |
|   1 |  SORT ORDER BY                            |                         |  2865M|    42G|    64G|    15M  (1)| 00:10:00 |
|*  2 |   FILTER                                  |                         |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|                         |       |       |       |            |          |
|   4 |     INDEX FAST FULL SCAN                  | IX_F1_CLAIM_ID_ADJ_FROM |    98M|  1508M|       | 98909   (1)| 00:00:04 |
-----------------------------------------------------------------------------------------------------------------------------
 

thanks for any suggestions!

Ulysses
Re: connect by prior performance issue [message #669862 is a reply to message #669861] Fri, 18 May 2018 12:24 Go to previous messageGo to next message
Michel Cadot
Messages: 65545
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Please fix your test case removing the schema names which are useless and inconsistent as well as storage clause we have not your schemas and tablespaces.

In addition, you had not feed back and thank people in your previous topics which are good reasons to NOT help you more.

Re: connect by prior performance issue [message #669863 is a reply to message #669862] Fri, 18 May 2018 13:33 Go to previous messageGo to next message
uman2631
Messages: 13
Registered: November 2011
Junior Member
I don't seem to be able to edit the original message, so I have re-posted without schema names and storage clauses.

I have a table that contains record details. A simplified version is below. Over time, a record can be adjusted, and a new record with a new record ID is created.
The "parent" record id is recorded in the REC_ID_ADJ_FROM field. An adjusted record can be adjusted again by having another new record created with a new REC_ID, and the immediate
parent's rec_id in the REC_ID_ADJ_FROM field. (See examples below).
A record that has never been adjusted has a space in the REC_ID_ADJ_FROM field.

What I need to do is, given a record ID, find its original (root) record. For clarity, the rec_ids below are all easy to read, but real record IDs are not and have no pattern.

CREATE TABLE REC_DTL
(
  REC_ID                          VARCHAR2(12 BYTE) NOT NULL,
  REC_ID_ADJ_FROM                 VARCHAR2(12 BYTE),
  DESCRIPTION                     VARCHAR2(500)
)  
;
 
CREATE UNIQUE INDEX PK_REC_ID ON REC_DTL
(REC_ID)
;

CREATE INDEX IX_REC_ID_FROM ON REC_DTL
(REC_ID_ADJ_FROM, REC_ID)
;
 
 
INSERT INTO REC_DTL VALUES ('11111', ' ');
INSERT INTO REC_DTL VALUES ('11112', '11111');
INSERT INTO REC_DTL VALUES ('11113', '11112');
INSERT INTO REC_DTL VALUES ('11114', '11113');
INSERT INTO REC_DTL VALUES ('22221', ' ');
INSERT INTO REC_DTL VALUES ('22222', '22221');
INSERT INTO REC_DTL VALUES ('22223', '22222');
INSERT INTO REC_DTL VALUES ('22224', '22223');
INSERT INTO REC_DTL VALUES ('33331', ' ');
INSERT INTO REC_DTL VALUES ('44441', ' ');
INSERT INTO REC_DTL VALUES ('44442', '44441');

In the above example, record IDs 11111, 22221, 33331 and 44441 are all original records. The parent of record 11113 is 11112, whose parent is 11111, which is the "root" record.
The parent of record 22223 is 2222 whose parent is 22221 which is the root record. (As mentioned above, in these examples, it's sequential for clarity, but in real life, they are not).

The following query was written to find the original record:

select t1.rec_id as cur_rec, connect_by_root(t1.rec_id) as orig_rec
from rec_dtl t1
where t1.rec_id = :input_rec_id
start with t1.rec_id_adj_from = ' ' /* original records have a space in adj_from */
connect by prior t1.rec_id = t1.rec_id_adj_from  /* child -> parent */
;

This query works. However, it is EXTREMELY slow. The table itself has millions of records, and executing this query just 1 time takes 8 minutes or more, which is about 7.5 minutes too long. Any suggestions for improving? We've tried making the IX_REC_ID_ADJ_FROM index be just REC_ID_ADJ_FROM, both REC_ID_ADJ_FROM+REC_ID and REC_ID+REC_ID_ADJ_FROM. Having an index is better than not having an index, but no combination of indexing is helping. We've also tried parallel hints. Small improvement, but not nearly enough.

Below is the explain plan.


-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name                    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                         |  2865M|    42G|       |    15M  (1)| 00:10:00 |
|   1 |  SORT ORDER BY                            |                         |  2865M|    42G|    64G|    15M  (1)| 00:10:00 |
|*  2 |   FILTER                                  |                         |       |       |       |            |          |
|*  3 |    CONNECT BY NO FILTERING WITH START-WITH|                         |       |       |       |            |          |
|   4 |     INDEX FAST FULL SCAN                  | IX_F1_CLAIM_ID_ADJ_FROM |    98M|  1508M|       | 98909   (1)| 00:00:04 |
-----------------------------------------------------------------------------------------------------------------------------
 

thanks for any suggestions!

Ulysses
Re: connect by prior performance issue [message #669891 is a reply to message #669863] Wed, 23 May 2018 02:58 Go to previous messageGo to next message
Barbara Boehmer
Messages: 8848
Registered: November 2002
Location: California, USA
Senior Member
Please try the following, which traverses the tree in reverse order, and should therefore be more efficient.

select   :input_rec_id as cur_rec, t1.rec_id as orig_rec
from     rec_dtl t1
where    connect_by_isleaf = 1
start    with t1.rec_id = :input_rec_id
connect  by prior t1.rec_id_adj_from = t1.rec_id;

Re: connect by prior performance issue [message #669907 is a reply to message #669891] Wed, 23 May 2018 09:13 Go to previous message
uman2631
Messages: 13
Registered: November 2011
Junior Member
Barbara,
This is freaking awesome! This works perfectly and returns instantly! Thank you!
Previous Topic: How to implement regex (merged 2)
Next Topic: Insert values from different tables in a single statement
Goto Forum:
  


Current Time: Sun Jun 24 04:36:28 CDT 2018