Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> Removing duplicate subtrees from CONNECT-BY query

Removing duplicate subtrees from CONNECT-BY query

From: Riku Räsänen <riku.rasanen_at_kantamestarit.fi>
Date: Sun, 07 Oct 2007 21:50:08 +0300
Message-ID: <op.tzubxuburkn1vh@smtp.saunalahti.fi>


Hello,

Certain application has a table with huge hierarchy, and a subtree of the hierarchy is allowed to exist in several places in the tree. No loops are allowed though.

The problem is that this hierarchy has to be searched effectively. Querying a certain top-level hierarchy returns over 600 000 DISTINCT nodes. With the duplicate subtrees, the query returns almost 6 000 000 nodes, where certain node appears 2000 times in the result.

Of course the requirement is that the search has to be effective and even this worst case of 6M rows should be handled in reasonable time (this is an OLTP application). This "subtree allowed to appear several times" is completely new case for me in the land of CONNECT BY's. Did RTFM, did "use the Google, Luke" etc, but did not find anything sensible.

So the question is: What is the most efficient way of removing the duplicates of this resultset? Currently there is no way to identify duplicate subtrees from the data except that the same ID appears multiple times. Is there a way to make CONNECT BY -operator to identify and prune the duplicate subtrees?

Example setup:

CREATE TABLE tree_hierarchy (

   id NUMBER (20)
  ,parent_id NUMBER (20)
);

INSERT INTO tree_hierarchy (id, parent_id) VALUES (1, NULL);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (2, 1);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (4, 2);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (5, 2);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (6, 2);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (9, 4);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (10, 4);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (3, 1);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (7, 3);
INSERT INTO tree_hierarchy (id, parent_id) VALUES (8, 3);

-- Here is the duplicate:

INSERT INTO tree_hierarchy (id, parent_id) VALUES (2, 8);

COMMIT; COL id FOR A15

SELECT LPAD (' ', (LEVEL - 1) * 2) || TO_CHAR (id, '999') id

      , parent_id
   FROM tree_hierarchy
START WITH id = 1
CONNECT BY PRIOR id = parent_id
;

ID PARENT_ID
--------------- ----------

    1

      2                   1
        4                 2
          9               4
         10               4
        5                 2
        6                 2
      3                   1
        7                 3
        8                 3
          2               8		-- The duplicate subtree starts here
            4             2
              9           4
             10           4
            5             2
            6             2




--

Riku
--

http://www.freelists.org/webpage/oracle-l Received on Sun Oct 07 2007 - 13:50:08 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US