Re: Ordering Hierarchical Queries

From: Peter Schneider <peter.schneider_at_okay.net>
Date: 1998/03/31
Message-ID: <352035ec.14834373_at_news.okay.net>


On Wed, 25 Mar 1998 21:15:56 +0800, "Junmin Lin" <linjm_at_guomai.sh.cn> wrote:

>Hi,
>
>I have a question about hierarchical queries.
>
>Suppose I have a table:
>
>PARENT NOT NULL VARCHAR2(10)
> CHILD NOT NULL VARCHAR2(10)
> SORT_ORDER NUMBER(5)
>
>I want to do a hierachical query, I could do:
>
>select parent, child, sort_order, level
>from relation
>connect by parent = prior child start with parent = 'A' ;
>
>But, if I want the out put sorted via sort_order and at the same time
>preseved the level.
>
>How should I do it?
>
>Thanks in advance.
>
>
>Lin Junmin.

Hi Lin,

I've put together the following example to demonstrate how my ordering approach works.

The table looks like this:

CREATE TABLE tree_nodes
  (node_id NUMBER(12) NOT NULL,
   parent_node_id NUMBER(12),
   node_name VARCHAR2(10) NOT NULL,    CONSTRAINT tn_pk PRIMARY KEY(node_id),    CONSTRAINT tn_tn_fk FOREIGN KEY(parent_node_id)      REFERENCES tree_nodes(node_id));

I've inserted the following test data (note that I've entered some nodes in reverse order to demonstrate the order program):

INSERT INTO tree_nodes
VALUES(1, NULL, 'START_1');
INSERT INTO tree_nodes
VALUES(2, 1, 'C');
INSERT INTO tree_nodes
VALUES(3, 1, 'B');
INSERT INTO tree_nodes
VALUES(4, 1, 'A');
INSERT INTO tree_nodes
VALUES(5, 2, 'Z');
INSERT INTO tree_nodes
VALUES(6, 2, 'Y');
INSERT INTO tree_nodes
VALUES(7, 2, 'X');
INSERT INTO tree_nodes
VALUES(8, 2, 'W');
INSERT INTO tree_nodes
VALUES(9, 3, 'F');
INSERT INTO tree_nodes
VALUES(10, 3, 'G');
INSERT INTO tree_nodes
VALUES(11, 4, 'H');
INSERT INTO tree_nodes
VALUES(12, 4, 'I');
INSERT INTO tree_nodes
VALUES(13, 4, 'J');
INSERT INTO tree_nodes
VALUES(14, NULL, 'START_2');
INSERT INTO tree_nodes
VALUES(15, 14, 'T');
INSERT INTO tree_nodes
VALUES(16, 14, 'S'); So if I just pull the data out in the order of the connect by, it looks like this:

COLUMN node_id_lvl FORMAT a12
COLUMN parent_node_id FORMAT 9999
COLUMN level FORMAT 99
SELECT lpad(' ', 3*level) || node_id node_id_lvl,

       node_name, level, parent_node_id
  FROM tree_nodes
CONNECT BY PRIOR node_id = parent_node_id START WITH node_name = UPPER('&node');

SQL> _at_show_tree
Enter value for node: start_1
old 5: START WITH node_name = UPPER('&node') new 5: START WITH node_name = UPPER('start_1')

NODE_ID_LVL NODE_NAME LEVEL PARENT_NODE_ID ------------ ---------- ----- --------------

   1         START_1        1
      2      C              2              1
         5   Z              3              2
         6   Y              3              2
         7   X              3              2
         8   W              3              2
      3      B              2              1
         9   F              3              3
         10  G              3              3
      4      A              2              1
         11  H              3              4
         12  I              3              4
         13  J              3              4

13 rows selected.

Then I tested the solution that Sergei provided:

SQL> create index tree1_idx on tree_nodes(node_id, node_name);

Index created.

SQL> create index tree2_idx on tree_nodes(parent_node_id, node_name);

Index created.

The result was now as follows:

SQL> _at_show_tree
Enter value for node: start_1
old 5: START WITH node_name = UPPER('&node') new 5: START WITH node_name = UPPER('start_1')

NODE_ID_LVL NODE_NAME LEVEL PARENT_NODE_ID ------------ ---------- ----- --------------

   1         START_1        1
      4      A              2              1
         11  H              3              4
         12  I              3              4
         13  J              3              4
      3      B              2              1
         9   F              3              3
         10  G              3              3
      2      C              2              1
         8   W              3              2
         7   X              3              2
         6   Y              3              2
         5   Z              3              2

13 rows selected.

So the data is sorted on node_name within each level, and the structure is conserved. I agree that this is the best solution, but it has one drawback IMHO:

It works only because the optimizer chose to do an index range scan on tree2_idx (tree1_idx didn't get used actually). But if one (perhaps a DBA on your customers installation, without asking _you_) decides to add another index like this one:

SQL> create index tn_tn_fk_i on tree_nodes(parent_node_id);

Index created.

... just an ordinary index to support the recursive foreign key, it will not work anymore, because now the optimizer decides to use this index which is _not_ ordered on node_name as the second column. So if your application depends on the order of the output (or it is at least very important as it was in my case, the ordered report output) _and_ the index structure is something beyond your permanent control, then this solution is not as reliable as necessary IMHO. The same goes for the other solution using an index organized table (which is not available in 7.x anyway).

Another approach is to have a function that builds a path description (like '/START_1/A/H/I') for each record and order by that function, which works very well, too. The drawback for this is a performance issue: if you happen to have a 'deep' data structure (say with n levels depth), then for each record your function must read n records 'backwards', i.e. from child to root, to build that description. If you have 1000 records in a tree, and the average level depth is 5, you will have 5000 record accesses !

Ok, now this the alghorithm I used to handle this, and of course, it has its own drawbacks, too :-)

CREATE OR REPLACE
Package ORDER_NODES
IS

TYPE id_tabtype IS
TABLE OF tree_nodes.node_id%TYPE
INDEX BY BINARY_INTEGER; PROCEDURE init

   (p_start_node_id IN tree_nodes.node_id%TYPE);

PROCEDURE done;

FUNCTION order_id

   (p_node_id IN tree_nodes.node_id%TYPE) RETURN NUMBER; PRAGMA restrict_references(order_id, WNDS);

END order_nodes;
/

CREATE OR REPLACE
Package Body ORDER_NODES
IS

/* Private vars */

  nodes_tab id_tabtype;
  empty_tab id_tabtype;

  node_cnt BINARY_INTEGER := 0;

/* Private procedure */

PROCEDURE build_table

   (p_node_id IN tree_nodes.node_id%TYPE) IS
BEGIN    node_cnt := node_cnt + 1;
   nodes_tab(p_node_id) := node_cnt;

   FOR node_rec IN

      (SELECT node_id
         FROM tree_nodes
        WHERE parent_node_id = p_node_id
       ORDER BY node_name)
   LOOP
      build_table(node_rec.node_id);

   END LOOP; END build_table;

/* Public procedures */

PROCEDURE init

   (p_start_node_id IN tree_nodes.node_id%TYPE) IS
BEGIN    node_cnt := 0;
   nodes_tab := empty_tab;

   build_table(p_start_node_id);

END init;

PROCEDURE done
IS
BEGIN    node_cnt := 0;
   nodes_tab := empty_tab;

END done;

FUNCTION order_id

   (p_node_id IN tree_nodes.node_id%TYPE) RETURN NUMBER
IS

   ret_node_id tree_nodes.node_id%TYPE; BEGIN    ret_node_id := nodes_tab(p_node_id);
   RETURN ret_node_id;

   EXCEPTION
   WHEN no_data_found
   THEN RETURN -1; END order_id;

END order_nodes;
/

Now I can do the following script (note that I dropped all indexes meanwhile):

COLUMN node_id_lvl FORMAT a12
COLUMN parent_node_id FORMAT 9999
COLUMN level FORMAT 99
SELECT lpad(' ', 3*level) || node_id node_id_lvl,

       node_name, level, parent_node_id,
       order_nodes.order_id(node_id) order_crit
  FROM tree_nodes
CONNECT BY PRIOR node_id = parent_node_id START WITH node_name = UPPER('&node')
ORDER BY order_crit;

First, without calling the init procedure:

SQL> _at_show_tree2
Enter value for node: start_1
old 5: START WITH node_name = UPPER('&node') new 5: START WITH node_name = UPPER('start_1')

NODE_ID_LVL NODE_NAME LEVEL PARENT_NODE_ID ORDER_CRIT

------------ ---------- ----- -------------- ----------
   1         START_1        1                        -1
      2      C              2              1         -1
         5   Z              3              2         -1
         6   Y              3              2         -1
         7   X              3              2         -1
         8   W              3              2         -1
      3      B              2              1         -1
         9   F              3              3         -1
         10  G              3              3         -1
      4      A              2              1         -1
         11  H              3              4         -1
         12  I              3              4         -1
         13  J              3              4         -1

13 rows selected.

Now I call the init procedure, which will recursively build a PL/SQL table with the order numbers:

SQL> exec order_nodes.init(1)

PL/SQL procedure successfully completed.

SQL> _at_show_tree2
Enter value for node: start_1
old 5: START WITH node_name = UPPER('&node') new 5: START WITH node_name = UPPER('start_1')

NODE_ID_LVL NODE_NAME LEVEL PARENT_NODE_ID ORDER_CRIT ------------ ---------- ----- -------------- ----------

   1         START_1        1                         1
      4      A              2              1          2
         11  H              3              4          3
         12  I              3              4          4
         13  J              3              4          5
      3      B              2              1          6
         9   F              3              3          7
         10  G              3              3          8
      2      C              2              1          9
         8   W              3              2         10
         7   X              3              2         11
         6   Y              3              2         12
         5   Z              3              2         13

13 rows selected.

SQL> exec order_nodes.done -- deallocate the PL/SQL table from PGA

PL/SQL procedure successfully completed.

As you see, it works !
Now, if you have 1000 records in a tree, and the average level depth is 5, you will have only 2000 record accesses, because each record must only be read twice: one time for the sorting routine, another time for the actual query.

The drawback here is that it will use a lot of PGA memory for the PL/SQL table, if you have very large quantities of data. But as I mentioned in my earlier posting, this works okay for me with about 30000-50000 rows in my recursive table.

The background for me is that in my team we're working on an integrated materials management system where we use a whole lot of these recursive data structures, e.g. for our Bill of Materials module. So if anyone has comments, experiences or other solutions for problems of this kind, I'd appreciate any input.

Best Regards,
Peter

-- 
Peter Schneider
peter.schneider_at_okay.net
Received on Tue Mar 31 1998 - 00:00:00 CEST

Original text of this message