Creating a hierical tree structure from strings
Recently, I have been helping a company in redesigning their schema, and one issue was that they stored hieracal trees in varchar2(4000) column, one full branch per row.
Data was if this kind:
/Oracle/8/1/5/Microsoft /Oracle/8/1/5/VMS /Oracle/8/1/5/0/1/VMS
Their main problem was that searching in this was really ugly, as they had to full table/index scan a lot, when searching using like '%%'
I came up with this solution to convert this into a small table that can be queried using connect by instead:
CREATE OR REPLACE TYPE node_parent_type AS OBJECT ( lvl NUMBER, node VARCHAR2(200), PARENT VARCHAR2(200) ) / CREATE OR REPLACE TYPE str2Branch_Type AS TABLE OF node_parent_type / CREATE OR REPLACE FUNCTION Str2Branch ( p_string IN VARCHAR2 , p_delim IN VARCHAR2 ) RETURN str2Branch_Type PIPELINED AS l_str LONG; l_piece LONG; l_parent LONG := NULL; l_n NUMBER; l_lvl NUMBER := 0; BEGIN l_str := p_string; LOOP l_n := INSTR(l_str, p_delim); IF (l_n = 1) --Starts with a delimiter THEN l_n := INSTR(SUBSTR(l_str,2), p_delim); END IF; EXIT WHEN (NVL(l_n,0) = 0); l_piece := SUBSTR( l_str, 1, l_n ); --First piece l_str := SUBSTR( l_str, l_n+1 ); --Rest l_n := INSTR( l_piece, p_delim, -1 ); EXIT WHEN (NVL(l_n,0) = 0); --Only needed, if we have to end with a delimiter. pipe ROW( node_parent_type ( l_lvl, LTRIM(RTRIM(SUBSTR(l_piece,l_n+1))), l_parent ) ); l_parent := LTRIM(RTRIM(SUBSTR(l_piece,l_n+1))); l_lvl := l_lvl + 1; END LOOP; RETURN; END; / CREATE GLOBAL TEMPORARY TABLE XXX_TREE_TEMP ( ID NUMBER NOT NULL , node_name VARCHAR2(256) , parent_branch VARCHAR2(4000) ) ON COMMIT DELETE ROWS; INSERT INTO XXX_TREE_TEMP SELECT ROWNUM , b.node , b.parent_branch FROM ( WITH p AS (SELECT the_full_node FROM big_table WHERE the_full_node IS NOT NULL GROUP BY the_full_node) SELECT DISTINCT t.node , SUBSTR(p.the_full_node, 1, INSTR(p.the_full_node,'/',1,t.lvl+1)-1) AS parent_branch , t.lvl FROM P, TABLE(Str2branch(p.the_full_node,'/')) t ORDER BY t.lvl, t.node ) b ; INSERT INTO XXX_TREE SELECT t1.ID AS node_id , MIN(t2.ID) AS parent_id , t1.node_name FROM XXX_TREE_TEMP t1, XXX_TREE_TEMP t2 WHERE t1.parent_branch = DECODE(t2.parent_branch,NULL,'',t2.parent_branch)'/'t2.node_name GROUP BY t1.ID, t1.node_name UNION ALL SELECT t1.ID AS node_id , NULL AS parent_id , t1.node_name FROM XXX_TREE_TEMP t1 WHERE t1.parent_branch IS NULL GROUP BY t1.ID, t1.node_name ORDER BY 1, 2 ;
This can then be queries like this to find the original full braches:
SELECT ID , SYS_CONNECT_BY_PATH(a.node_name, '/')'/' navn FROM XXX_TREE a CONNECT BY PRIOR ID = parent_id START WITH parent_id IS NULL;
Limiting to branches with a special string is trivial now, as the string can be indexed ( xxx_tree.node_name ).
When it goes into production, I will update about the success.
I would have liked to not use the temporary table, but my first shot at this failed (nested with statements), and this is not that important, as this is a one shot load.
Recent testing show a reduction in query runtime when searching for a number of branches, dropping from minutes to sub-seconds.
- kimant's blog
- Log in to post comments