Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Summing hierarchical data SQL
Michel Cadot wrote:
[...]
> -----------------------------------------
>
> When I said I don't think I meant I don't think you can build
> an *efficient* query without a correlated subquery.
Nice try. What you actually said was "I don't think so" without qulifying your message with any performance concerns (as I did in my original message):
<quote>
> > | It might be interesting, if this result can be achieved without
> > | correlated subquery however...
> > |
> > | Best regards
> > |
> > | Maxim
> >
> > I don't think so.
</quote>
> Yes, you can write many awful queries like yours building
> twice the hierarchy with nested loops between the two.
How about some statistical evidence to backup your "awful" adjective ?
>
> A little less worse version is:
>
> with
> data as (
> select /*+ materialized */
> id, name, parent, frequency,
> sys_connect_by_path (id, '/') id_path
> from t
> connect by prior id = parent
> start with parent is null
> )
> select a.id, a.name, a.parent, sum(b.frequency) frequency
> from data a, data b
> where instr(b.id_path,a.id_path) > 0
> group by a.id, a.name, a.parent
> order by a.id
> /
>
> which build a temporary table with the result of the connect by
> but still leads to nested loops.
Sure one can optimize the original query by factoring out the subquery at the expense of materiaizing the latter.
>
> My first query is built around an hash join between the original
> table and the full hierarchy generated by the connect by.
You are confused, amigo. The hash join you see in the execution plan is used to create the hierarchy even for a simple query like:
select id, frequency
from t1 connect by prior id = parent start with parent is null
The hash join has got zip to do with the correlated subquery.
What actually happens with the subquery is that many smaller hierarchies are created on fly, one per each row of the original table (that why the query is called 'correlated'), not the "full hierarchy" as you've claimed..
As for some stat. evidence:
SQL> select
2 a.id,
3 a.name,
4 sum(b.frequency)
5 from
6 (select id, name, sys_connect_by_path(id, '/') path
7 from t1 connect by prior id = parent start with parent is null
8 ) a,
9 (select frequency, sys_connect_by_path(id, '/') path
10 from t1 connect by prior id = parent start with parent is null
11 ) b
12 where instr(b.path, a.path) = 1
13 group by a.id, a.name
14 /
Statistics
0 recursive calls 0 db block gets 196 consistent gets 0 physical reads 0 redo size 595 bytes sent via SQL*Net to client 500 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 29 sorts (memory) 0 sorts (disk) 6 rows processed
SQL>
-- B. correlated SQL> select id, name, parent, 2 ( select sum(frequency) 3 from t1 t2 4 connect by prior id = parent 5 start with t2.id = t1.id ) frequency 6 from t1 t1 7 / Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 120 consistent gets 0 physical reads 0 redo size 655 bytes sent via SQL*Net to client 500 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 16 sorts (memory) 0 sorts (disk) 6 rows processed SQL> ---- C. "with" SQL> with 2 data as ( 3 select /*+ materialized */ 4 id, name, parent, frequency, 5 sys_connect_by_path (id, '/') id_path 6 from t1 7 connect by prior id = parent 8 start with parent is null 9 ) 10 select a.id, a.name, a.parent, sum(b.frequency) frequency 11 from data a, data b 12 where instr(b.id_path,a.id_path) =1 13 group by a.id, a.name, a.parent 14 / 6 rows selected. Statistics ---------------------------------------------------------- 2 recursive calls 8 db block gets 51 consistent gets 1 physical reads 604 redo size 655 bytes sent via SQL*Net to client 500 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 5 sorts (memory) 0 sorts (disk) 6 rows processed As can be seen, with respect to consistent gets A (sys_connect) is worse than B (correlated) by 50%, and C( sys_connect optimized with "with") is actually two times better than B at the expense of course of having to create the materialized hierarchy. One may actually want to try different approaches before finally deciding what is "awful" and what is not.Received on Thu Oct 05 2006 - 13:40:54 CDT
>
> Regards
> Michel Cadot