Re: Performance problem in a "Connect-By" query over a DAG-edges table

From: Oren Louidor <olouidor_at_tx.technion.ac.il>
Date: 13 May 2003 02:34:34 -0700
Message-ID: <39d6c4dc.0305130134.22e9652@posting.google.com>

Hi,

Thanks for you answer.

We do know that our query is right. We have tested it on small cases and verified it analitically. The problem is that is doesn't stop thinking when the amount of ancestors is too big. We think that this is because during this reverse DFS, oracle gets to the same vertex many times, not regarding the fact that it has already been there. This is due to our graph being a DAG, in which there are many ways to get from one vertex to the other. The examination of a vertex more then once doesn't effect the correctness of the result, neither does it suppose to make the query not to stop (since there are no cycles - we know it), but it might make the query execute for a really long time, since the amount of times the server gets to the same vertex increases exponentially as the level of recursion increases. What we are looking for is a way to prevent oracle from examining the same vertex more then once - i.e., making the reverse DFS more efficient. Maybe righting distinct at the select statement will help? This is the thing we will test next.

Thanks,
Oren

afilonov_at_yahoo.com (Alex Filonov) wrote in message news:<336da121.0305120739.7131de97_at_posting.google.com>...
> olouidor_at_tx.technion.ac.il (Oren Louidor) wrote in message news:<39d6c4dc.0305112334.425c4e66_at_posting.google.com>...
> > We're using an "Edges" table with records consisting of two fields:
> > ParentID ChildID (actually, the table has some more columns, which
> > are not important to the problem).
> >
> > This table describes a DAG-type graph, with each record being
> > an edge between two vertices with IDs given in these two fields.
> > Each vertex may have many parents and many childs.
> >
> > We want to get all the ancestors of a given vertex, using the
> > following
> > query:
> > select ParentID from Edges
> > start with ChildID = #####
> > connect by prior ParentID = ChildID.
> >
> > When the table size is about 1000 records, this query doesn't stop. We
>
> Doesn't stop thinking or doesn't stop printing output?
>
> > know, that we don't have cycles in the our described graph. We think
>
> Usually Oracle detects loops in Connect BY query and returns an error.
>
> > that the problem comes from the fact that, since the graph is a DAG,
> > there are many ways to get to a certain ancestor of our vertex with
> > the amount of ways increasing exponentially. We followed the execution
> > of the query, by examining the output of ParentIDs and indeed, it
> > seems that the query got to the same record many times.
> >
>
> I'd recommend to test this query on the small table and make sure (using
> level function) that result is what you actually want. Query conditions
> appear to be OK to me, but who knows. Also, it's a good idea to research
> problem analitically. Actually, you're building ascending tree in the DAG,
> it should be possible to estimate number of vertices and edges in the tree
> for some particular cases. Then compare estimated value with the number of
> records you're getting from the queries.
>
> I can't give a better advice here, it would require detailed knowledge of
> your data.
>
> > Is there a way to avoid this, or maybe there's a better way to get all
> > the ancestors of our vertex.
> >
> > Thanks,
> > Oren.
Received on Tue May 13 2003 - 04:34:34 CDT