Re: Query performance: DivideAndConquer vs. BruteForce

From: Daniel Guntermann <guntermann_at_uswest.net>
Date: Sat, 12 Jan 2002 01:49:24 -0800
Message-ID: <hrT%7.1643$UV2.194623_at_news.uswest.net>


Of course the best alternative depends on the context of the problem, but generally speaking, and especially in case 3, I'm not sure I agree that creating a bunch of temp tables is necessarily the most "efficient" way of processing or executing a general query involving complex sub-selects and one or a few main tables. "Divide and conquer" helps the problem-solving process when constructing complex queries, but I am inclined to believe that this does not have to mean a literal physical manifestation of that logic in the form of separate, distinct queries and temporary structures.
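
For instance (a rough sketch only; the filter placeholders follow the style of the example quoted below), a step-wise temp-table pipeline can often be folded back into a single statement with an inline view, leaving the decomposition to the optimizer:

    -- Step-wise version with an explicit temp table (Informix-style):
    select key1 from table1 where <a> into temp t1;
    select t1.key1, t2.key2 from t1, table2 t2 where <b>;

    -- Logically equivalent single statement; the "temp table" is now
    -- just an inline view the optimizer may merge or materialize:
    select d.key1, t2.key2
    from (select key1 from table1 where <a>) d, table2 t2
    where <b>;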

I'm not familiar with the Informix optimizer, but these days optimizers have gotten to the point where they are pretty decent at finding more efficient algorithms, usually by finding equivalent relational algebra/calculus expressions for generating the final relation, provided they have the entire query to work with at once.
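
To borrow the table names from the original question (and assuming Key is unique in Child1Table, with <cond> standing in for some filter), the following two formulations are logically equivalent, and a decent optimizer may well choose the same plan for both:

    -- Subquery formulation:
    select p.*
    from ParentTable p
    where p.Child1Key in (select c.Key from Child1Table c where <cond>);

    -- Join formulation:
    select p.*
    from ParentTable p, Child1Table c
    where p.Child1Key = c.Key and <cond>;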

Another issue is whether the temp tables are actually written to disk or maintained purely in memory. Disk I/O still takes longer than holding and processing everything in memory. I imagine that some vendors treat temp tables in a transient sort of way, while others, as well as developers, actually initialize structures on disk as persistent objects.
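
Oracle, for one, offers global temporary tables whose contents are private to each session and backed by temporary segments rather than permanent storage; a minimal sketch (the definition itself is invented):

    create global temporary table t1 (key1 number)
    on commit delete rows;  -- rows are session-private and vanish at commit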

I personally refrain as much as possible from intermediate-table solutions unless it is absolutely clear that the optimizer and indexes are doing an unacceptable job of producing results. If intermediate tables are required to materialize the results of complex processing to improve or facilitate views and reporting over batch or large data sets, and especially when deriving new data, I can understand their use. But with temp tables there is often real overhead in managing the creation, population, and destruction of those tables; and it goes against the mantra of one fact, in one place, because all one is really doing is filtering the same data into new structures as a kind of 'snapshot', often co-existing within the same local database.

Thus, replication and concurrency issues might come into play. For example, when analyzing and solving a complex problem over a large data set using temp tables, tuples within the base data set could change midway through the extraction process, between the loads of the different temp tables. Ideally, an optimizer would make the right choices about the granularity, level, and duration of the locks held on all tuples/rows being scanned or manipulated for the duration of the entire query - by this I mean both the main query and the nested sub-queries and various correlations. A danger of the "divide and conquer" process with multiple intermediate temp structures, however, is that data could be updated or added in the base tables between the individual queries, leading to incongruity and inconsistency.
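
A contrived illustration (placeholder names again; picture the update arriving from another session between the two loads):

    select key1 from table1 where <a> into temp t1;

    -- meanwhile, another session commits:
    --   update table1 set key2 = ... where <a>;

    select key1, key2 from table1 where <b> into temp t2;
    -- t1 and t2 now reflect two different states of table1, so
    -- joining them can yield results no single snapshot ever contained.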

Obviously, there are cases where it would be wise to explicitly control the locks on data when using temporary structures, especially since you take that ability away from the optimizer when you break the query up into chunks. On the other hand, if the database is not updatable, the points I raised are not really an issue; but wherever data can still be concurrently added while a query is processing, they are just as pertinent.
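
In Oracle, for example, one way to get a single consistent view across all the intermediate steps is to run them inside one serializable transaction (a sketch only, with t1 and t2 standing for temp tables created beforehand; this trades concurrency for consistency):

    set transaction isolation level serializable;
    -- every statement from here to the commit sees the same snapshot
    insert into t1 select key1 from table1 where <a>;
    insert into t2 select key1, key2 from table1 where <b>;
    commit;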

I do agree on the hints.

Dan

>
> 3. Lots of complex sub selects and one/few main tables.
>
> Partially do the subselects into one or more temp tables which
> simplify the joins to the main tables. Then use a set of several
> temp tables to gradually join the data together.
>
> Here you can use knowledge you have of the data to work out
> the best way to do the query. Hints:
>
> - Each new temp table which results should have fewer
> rows than the last. Fewer rows to join is better. Apply the most
> restrictive filters earliest.
> - Perhaps only get primary keys, and let the final query fetch
> the long strings from the table.
>
> E.g.
>
> select key1, key2, string1 from table1, ....
> where <a> and <b> and ..
>
> It may be better to initially do
>
> select key1 from table1
> where <a>...
> into temp t1;
> ...
> select key1,key2,string1 from t1..
> where <b> and
>
> if <b> does not restrict the number of rows from <t1>
> very much...
>
>
> This is where benchmarking comes in..!
>
>
> >
> > Andrew
> >
> >
> >
> > "Tushar Mahapatra" <tushar_mahapatra_at_yahoo.com> wrote in message
> > news:1ad0000.0201040705.651f411_at_posting.google.com...
> > > I have a question about what would be the best way to query an SQL
> > > database (I am using Oracle) where many parent and child tables are
> > > involved. I seek opinions on which of the following two methods would
> > > require the least execution time:
> > >
> > > Method 1 - BruteForce method:
> > > Execute a single query where all the parent and child tables are
> > > joined and all the data is returned at once. For example:
> > >
> > > select *
> > > from ParentTable P, Child1Table C1, Child2Table C2, ...
> > > where P.Child1Key = C1.Key and P.Child2Key = C2.Key and ...
> > >
> > >
> > > Method 2 - DivideAndConquer method:
> > > Execute more than one query. The first query gets data from the parent
> > > table:
> > >
> > > select *
> > > from ParentTable P
> > > where ...
> > >
> > > Next, in the front-end code, iterate through the parent table data
> > > received and execute queries into the children tables:
> > >
> > > select *
> > > from Child1Table C1
> > > where C1.Key = varPChild1Key and ...
> > >
> > >
> > >
> > > I am inclined to believe that the DivideAndConquer method would be
> > > faster, but I invite your comments, especially with respect to Oracle
> > > 9i.
> > >
> > > The scenarios I have described are actually much simplified
> > > representations of my situation.
> >
> >
>
>
>
Received on Sat Jan 12 2002 - 10:49:24 CET
