
Re: Query performance: DivideAndConquer vs. BruteForce

From: David Williams <djw_at_smooth1.fsnet.co.uk>
Date: Mon, 7 Jan 2002 03:02:11 -0000
Message-ID: <a1b33k$idf$1@news8.svr.pol.co.uk>

"Andrew Wilson" <andrew_at_blueoffice.com> wrote in message news:a177qa$2b1m$1_at_news.cybercity.dk...
> As a rule of thumb - "The more you perform on the database server, (and
> the less on the client) the faster will be the overall response time"!
>
>

  True.

> Performing a loop on the client and issuing multiple queries will
> probably be many times slower.
>

   True.

> In the full-query case I would imagine that the server would perform a
> table scan on the parent table, and then perform full sort-merges for
> each of the child tables (depending on table sizes, statistics on
> columns, table structuring, cleverness of the optimiser, complexity of
> the query, etc.). All data is then returned across the network to the
> client.
>
> In the "divide-and-conquer" case a table scan of the parent table is
> peformed and the data returned across the network (or locally).
>
> Now we get into the trouble area, because for each row in the parent
> table we now perform a select from each of the child tables. As they are
> primary-key selects, and I assume the table is structured on the primary
> key, then (if btree/ISAM) a tree walk is performed on the child table.
> The total number of pages read will be greater (index pages need to be
> re-read each time [maybe buffered]), the amount of server CPU will be
> greater (each separate query will have to be re-analysed), and network
> traffic will certainly be greater.
>
> At a total guess I would imagine that in the first case the selection is
> of a log N type, whilst in the second case it is more likely N to some
> power.
>
> Stick to the server ......
>

  However, under Informix I find that queries joining more than three
  tables are slower.

  Use a temp (unlogged) table: select intermediate results into it,
  index it, and then join to it.
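
  For example, a minimal sketch of the pattern in Informix SQL (the
  table and column names here are invented):

      -- Filter the large table once into an unlogged temp table
      select order_id, customer_id, amount
        from orders
       where order_date >= '2001-01-01'
        into temp t_orders with no log;

      -- Index the temp table on the join column
      create index ix_t_orders on t_orders (customer_id);

      -- Join the small temp table to the remaining tables
      select t.order_id, c.name, t.amount
        from t_orders t, customers c
       where t.customer_id = c.customer_id;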

  I usually find that many-table queries come in several forms:

  1. Lots of small reference-data tables and one large table.
      Join the large table to a few of the small tables (the main ones
      which restrict the rows from the large table at low cost).
      Put the result into a small temp table. Join the temp table to
      the rest.

  2. Several large tables which produce a small number of rows, and
      many small tables.

      Join the large tables to get a small temp table, then join the
      temp table to the rest (see the sketch after this list).

  3. Lots of complex subselects and one or a few main tables.

      Partially do the subselects into one or more temp tables which
      simplify the joins to the main tables. Then use a set of several
      temp tables to gradually join the data together.

      Here you can use the knowledge you have of the data to work out
      the best way to do the query. Hints:

      -  Each new temp table which results should have fewer rows
          than the last. Fewer rows to join is better, so get the most
          restrictive filters in early.
      -  Perhaps only get primary keys at first, and let the final
          query fetch the long strings from the table.

          E.g., instead of

              select key1, key2, string1 from table1, ...
                 where <a> and <b> and ...

          (where string1 is a long string column), it may be better
          to initially do

              select key1 from table1
                   where <a>
                   into temp t1;
              ...
              select t1.key1, table1.key2, table1.string1
                   from t1, table1, ...
                   where t1.key1 = table1.key1 and <b> ...

          if <b> does not restrict the number of rows from t1 very
          much.


       This is where benchmarking comes in!
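
  As a concrete sketch of form 2 above (again, the table names are
  invented): join the large tables down to a small row set first,
  then fan out to the reference tables:

      -- Two large tables joined down to a small temp table first
      select o.order_id, o.customer_id
        from orders o, shipments s
       where o.order_id = s.order_id
         and s.ship_date = '2002-01-07'
        into temp t_small with no log;

      -- Then join the small temp table to the reference tables
      select t.order_id, c.name, r.region_name
        from t_small t, customers c, regions r
       where t.customer_id = c.customer_id
         and c.region_id = r.region_id;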


>
> Andrew
>
>
>
> "Tushar Mahapatra" <tushar_mahapatra_at_yahoo.com> wrote in message
> news:1ad0000.0201040705.651f411_at_posting.google.com...
> > I have a question about what would be the best way to query an SQL
> > database (I am using Oracle) where many parent and child tables are
> > involved. I seek opinions on which of the following two methods would
> > require the least execution time:
> >
> > Method 1 - BruteForce method:
> > Execute a single query where all the parent and child tables are
> > joined and all the data is returned at once. For example:
> >
> > select *
> > from ParentTable P, Child1Table C1, Child2Table C2, ...
> > where P.Child1Key = C1.Key and P.Child2Key = C2.Key and ...
> >
> >
> > Method 2 - DivideAndConquer method:
> > Execute more than one query. The first query gets data from the parent
> > table:
> >
> > select *
> > from ParentTable P
> > where ...
> >
> > Next, in the front-end code, iterate through the parent table data
> > received and execute queries into the children tables:
> >
> > select *
> > from Child1Table C1
> > where C1.Key = varPChild1Key and ...
> >
> >
> >
> > I am inclined to believe that the DivideAndConquer method would be
> > faster, but I invite your comments, especially with respect to Oracle
> > 9i.
> >
> > The scenarios I have described are actually much simplified
> > representations of my situation.
>
>
Received on Sun Jan 06 2002 - 21:02:11 CST
