Re: Optimization of query finding related products for a certain product

Home -> Community -> Usenet -> c.d.o.misc -> Re: Optimization of query finding related products for a certain product

From: Vladimir M. Zakharychev <bob_at_dpsp-yes.com>
Date: Fri, 7 Sep 2001 17:28:02 +0400
Message-ID: <9nahpt$h4i$1@babylon.agtel.net>

Hmm...

If I got it right, this query tries to find all users that looked at this product and then select all products which those other users also looked at, sorting products by popularity. The bottleneck here may be IN () clause as each day the number of records returned by the subquery will most probably grow. I would try this rewrite:

select product_id from visit
where exists(select 1 from visit where product_id = :prod_id)

and product_id != :prod_id
group by product_id
order by count(distinct user_id) desc

Even without any indexes in my test case the plan for original query was

SELECT STATEMENT (COST : 6)
SORT (ORDER BY)

     SORT (GROUP BY)
       MERGE JOIN
         SORT (JOIN)
           TABLE ACCESS (FULL) -- VISIT
         SORT (JOIN)
           TABLE ACCESS (FULL) -- VISIT

while my rewrite gave

SELECT STATEMENT (COST : 4 )
SORT (ORDER BY)

     SORT (GROUP BY)
         FILTER
           TABLE ACCESS (FULL) -- VISIT
           TABLE ACCESS (FULL) -- VISIT

which is obviously a shorter path, while query results were the same. With indexes on both product_id and user_id the original query plan became

SELECT STATEMENT (COST : 6)
SORT (ORDER BY)

     SORT (GROUP BY)
       NESTED LOOPS
         TABLE ACCESS (BY INDEX ROWID) -- VISIT
            INDEX (RANGE SCAN) -- IDX$PRODID (NON-UNIQUE)

         TABLE ACCESS (BY INDEX ROWID) -- VISIT
            INDEX (RANGE SCAN) -- IDX$USERID (NON-UNIQUE)

SELECT STATEMENT (COST : 4)
SORT (ORDER BY)

     SORT (GROUP BY)
       FILTER
         TABLE ACCESS (FULL) -- VISIT
         TABLE ACCESS (BY INDEX ROWID) -- VISIT
            INDEX (RANGE SCAN) -- IDX$PRODID (NON-UNIQUE)

which is still shorter path because nested loops are eliminated.

Also, analyze table compute statistics and see if it makes any difference to have stats on this table for cost-based optimizer, experiment with different indexes (for example, composite index on product_id, user_id may eliminate full table scan in my rewrite), but keep in mind that index access time adds to total execution time, so in some cases (when indexes have bad selectivity) full table scans perform better than range scans.

--
Vladimir Zakharychev (bob@dpsp-yes.com)                http://www.dpsp-yes.com
Dynamic PSP(tm) - the first true RAD toolkit for Oracle-based internet applications.
All opinions are mine and do not necessarily go in line with those of my employer.


"Davide Bianchi" <davidebianchi_at_davidebianchi.net> wrote in message
news:9naaes$63cfc$1_at_ID-18487.news.dfncis.de...


> "Jacob Nordgren" <jacob_nordgren_at_hotmail.com> wrote in message

> news:47c6b9be.0109070232.2196e3ea_at_posting.google.com...

> > This query finds the related products for a certain product

> > based on user visits.

> >

> > Today this query takes 3 seconds. Is it possible to make it

> > faster with the same result?

> >

> > SELECT PRODUCT_ID FROM VISIT WHERE

> > USER_ID IN (SELECT DISTINCT USER_ID FROM VISIT WHERE PRODUCT_ID =

> > 1883577330)

> > AND PRODUCT_ID <> 1883577330 GROUP BY PRODUCT_ID ORDER BY

> > COUNT(DISTINCT USER_ID) DESC

>

> NOTE: cross-posting eliminated.

>

> The use of subquerys is one of the most time-consuming

> thing during the extraction of data. And the logic of this

> thing is quite convoluted. You ask for the products of the

> user that request product X AND everything else, grouped by

> product_id... to me this look like a

>

> SELECT PRODUCT_ID FROM VISIT GROUP BY PRODUCT_ID

> ORDER BY COUNT(DISTINCT USER_ID) DESC

>

> (but maybe I missed something...)

>

> Davide

>

>

>

Received on Fri Sep 07 2001 - 08:28:02 CDT