Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Optimization of query finding related products for a certain product
Hmm...
If I got it right, this query tries to find all users that looked at this product and then select all products which those other users also looked at, sorting products by popularity. The bottleneck here may be IN () clause as each day the number of records returned by the subquery will most probably grow. I would try this rewrite:
select product_id from visit
where exists(select 1 from visit where product_id = :prod_id)
and product_id != :prod_id
group by product_id
order by count(distinct user_id) desc
Even without any indexes in my test case the plan for original query was
SELECT STATEMENT (COST : 6)
SORT (ORDER BY)
SORT (GROUP BY) MERGE JOIN SORT (JOIN) TABLE ACCESS (FULL) -- VISIT SORT (JOIN) TABLE ACCESS (FULL) -- VISIT
while my rewrite gave
SELECT STATEMENT (COST : 4 )
SORT (ORDER BY)
SORT (GROUP BY) FILTER TABLE ACCESS (FULL) -- VISIT TABLE ACCESS (FULL) -- VISIT
which is obviously a shorter path, while query results were the same. With indexes on both product_id and user_id the original query plan became
SELECT STATEMENT (COST : 6)
SORT (ORDER BY)
SORT (GROUP BY) NESTED LOOPS TABLE ACCESS (BY INDEX ROWID) -- VISIT INDEX (RANGE SCAN) -- IDX$PRODID (NON-UNIQUE) TABLE ACCESS (BY INDEX ROWID) -- VISIT INDEX (RANGE SCAN) -- IDX$USERID (NON-UNIQUE)
SELECT STATEMENT (COST : 4)
SORT (ORDER BY)
SORT (GROUP BY) FILTER TABLE ACCESS (FULL) -- VISIT TABLE ACCESS (BY INDEX ROWID) -- VISIT INDEX (RANGE SCAN) -- IDX$PRODID (NON-UNIQUE)
which is still shorter path because nested loops are eliminated.
Also, analyze table compute statistics and see if it makes any difference to have stats on this table for cost-based optimizer, experiment with different indexes (for example, composite index on product_id, user_id may eliminate full table scan in my rewrite), but keep in mind that index access time adds to total execution time, so in some cases (when indexes have bad selectivity) full table scans perform better than range scans.
-- Vladimir Zakharychev (bob@dpsp-yes.com) http://www.dpsp-yes.com Dynamic PSP(tm) - the first true RAD toolkit for Oracle-based internet applications. All opinions are mine and do not necessarily go in line with those of my employer. "Davide Bianchi" <davidebianchi_at_davidebianchi.net> wrote in message news:9naaes$63cfc$1_at_ID-18487.news.dfncis.de...Received on Fri Sep 07 2001 - 08:28:02 CDT
> "Jacob Nordgren" <jacob_nordgren_at_hotmail.com> wrote in message
> news:47c6b9be.0109070232.2196e3ea_at_posting.google.com...
> > This query finds the related products for a certain product
> > based on user visits.
> >
> > Today this query takes 3 seconds. Is it possible to make it
> > faster with the same result?
> >
> > SELECT PRODUCT_ID FROM VISIT WHERE
> > USER_ID IN (SELECT DISTINCT USER_ID FROM VISIT WHERE PRODUCT_ID =
> > 1883577330)
> > AND PRODUCT_ID <> 1883577330 GROUP BY PRODUCT_ID ORDER BY
> > COUNT(DISTINCT USER_ID) DESC
>
> NOTE: cross-posting eliminated.
>
> The use of subquerys is one of the most time-consuming
> thing during the extraction of data. And the logic of this
> thing is quite convoluted. You ask for the products of the
> user that request product X AND everything else, grouped by
> product_id... to me this look like a
>
> SELECT PRODUCT_ID FROM VISIT GROUP BY PRODUCT_ID
> ORDER BY COUNT(DISTINCT USER_ID) DESC
>
> (but maybe I missed something...)
>
> Davide
>
>
>