Re: Optimization of query finding related products for a certain product

Home -> Community -> Usenet -> c.d.o.misc -> Re: Optimization of query finding related products for a certain product

From: Jacob Nordgren <jacob_nordgren_at_hotmail.com>
Date: 20 Sep 2001 07:01:25 -0700
Message-ID: <47c6b9be.0109200601.27a735ab@posting.google.com>

Hi Vladimir,

I've tried your query:

SELECT PRODUCTID FROM VISIT
WHERE EXISTS (SELECT 1 FROM VISIT WHERE PRODUCTID = 1840253053) AND PRODUCTID <> 1840253053
GROUP BY PRODUCTID ORDER BY COUNT(DISTINCT USER_ID) DESC but the result is not the same as my slow query. What I need is a query that returns the same result. Your query counts all users, but I only want to count the users that looked at this product before.

/ Jacob

"Vladimir M. Zakharychev" <bob_at_dpsp-yes.com> wrote in message news:<9nahpt$h4i$1_at_babylon.agtel.net>...
> Hmm...
>
> If I got it right, this query tries to find all users that looked at this
> product and then select all products which those other users also
> looked at, sorting products by popularity. The bottleneck here
> may be IN () clause as each day the number of records returned by
> the subquery will most probably grow. I would try this rewrite:
>
> select product_id from visit
> where exists(select 1 from visit where product_id = :prod_id)
> and product_id != :prod_id
> group by product_id
> order by count(distinct user_id) desc
>
> Even without any indexes in my test case the plan for original
> query was
>
> SELECT STATEMENT (COST : 6)
> SORT (ORDER BY)
> SORT (GROUP BY)
> MERGE JOIN
> SORT (JOIN)
> TABLE ACCESS (FULL) -- VISIT
> SORT (JOIN)
> TABLE ACCESS (FULL) -- VISIT
>
> while my rewrite gave
>
> SELECT STATEMENT (COST : 4 )
> SORT (ORDER BY)
> SORT (GROUP BY)
> FILTER
> TABLE ACCESS (FULL) -- VISIT
> TABLE ACCESS (FULL) -- VISIT
>
> which is obviously a shorter path, while query results were
> the same. With indexes on both product_id and user_id the
> original query plan became
>
> SELECT STATEMENT (COST : 6)
> SORT (ORDER BY)
> SORT (GROUP BY)
> NESTED LOOPS
> TABLE ACCESS (BY INDEX ROWID) -- VISIT
> INDEX (RANGE SCAN) -- IDX$PRODID (NON-UNIQUE)
>
> TABLE ACCESS (BY INDEX ROWID) -- VISIT
> INDEX (RANGE SCAN) -- IDX$USERID (NON-UNIQUE)
>
> SELECT STATEMENT (COST : 4)
> SORT (ORDER BY)
> SORT (GROUP BY)
> FILTER
> TABLE ACCESS (FULL) -- VISIT
> TABLE ACCESS (BY INDEX ROWID) -- VISIT
> INDEX (RANGE SCAN) -- IDX$PRODID (NON-UNIQUE)
>
> which is still shorter path because nested loops are eliminated.
>
> Also, analyze table compute statistics and see if it makes any
> difference to have stats on this table for cost-based optimizer,
> experiment with different indexes (for example, composite
> index on product_id, user_id may eliminate full table scan in
> my rewrite), but keep in mind that index access time adds to
> total execution time, so in some cases (when indexes have
> bad selectivity) full table scans perform better than range scans.
>
> --
> Vladimir Zakharychev (bob@dpsp-yes.com) http://www.dpsp-yes.com
> Dynamic PSP(tm) - the first true RAD toolkit for Oracle-based internet applications.
> All opinions are mine and do not necessarily go in line with those of my employer.
>
>
> "Davide Bianchi" <davidebianchi_at_davidebianchi.net> wrote in message
> news:9naaes$63cfc$1_at_ID-18487.news.dfncis.de...
> > "Jacob Nordgren" <jacob_nordgren_at_hotmail.com> wrote in message
> > news:47c6b9be.0109070232.2196e3ea_at_posting.google.com...
> > > This query finds the related products for a certain product
> > > based on user visits.
> > >
> > > Today this query takes 3 seconds. Is it possible to make it
> > > faster with the same result?
> > >
> > > SELECT PRODUCT_ID FROM VISIT WHERE
> > > USER_ID IN (SELECT DISTINCT USER_ID FROM VISIT WHERE PRODUCT_ID =
> > > 1883577330)
> > > AND PRODUCT_ID <> 1883577330 GROUP BY PRODUCT_ID ORDER BY
> > > COUNT(DISTINCT USER_ID) DESC
> >
> > NOTE: cross-posting eliminated.
> >
> > The use of subquerys is one of the most time-consuming
> > thing during the extraction of data. And the logic of this
> > thing is quite convoluted. You ask for the products of the
> > user that request product X AND everything else, grouped by
> > product_id... to me this look like a
> >
> > SELECT PRODUCT_ID FROM VISIT GROUP BY PRODUCT_ID
> > ORDER BY COUNT(DISTINCT USER_ID) DESC
> >
> > (but maybe I missed something...)
> >
> > Davide
> >
> >
> >
Received on Thu Sep 20 2001 - 09:01:25 CDT