Re: Is there a faster/better way to do this?

From: John Darrah <jdarrah_at_veripost.net>
Date: Wed, 3 Apr 2002 23:40:53 +0000 (UTC)
Message-ID: <855cf978bb9a65d67e892d0769ca32e3.36240@mygate.mailgate.org>

Mabey you could throw an index on the 5/4/3 fields (possibly a bitmap index seperately on each column) then you could issue something like:

INSERT /*+ APPEND NOLOGGING */ INTO intermediate (SELECT t2.*

                            FROM (SELECT field5, field4, field3,count(*)
                                    FROM 4million_tab
                                  GROUP BY field5,field4,field3 HAVING
count(*) > 1) t1,
                                  4million_tab t2
                           WHERE t1.field5 = t2.field5
                             AND t1.field4 = t2.field4
                             AND t1.field3 = t2.field3)

After this, you can insert these results into your matches table with the lowest rank and rerun your query for each subsequent group by on the intermediate table. The assumption I am making here is that a very small subset of your data is duplicated so your queries against the intermediate table will be against a much smaller data set. If this is an incorrect assumption, then this approach will take longer. Another thing to investigate is using parallel query to speed this up.

Let me know if this helps,

John

-- 
Posted via Mailgate.ORG Server - http://www.Mailgate.ORG

Received on Wed Apr 03 2002 - 17:40:53 CST