Re: How to get a distinct count of result set of multople table joins?
Date: Wed, 27 Feb 2008 08:27:30 -0800 (PST)
Message-ID: <8aa20cea-f22a-4a5d-969f-6b618c0e268a@60g2000hsy.googlegroups.com>
On Feb 26, 11:26 pm, "Chris ( Val )" <chris..._at_gmail.com> wrote:
> Hi everyone,
>
> I haven't played with SQL for a while, but the following SQL returns
> ~700,000 records ( a lot of duplicates ), and I used a UNION with the
> same query to reduce the records down to ~250,000. Is there a more
> efficient solution, as the UNION takes some time to run? ~3 mins.
>
> <QUERY>
> SELECT * FROM schema.tbl_A a, schema.tbl_B b, schema.tbl_C c,
> schema.tbl_D d, schema.tbl_E e, schema.tbl_F
> f,
> schema.tbl_G g, schema.tbl_H h, schema.tbl_I
> i,
> schema.tbl_J j, schema.tbl_K k, schema.tbl_L
> l,
> schema.tbl_M m
> WHERE a.accountid = b.ID(+)
> AND a.custom1id = c.ID(+)
> AND a.custom2id = d.ID(+)
> AND a.custom3id = e.ID
> AND a.custom4id = f.ID(+)
> AND a.entityid = g.ID(+)
> AND a.icpid = h.ID(+)
> AND a.parentid = i.ID(+)
> AND a.periodid = j.ID(+)
> AND a.scenarioid = k.ID(+)
> AND a.valueid = l.ID(+)
> AND a.viewid = m.ID(+)
> AND d.label IN ('XXX', 'XXX')
> AND e.label IN ('XXX', 'XXX')
> AND f.label = 'X'
> AND k.label = 'XXX'
> AND l.label = 'XXX'
> AND h.label = 'X'
> AND m.label IN ('X', 'X');
> UNION
> <rest (the same as above) snipped>
> </QUERY>
>
> Problem - I am having some trouble getting a DISTINCT COUNT of the
> result set from the above. I trieed the
> following, but I still get the same (non-distinct) amount returned for
> the count.
>
> How do I get a count of the distinct records / result set?
> I tried with the UNION as above, as well as the follwoing, but still
> get a count which includes the duplicates.
>
> SELECT COUNT(*) FROM
> (
> SELECT DISTINCT* FROM schema.tbl_A a, schema.tbl_B b, schema.tbl_C
> c,
> schema.tbl_D d, schema.tbl_E e, schema.tbl_F
> f,
> schema.tbl_G g, schema.tbl_H h, schema.tbl_I
> i,
> schema.tbl_J j, schema.tbl_K k, schema.tbl_L
> l,
> schema.tbl_M m
> WHERE a.accountid = b.ID(+)
> AND a.custom1id = c.ID(+)
> AND a.custom2id = d.ID(+)
> AND a.custom3id = e.ID
> AND a.custom4id = f.ID(+)
> AND a.entityid = g.ID(+)
> AND a.icpid = h.ID(+)
> AND a.parentid = i.ID(+)
> AND a.periodid = j.ID(+)
> AND a.scenarioid = k.ID(+)
> AND a.valueid = l.ID(+)
> AND a.viewid = m.ID(+)
> AND d.label IN ('XXX', 'XXX')
> AND e.label IN ('XXX', 'XXX')
> AND f.label = 'X'
> AND k.label = 'XXX'
> AND l.label = 'XXX'
> AND h.label = 'X'
> AND m.label IN ('X', 'X')
> );
>
> Thanks in advance,
> Chris
I suspect that Oracle is taking a long time to remove the duplicate
rows due to a combination of the number of rows, the number of columns
(you are specifying to retrieve all columns from the tables), and the
amount of memory available for sorting unique (or hash unique) the
rows to produce a unique list of rows. Are you able to better define
the columns that are interest, and eliminate any columns that are
common between the various tables. I have found in some cases on
Oracle 10.2.0.2 that listing all columns in a GROUP BY allows Oracle
to retrieve a distinct list of rows more quickly than by using the
DISTINCT syntax:
SELECT
COL1, COL2, COL3, COL4, COL5,
COL6
FROM
T1
GROUP BY
COL1, COL2, COL3, COL4, COL5,
COL6; Will sometimes/often execute faster than this: SELECT DISTINCT
COL1, COL2, COL3, COL4, COL5,
COL6
FROM
T1;
In the second of your examples, I do not see where you are using the UNION syntax - are you trying to do something different in that example? The DISTINCT in the second example only applies to the one SELECT in the inline view, which could be a problem if you are using a UNION ALL with additional SELECTs in the inline view.
Explain plan, or better yet a DBMS Xplan showing the predicted and actual results would be helpful, as requested by Ed and Mark. If the explain plan shows nothing useful, turn on a 10046 trace at level 8 and see what is happening behind the scenes.
Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.
Received on Wed Feb 27 2008 - 10:27:30 CST