Re: How to get a distinct count of result set of multople table joins?

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Wed, 27 Feb 2008 08:27:30 -0800 (PST)
Message-ID: <8aa20cea-f22a-4a5d-969f-6b618c0e268a@60g2000hsy.googlegroups.com>


On Feb 26, 11:26 pm, "Chris ( Val )" <chris..._at_gmail.com> wrote:
> Hi everyone,
>
> I haven't played with SQL for a while, but the following SQL returns
> ~700,000 records ( a lot of duplicates ), and I used a UNION with the
> same query to reduce the records down to ~250,000. Is there a more
> efficient solution, as the UNION takes some time to run? ~3 mins.
>
> <QUERY>
>   SELECT * FROM schema.tbl_A a, schema.tbl_B b, schema.tbl_C c,
>                         schema.tbl_D d, schema.tbl_E e, schema.tbl_F
> f,
>                         schema.tbl_G g, schema.tbl_H h, schema.tbl_I
> i,
>                         schema.tbl_J j, schema.tbl_K k, schema.tbl_L
> l,
>                         schema.tbl_M m
>               WHERE  a.accountid = b.ID(+)
>                 AND  a.custom1id = c.ID(+)
>                 AND  a.custom2id = d.ID(+)
>                 AND  a.custom3id = e.ID
>                 AND  a.custom4id = f.ID(+)
>                 AND  a.entityid = g.ID(+)
>                 AND  a.icpid = h.ID(+)
>                 AND  a.parentid = i.ID(+)
>                 AND  a.periodid = j.ID(+)
>                 AND  a.scenarioid = k.ID(+)
>                 AND  a.valueid = l.ID(+)
>                 AND  a.viewid = m.ID(+)
>                 AND  d.label IN ('XXX', 'XXX')
>                 AND  e.label IN ('XXX', 'XXX')
>                 AND  f.label = 'X'
>                 AND  k.label = 'XXX'
>                 AND  l.label = 'XXX'
>                 AND  h.label = 'X'
>                 AND  m.label IN ('X', 'X');
> UNION
>    <rest (the same as above) snipped>
>  </QUERY>
>
> Problem - I am having some trouble getting a DISTINCT COUNT of the
> result set from the above. I trieed the
> following, but I still get the same (non-distinct) amount returned for
> the count.
>
> How do I get a count of the distinct records / result set?
> I tried with the UNION as above, as well as the follwoing, but still
> get a count which includes the duplicates.
>
> SELECT COUNT(*) FROM
>  (
>   SELECT DISTINCT* FROM schema.tbl_A a, schema.tbl_B b, schema.tbl_C
> c,
>                         schema.tbl_D d, schema.tbl_E e, schema.tbl_F
> f,
>                         schema.tbl_G g, schema.tbl_H h, schema.tbl_I
> i,
>                         schema.tbl_J j, schema.tbl_K k, schema.tbl_L
> l,
>                         schema.tbl_M m
>               WHERE  a.accountid = b.ID(+)
>                 AND  a.custom1id = c.ID(+)
>                 AND  a.custom2id = d.ID(+)
>                 AND  a.custom3id = e.ID
>                 AND  a.custom4id = f.ID(+)
>                 AND  a.entityid = g.ID(+)
>                 AND  a.icpid = h.ID(+)
>                 AND  a.parentid = i.ID(+)
>                 AND  a.periodid = j.ID(+)
>                 AND  a.scenarioid = k.ID(+)
>                 AND  a.valueid = l.ID(+)
>                 AND  a.viewid = m.ID(+)
>                 AND  d.label IN ('XXX', 'XXX')
>                 AND  e.label IN ('XXX', 'XXX')
>                 AND  f.label = 'X'
>                 AND  k.label = 'XXX'
>                 AND  l.label = 'XXX'
>                 AND  h.label = 'X'
>                 AND  m.label IN ('X', 'X')
>  );
>
> Thanks in advance,
> Chris

I suspect that Oracle is taking a long time to remove the duplicate rows due to a combination of the number of rows, the number of columns (you are specifying to retrieve all columns from the tables), and the amount of memory available for sorting unique (or hash unique) the rows to produce a unique list of rows. Are you able to better define the columns that are interest, and eliminate any columns that are common between the various tables. I have found in some cases on Oracle 10.2.0.2 that listing all columns in a GROUP BY allows Oracle to retrieve a distinct list of rows more quickly than by using the DISTINCT syntax:
SELECT

  COL1,
  COL2,
  COL3,
  COL4,
  COL5,

  COL6
FROM
  T1
GROUP BY
  COL1,
  COL2,
  COL3,
  COL4,
  COL5,

  COL6; Will sometimes/often execute faster than this: SELECT DISTINCT
  COL1,
  COL2,
  COL3,
  COL4,
  COL5,

  COL6
FROM
  T1;

In the second of your examples, I do not see where you are using the UNION syntax - are you trying to do something different in that example? The DISTINCT in the second example only applies to the one SELECT in the inline view, which could be a problem if you are using a UNION ALL with additional SELECTs in the inline view.

Explain plan, or better yet a DBMS Xplan showing the predicted and actual results would be helpful, as requested by Ed and Mark. If the explain plan shows nothing useful, turn on a 10046 trace at level 8 and see what is happening behind the scenes.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Wed Feb 27 2008 - 10:27:30 CST

Original text of this message