Re: How to get a distinct count of result set of multople table joins?

From: Chris ( Val ) <chrisval_at_gmail.com>
Date: Wed, 27 Feb 2008 14:22:05 -0800 (PST)
Message-ID: <4ae15b10-446e-4aaa-8bd6-dc182159aeb5@x41g2000hsb.googlegroups.com>


On Feb 28, 2:02 am, Mark D Powell <Mark.Pow..._at_eds.com> wrote:
> On Feb 27, 8:05 am, Ed Prochak <edproc..._at_gmail.com> wrote:
>
>
>
>
>
> > On Feb 26, 11:26 pm, "Chris ( Val )" <chris..._at_gmail.com> wrote:
>
> > > Hi everyone,
>
> > > I haven't played with SQL for a while, but the following SQL returns
> > > ~700,000 records ( a lot of duplicates ), and I used a UNION with the
> > > same query to reduce the records down to ~250,000. Is there a more
> > > efficient solution, as the UNION takes some time to run? ~3 mins.
> > []
>
> > > Thanks in advance,
> > > Chris
>
> > What does the EXPLAIN PLAN show?
>
> > Performance issues are always data dependent. IOW, the phrase:
> > KNOW THY DATA
> > is one to take to heart.
>
> > Come back with the PLAN and we may have some suggestions.
> >    Ed

Hi Mark,

> Chris, if the query in the select distinct is the same as the first
> there should be no duplicates returned.

> What is the full version of Oracle?

Version: "10g Enterprise Edition Release 10.1.0.3.0" That is what my company is using, and it (unfortunately) won't change for a while.

> Does the explain plan Ed asked for show a hash group by?

No, but a lot of full table scans, but I don't know if that is normal for
this type of query - Please see Explain Plan posted to Ed.

> If so, depending on your version of Oracle, there is a bug where
> queries solved with a hash group by can return wrong results; however,
> I would expect the distinct to eliminate duplicates still.
>
> Are you sure you just do not have more rows than you expected?

Yes, because the result set spooled to a file from running the following query
has been visually inspected, and looks to obtain a distinct set of records 250,000, and
without the union, there are definatly duplicates and records in the vicinity of ~700,000
returned.

<QUERY>
  SELECT * FROM schema.tbl_A a, schema.tbl_B b, schema.tbl_C c,

                        schema.tbl_D d, schema.tbl_E e, schema.tbl_F
f,
                        schema.tbl_G g, schema.tbl_H h, schema.tbl_I
i,
                        schema.tbl_J j, schema.tbl_K k, schema.tbl_L
l,
                        schema.tbl_M m
              WHERE  a.accountid = b.ID(+)
                AND  a.custom1id = c.ID(+)
                AND  a.custom2id = d.ID(+)
                AND  a.custom3id = e.ID
                AND  a.custom4id = f.ID(+)
                AND  a.entityid = g.ID(+)
                AND  a.icpid = h.ID(+)
                AND  a.parentid = i.ID(+)
                AND  a.periodid = j.ID(+)
                AND  a.scenarioid = k.ID(+)
                AND  a.valueid = l.ID(+)
                AND  a.viewid = m.ID(+)
                AND  d.label IN ('XXX', 'XXX')
                AND  e.label IN ('XXX', 'XXX')
                AND  f.label = 'X'
                AND  k.label = 'XXX'
                AND  l.label = 'XXX'
                AND  h.label = 'X'
                AND  m.label IN ('X', 'X')
UNION
  SELECT * FROM schema.tbl_A a, schema.tbl_B b, schema.tbl_C c,
                        schema.tbl_D d, schema.tbl_E e, schema.tbl_F
f,
                        schema.tbl_G g, schema.tbl_H h, schema.tbl_I
i,
                        schema.tbl_J j, schema.tbl_K k, schema.tbl_L
l,
                        schema.tbl_M m
              WHERE  a.accountid = b.ID(+)
                AND  a.custom1id = c.ID(+)
                AND  a.custom2id = d.ID(+)
                AND  a.custom3id = e.ID
                AND  a.custom4id = f.ID(+)
                AND  a.entityid = g.ID(+)
                AND  a.icpid = h.ID(+)
                AND  a.parentid = i.ID(+)
                AND  a.periodid = j.ID(+)
                AND  a.scenarioid = k.ID(+)
                AND  a.valueid = l.ID(+)
                AND  a.viewid = m.ID(+)
                AND  d.label IN ('XXX', 'XXX')
                AND  e.label IN ('XXX', 'XXX')
                AND  f.label = 'X'
                AND  k.label = 'XXX'
                AND  l.label = 'XXX'
                AND  h.label = 'X'
                AND  m.label IN ('X', 'X');
 </QUERY>

Now if I try to run the query above using the COUNT syntax as in my first posting, the
result of the count is the higher figure of ~700,000, which means that the duplicates
are being counted, and not what I want.

I don't know if it is a problem with the way I have written the query to count the
result set returned, or it's something else.

Thanks for your help.

--
Chris



> HTH -- Mark D Powell --- Hide quoted text -
>
> - Show quoted text -
Received on Wed Feb 27 2008 - 16:22:05 CST

Original text of this message