Re: How to get a distinct count of result set of multople table joins?
Date: Wed, 27 Feb 2008 14:22:05 -0800 (PST)
Message-ID: <4ae15b10-446e-4aaa-8bd6-dc182159aeb5@x41g2000hsb.googlegroups.com>
On Feb 28, 2:02 am, Mark D Powell <Mark.Pow..._at_eds.com> wrote:
> On Feb 27, 8:05 am, Ed Prochak <edproc..._at_gmail.com> wrote:
>
>
>
>
>
> > On Feb 26, 11:26 pm, "Chris ( Val )" <chris..._at_gmail.com> wrote:
>
> > > Hi everyone,
>
> > > I haven't played with SQL for a while, but the following SQL returns
> > > ~700,000 records ( a lot of duplicates ), and I used a UNION with the
> > > same query to reduce the records down to ~250,000. Is there a more
> > > efficient solution, as the UNION takes some time to run? ~3 mins.
> > []
>
> > > Thanks in advance,
> > > Chris
>
> > What does the EXPLAIN PLAN show?
>
> > Performance issues are always data dependent. IOW, the phrase:
> > KNOW THY DATA
> > is one to take to heart.
>
> > Come back with the PLAN and we may have some suggestions.
> > Ed
Hi Mark,
> Chris, if the query in the select distinct is the same as the first
> there should be no duplicates returned.
> What is the full version of Oracle?
Version: "10g Enterprise Edition Release 10.1.0.3.0" That is what my company is using, and it (unfortunately) won't change for a while.
> Does the explain plan Ed asked for show a hash group by?
No, but a lot of full table scans, but I don't know if that is normal
for
this type of query - Please see Explain Plan posted to Ed.
> If so, depending on your version of Oracle, there is a bug where
> queries solved with a hash group by can return wrong results; however,
> I would expect the distinct to eliminate duplicates still.
>
> Are you sure you just do not have more rows than you expected?
Yes, because the result set spooled to a file from running the
following query
has been visually inspected, and looks to obtain a distinct set of
records 250,000, and
without the union, there are definatly duplicates and records in the
vicinity of ~700,000
returned.
<QUERY>
SELECT * FROM schema.tbl_A a, schema.tbl_B b, schema.tbl_C c,
schema.tbl_D d, schema.tbl_E e, schema.tbl_F f, schema.tbl_G g, schema.tbl_H h, schema.tbl_I i, schema.tbl_J j, schema.tbl_K k, schema.tbl_L l, schema.tbl_M m WHERE a.accountid = b.ID(+) AND a.custom1id = c.ID(+) AND a.custom2id = d.ID(+) AND a.custom3id = e.ID AND a.custom4id = f.ID(+) AND a.entityid = g.ID(+) AND a.icpid = h.ID(+) AND a.parentid = i.ID(+) AND a.periodid = j.ID(+) AND a.scenarioid = k.ID(+) AND a.valueid = l.ID(+) AND a.viewid = m.ID(+) AND d.label IN ('XXX', 'XXX') AND e.label IN ('XXX', 'XXX') AND f.label = 'X' AND k.label = 'XXX' AND l.label = 'XXX' AND h.label = 'X' AND m.label IN ('X', 'X')UNION
SELECT * FROM schema.tbl_A a, schema.tbl_B b, schema.tbl_C c,
schema.tbl_D d, schema.tbl_E e, schema.tbl_F f, schema.tbl_G g, schema.tbl_H h, schema.tbl_I i, schema.tbl_J j, schema.tbl_K k, schema.tbl_L l, schema.tbl_M m WHERE a.accountid = b.ID(+) AND a.custom1id = c.ID(+) AND a.custom2id = d.ID(+) AND a.custom3id = e.ID AND a.custom4id = f.ID(+) AND a.entityid = g.ID(+) AND a.icpid = h.ID(+) AND a.parentid = i.ID(+) AND a.periodid = j.ID(+) AND a.scenarioid = k.ID(+) AND a.valueid = l.ID(+) AND a.viewid = m.ID(+) AND d.label IN ('XXX', 'XXX') AND e.label IN ('XXX', 'XXX') AND f.label = 'X' AND k.label = 'XXX' AND l.label = 'XXX' AND h.label = 'X' AND m.label IN ('X', 'X');</QUERY>
Now if I try to run the query above using the COUNT syntax as in my
first posting, the
result of the count is the higher figure of ~700,000, which means that
the duplicates
are being counted, and not what I want.
I don't know if it is a problem with the way I have written the query
to count the
result set returned, or it's something else.
Thanks for your help.
-- ChrisReceived on Wed Feb 27 2008 - 16:22:05 CST
> HTH -- Mark D Powell --- Hide quoted text -
>
> - Show quoted text -