Re: How to get a distinct count of result set of multople table joins?

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Thu, 28 Feb 2008 09:36:21 -0800 (PST)
Message-ID: <2e972400-002c-412c-a388-6b1586cd7da2@p43g2000hsc.googlegroups.com>


On Feb 28, 11:27 am, Mark D Powell <Mark.Pow..._at_eds.com> wrote:
> On Feb 27, 10:03 pm, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> > I think that I now understand what you originally posted.  The two
> > queries should be returning the same number of rows before the count,
> > since the UNION syntax removes duplicate rows automatically.  Note
> > the:
> >   "SORT (UNIQUE)" in your execution plan.
>
> > The first test is similar to your first query:
> > SELECT
> >   COUNT(*)
> > FROM
> > (SELECT
> >   *
> > FROM
> >   T1,
> >   T2,
> >   T3
> > WHERE
> >   T1.C1=T2.C1(+)
> >   AND T1.C1=T3.C1(+)
> > UNION
> > SELECT
> >   *
> > FROM
> >   T1,
> >   T2,
> >   T3
> > WHERE
> >   T1.C1=T2.C1(+)
> >   AND T1.C1=T3.C1(+));
>
> >   COUNT(*)
> > ----------
> >    2994198
>
> > The DBMS Xplan (watch for wordwrapping):
> > ---------------------------------------------------------------------------­­-------------------------------------------------------------------------
> > | Id  | Operation                  | Name | Starts | E-Rows | A-Rows
> > |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
> > Used-Tmp|
> > ---------------------------------------------------------------------------­­-------------------------------------------------------------------------
> > |   1 |  SORT AGGREGATE            |      |      1 |      1 |      1 |
> > 00:03:16.27 |    5392 |  73199 |  73199 |       |       |
> > |         |
> > |   2 |   VIEW                     |      |      1 |   2806K|   2994K|
> > 00:03:09.16 |    5392 |  73199 |  73199 |       |       |
> > |         |
> > |   3 |    SORT UNIQUE             |      |      1 |   2806K|   2994K|
> > 00:03:03.17 |    5392 |  73199 |  73199 |   269M|  8322K|   37M
> > (1)|     572K|
> > |   4 |     UNION-ALL              |      |      1 |        |   7089K|
> > 00:01:32.59 |    5342 |      0 |      0 |       |       |
>
> > The above took 3 minutes and 16 seconds on Oracle 11.1.0.6
>
> > Sample 2 returns the same result, just significantly faster:
> > SELECT
> >   COUNT(*)
> > FROM
> > (SELECT DISTINCT
> >   *
> > FROM
> >   T1,
> >   T2,
> >   T3
> > WHERE
> >   T1.C1=T2.C1(+)
> >   AND T1.C1=T3.C1(+));
>
> >   COUNT(*)
> > ----------
> >    2994198
>
> > ---------------------------------------------------------------------------­­------------------------------------------------------------------------
> > | Id  | Operation                 | Name | Starts | E-Rows | A-Rows
> > |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
> > Used-Tmp|
> > ---------------------------------------------------------------------------­­------------------------------------------------------------------------
> > |   1 |  SORT AGGREGATE           |      |      1 |      1 |      1 |
> > 00:01:18.14 |    2671 |  40365 |  40365 |       |       |
> > |         |
> > |   2 |   VIEW                    |      |      1 |   1403K|   2994K|
> > 00:01:11.67 |    2671 |  40365 |  40365 |       |       |
> > |         |
> > |   3 |    HASH UNIQUE            |      |      1 |   1403K|   2994K|
> > 00:01:05.68 |    2671 |  40365 |  40365 |   351M|    15M|   46M
> > (1)|     337K|
> > |*  4 |     HASH JOIN RIGHT OUTER |      |      1 |   1403K|   3544K|
> > 00:00:28.78 |    2671 |      0 |      0 |  1102K|  1102K|  181K
>
> I was under the impression that performance is not an issue but rather
> it is the fact that the two queries in the initial post are not
> returning the same result set.
>
> The base query is an outer join.  Since an outer join says return all
> rows from the base table plus the matching data from the joined table
> or null values when there is not matching row, duplicate output would
> potentially be expected but duplicate output could also indicate that
> a required join condition has not been specified.
>
> The join should be exaimined to be sure all join conditions have been
> properly specified.  The query should be checked to make sure it is
> the right query for the desired output.
>
> Then we are back to if
>
> select query
> union
> select  same_query
>
> and
>
> select count(*) from ( select distinct query)
>
> should produce the same result set counts.
>
> Note that getting the same results set does potentially depend on none
> of the base tables in the join being updated during the time the two
> queries are ran else the view of the data will be different.
>
> -- Mark D Powell --

Mark, very good information.

It is quite possible that I am still misunderstanding what the OP is trying to accomplish.

The OP first stated "Is there a more efficient solution, as the UNION takes some time to run? ~3 mins." and in a later post stated, "please note that performance is of secondary importance to me. Being able to obtain a correct count of the rows returned by the UNION query is my main concern."

The first SQL example was supposed to mimic the UNION query that the OP posted, and the second was supposed to mimic the SELECT DISTINCT query that the OP posted. I attempted to walk the OP through a process of examining different SQL solutions that return the same count of rows, but roughly three times faster than the UNION. (Side note: I also wanted to verify that the claim I made in a previous post in this thread about GROUP BY being faster than DISTINCT still held true in Oracle 11i, and whether or not the number of columns made a difference for the speed at which distinct rows are obtained.)

The explain plans seem to indicate that the two query methods posted by the OP should produce the same record counts (or more specifically, COUNT(*) returned by the one query should be the number of rows returned by the other). The explain plan for one of my examples showed SORT UNIQUE, and the other showed HASH UNIQUE, but otherwise they were very similar.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Thu Feb 28 2008 - 11:36:21 CST

Original text of this message