Path: dp-news.maxwell.syr.edu!spool.maxwell.syr.edu!drn.maxwell.syr.edu!news.maxwell.syr.edu!postnews.google.com!g47g2000cwa.googlegroups.com!not-for-mail
From: "Ed Prochak" <ed.prochak@magicinterface.com>
Newsgroups: comp.databases.oracle.server,comp.databases.oracle.misc
Subject: Re: Why DISTINCT is costly?
Date: 12 Sep 2005 06:41:50 -0700
Organization: http://groups.google.com
Lines: 25
Message-ID: <1126532510.060522.144480@g47g2000cwa.googlegroups.com>
References: <1126507505.386016.211310@f14g2000cwb.googlegroups.com>
   <1126513158.832436.51790@g43g2000cwa.googlegroups.com>
   <3ol0s0F6e343U1@individual.net>
NNTP-Posting-Host: 69.219.11.162
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Trace: posting.google.com 1126532516 27210 127.0.0.1 (12 Sep 2005 13:41:56 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Mon, 12 Sep 2005 13:41:56 +0000 (UTC)
In-Reply-To: <3ol0s0F6e343U1@individual.net>
User-Agent: G2/0.2
X-HTTP-UserAgent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0),gzip(gfe),gzip(gfe)
X-HTTP-Via: 1.0 proxy02.ta.com:3128 (squid/2.5.STABLE3)
Complaints-To: groups-abuse@google.com
Injection-Info: g47g2000cwa.googlegroups.com; posting-host=69.219.11.162;
   posting-account=z8J2OwwAAABzrSXAjma7SXOGeLMZZ_s1
Xref: dp-news.maxwell.syr.edu comp.databases.oracle.server:251321 comp.databases.oracle.misc:121813


Robert Klemme wrote:
> sim wrote:
> > In most cases it's better to use a group by than distinct. That should
> > decrease the cost of your queries.
>
> Why?  The task of identifying duplicates remains the same...
>
>     robert

Right, it doesn't reduce the cost.  I always suggest replacing

SELECT DISTINCT x ...
    with
SELECT x,COUNT(*) ... GROUP BY x ;

especially for adhoc queries. Then at least you get back some
information about the duplicates.

For production, you can either refine your query to get back unique
information (preferred IMHO), or ignore the extras based on an informed
choice.

  Ed

