Re: SELECT DISTINCT slow, how can I speed up

From: hpuxrac <johnbhurley_at_sbcglobal.net>
Date: Thu, 14 Aug 2008 12:28:10 -0700 (PDT)
Message-ID: <5af2da54-5e7a-4ab3-8c7c-376fd7d43c26@l64g2000hse.googlegroups.com>


On Aug 14, 2:18 pm, Barry Bulsara <bbulsar..._at_hotmail.com> wrote:
> Hello. I have a question about how I can structure a SQL query to make
> the results come back faster. I have experimented a bit and include
> some results here.
>
> We have read-only SELECT access to tables for a payroll and absence
> system running Oracle 10.0.1 on (I think) a dedicated Windows 2003
> Server in the IT department on the same site. We do not have
> privileges to UPDATE, DELETE, or create anymore indexes and if I did,
> it would probably invalidate our software support contract.
>
> We have staff using Access 2003 as a front end to the Oracle tables
> for the purposes of ad hoc queries. These staff regularly enter
> queries across 5-6 tables with various bits added to the SQL to limit
> the number of rows returned and match criteria. All the rows starts to
> come back within a second of the SQL starting to run.
>
> When the SQL is like this (I have anonymised the table names).
>
> SELECT a1,a2,a3,a5,a10
>  FROM table1, table2, table3, table4
>   WHERE table1.empID=table2.empID
>      AND table2.empID=table3.empID
>      AND table3.empID=table4.empID
>      AND table4.absenceT IN (15,19)
>      AND table4.year=2008;
>
> The rows start to come back immediately. Around 1700-1800 rows
> typically come back and there is a lot of network IO. If I type
>
> SELECT COUNT(*) FROM
> (
> SELECT a1,a2,a3,a5,a10
>  FROM table1, table2, table3, table4
>   WHERE table1.empID=table2.empID
>      AND table2.empID=table3.empID
>      AND table3.empID=table4.empID
>      AND table4.absenceT IN (15,19)
>      AND table4.year=2008
> )
>
> it returns in 2.14seconds saying it found 1792 rows.
>
> If I type SELECT DISTINCT a1,a2,a3,a5,a10
>
> only 20-50 rows come back and the query takes 28 seconds and there is
> a lot less network IO.
>
> I only want the 20-50 rows but I don't want to wait 28 seconds. As
> Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
> seconds only to return the unique ones.
>
> These are ad hoc queries so I am not trying to optimize a single
> query. My question is Is there a more quick way of generally returning
> a DISTINCT set of rows other than using DISTINCT in the SELECT
> statement.
>
> Thank you
> Barry

Explain plans as well as output from a 10046 trace would be helpful for something like this.

In general SELECT DISTINCT is going to have some additional overhead in oracle.

Having a SELECT COUNT return in 2+ seconds is not quite the same as having 2000 rows come back as oracle may be coming up with quite different execution plans between the SELECT COUNT and the SELECT columns.

Can you code the query in such as way that you don't need to use DISTINCT? The EXISTS/NOT EXISTS and IN/NOT IN constructs are often useful in situations like this if you cannot force uniqueness just by correct table joins. Received on Thu Aug 14 2008 - 14:28:10 CDT

Original text of this message