Re: SELECT DISTINCT slow, how can I speed up

From: <fitzjarrell_at_cox.net>
Date: Thu, 14 Aug 2008 12:54:50 -0700 (PDT)
Message-ID: <890da6ee-9f19-4789-876f-72c00b98a03e@b1g2000hsg.googlegroups.com>


On Aug 14, 2:45 pm, "fitzjarr..._at_cox.net" <orat..._at_msn.com> wrote:
> On Aug 14, 1:18 pm, Barry Bulsara <bbulsar..._at_hotmail.com> wrote:
>
>
>
>
>
> > Hello. I have a question about how I can structure a SQL query to make
> > the results come back faster. I have experimented a bit and include
> > some results here.
>
> > We have read-only SELECT access to tables for a payroll and absence
> > system running Oracle 10.0.1 on (I think) a dedicated Windows 2003
> > Server in the IT department on the same site. We do not have
> > privileges to UPDATE, DELETE, or create anymore indexes and if I did,
> > it would probably invalidate our software support contract.
>
> > We have staff using Access 2003 as a front end to the Oracle tables
> > for the purposes of ad hoc queries. These staff regularly enter
> > queries across 5-6 tables with various bits added to the SQL to limit
> > the number of rows returned and match criteria. All the rows starts to
> > come back within a second of the SQL starting to run.
>
> > When the SQL is like this (I have anonymised the table names).
>
> > SELECT a1,a2,a3,a5,a10
> >  FROM table1, table2, table3, table4
> >   WHERE table1.empID=table2.empID
> >      AND table2.empID=table3.empID
> >      AND table3.empID=table4.empID
> >      AND table4.absenceT IN (15,19)
> >      AND table4.year=2008;
>
> > The rows start to come back immediately. Around 1700-1800 rows
> > typically come back and there is a lot of network IO. If I type
>
> > SELECT COUNT(*) FROM
> > (
> > SELECT a1,a2,a3,a5,a10
> >  FROM table1, table2, table3, table4
> >   WHERE table1.empID=table2.empID
> >      AND table2.empID=table3.empID
> >      AND table3.empID=table4.empID
> >      AND table4.absenceT IN (15,19)
> >      AND table4.year=2008
> > )
>
> > it returns in 2.14seconds saying it found 1792 rows.
>
> > If I type SELECT DISTINCT a1,a2,a3,a5,a10
>
> > only 20-50 rows come back and the query takes 28 seconds and there is
> > a lot less network IO.
>
> > I only want the 20-50 rows but I don't want to wait 28 seconds. As
> > Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
> > seconds only to return the unique ones.
>
> > These are ad hoc queries so I am not trying to optimize a single
> > query. My question is Is there a more quick way of generally returning
> > a DISTINCT set of rows other than using DISTINCT in the SELECT
> > statement.
>
> > Thank you
> > Barry
>
> You might try this, although I don't know how much faster, if at all,
> it will be:
>
> SELECT a1,a2,a3,a5,a10
>  FROM table1, table2, table3, table4
>   WHERE table1.empID=table2.empID
>      AND table2.empID=table3.empID
>      AND table3.empID=table4.empID
>      AND table4.absenceT IN (15,19)
>      AND table4.year=2008
> group by a1, a2, a3, a5, a10;
>
> Possibly you should ask about increasing the hash_area_size parameter
> value as 10g uses hashing algorithms to process such requests, rather
> than the old-style sort operations.
>
> David Fitzjarrell- Hide quoted text -
>
> - Show quoted text -

With a 8 million records and no index I find the group by executes in less than half of the time of distinct:

SQL> select distinct ename
  2 from dist_tst;

ENAME



NANCY10
NANCY2
NANCY7
NANCY9
NANCY5
NANCY3
NANCY6
NANCY1
NANCY4
NANCY8 10 rows selected.

Elapsed: 00:00:14.32

Execution Plan



Plan hash value: 3730636855
| Id  | Operation          | Name     | Rows  | Bytes |TempSpc| Cost
(%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |  8049K|    92M|       |
45750   (9)| 00:09:10 |
|   1 |  HASH UNIQUE       |          |  8049K|    92M|   308M|
45750   (9)| 00:09:10 |
|   2 |   TABLE ACCESS FULL| DIST_TST |  8049K|    92M|       |  5918
(10)| 00:01:12 |

Note


  • dynamic sampling used for this statement

Statistics


         49  recursive calls
          1  db block gets
      47196  consistent gets
      11429  physical reads
    1680880  redo size
        347  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>
SQL> select ename
  2 from dist_tst
  3 group by ename;

ENAME



NANCY9
NANCY10
NANCY2
NANCY6
NANCY4
NANCY7
NANCY5
NANCY3
NANCY1
NANCY8 10 rows selected.

Elapsed: 00:00:05.79

Execution Plan



Plan hash value: 1228703371
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)|
Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |  8049K|    92M|  9067  (42)|
00:01:49 |
|   1 |  HASH GROUP BY     |          |  8049K|    92M|  9067  (42)|
00:01:49 |
| 2 | TABLE ACCESS FULL| DIST_TST | 8049K| 92M| 5918 (10)| 00:01:12 |

Note


  • dynamic sampling used for this statement

Statistics


          4  recursive calls
          0  db block gets
      23699  consistent gets
      11462  physical reads
          0  redo size
        348  bytes sent via SQL*Net to client
        246  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL> So it may help you, it may not. You'll need to test this on your own data.

David Fitzjarrell Received on Thu Aug 14 2008 - 14:54:50 CDT

Original text of this message