Re: SELECT DISTINCT slow, how can I speed up
Date: Thu, 14 Aug 2008 12:54:50 -0700 (PDT)
Message-ID: <890da6ee-9f19-4789-876f-72c00b98a03e@b1g2000hsg.googlegroups.com>
On Aug 14, 2:45 pm, "fitzjarr..._at_cox.net" <orat..._at_msn.com> wrote:
> On Aug 14, 1:18 pm, Barry Bulsara <bbulsar..._at_hotmail.com> wrote:
>
>
>
>
>
> > Hello. I have a question about how I can structure a SQL query to make
> > the results come back faster. I have experimented a bit and include
> > some results here.
>
> > We have read-only SELECT access to tables for a payroll and absence
> > system running Oracle 10.0.1 on (I think) a dedicated Windows 2003
> > Server in the IT department on the same site. We do not have
> > privileges to UPDATE, DELETE, or create anymore indexes and if I did,
> > it would probably invalidate our software support contract.
>
> > We have staff using Access 2003 as a front end to the Oracle tables
> > for the purposes of ad hoc queries. These staff regularly enter
> > queries across 5-6 tables with various bits added to the SQL to limit
> > the number of rows returned and match criteria. All the rows starts to
> > come back within a second of the SQL starting to run.
>
> > When the SQL is like this (I have anonymised the table names).
>
> > SELECT a1,a2,a3,a5,a10
> > FROM table1, table2, table3, table4
> > WHERE table1.empID=table2.empID
> > AND table2.empID=table3.empID
> > AND table3.empID=table4.empID
> > AND table4.absenceT IN (15,19)
> > AND table4.year=2008;
>
> > The rows start to come back immediately. Around 1700-1800 rows
> > typically come back and there is a lot of network IO. If I type
>
> > SELECT COUNT(*) FROM
> > (
> > SELECT a1,a2,a3,a5,a10
> > FROM table1, table2, table3, table4
> > WHERE table1.empID=table2.empID
> > AND table2.empID=table3.empID
> > AND table3.empID=table4.empID
> > AND table4.absenceT IN (15,19)
> > AND table4.year=2008
> > )
>
> > it returns in 2.14seconds saying it found 1792 rows.
>
> > If I type SELECT DISTINCT a1,a2,a3,a5,a10
>
> > only 20-50 rows come back and the query takes 28 seconds and there is
> > a lot less network IO.
>
> > I only want the 20-50 rows but I don't want to wait 28 seconds. As
> > Oracle seems to find all 1792 rows in 2.14seconds, why does it take 28
> > seconds only to return the unique ones.
>
> > These are ad hoc queries so I am not trying to optimize a single
> > query. My question is Is there a more quick way of generally returning
> > a DISTINCT set of rows other than using DISTINCT in the SELECT
> > statement.
>
> > Thank you
> > Barry
>
> You might try this, although I don't know how much faster, if at all,
> it will be:
>
> SELECT a1,a2,a3,a5,a10
> FROM table1, table2, table3, table4
> WHERE table1.empID=table2.empID
> AND table2.empID=table3.empID
> AND table3.empID=table4.empID
> AND table4.absenceT IN (15,19)
> AND table4.year=2008
> group by a1, a2, a3, a5, a10;
>
> Possibly you should ask about increasing the hash_area_size parameter
> value as 10g uses hashing algorithms to process such requests, rather
> than the old-style sort operations.
>
> David Fitzjarrell- Hide quoted text -
>
> - Show quoted text -
With a 8 million records and no index I find the group by executes in less than half of the time of distinct:
SQL> select distinct ename
2 from dist_tst;
ENAME
NANCY10
NANCY2
NANCY7
NANCY9
NANCY5
NANCY3
NANCY6
NANCY1
NANCY4
NANCY8 10 rows selected.
Elapsed: 00:00:14.32
Execution Plan
Plan hash value: 3730636855
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8049K| 92M| | 45750 (9)| 00:09:10 | | 1 | HASH UNIQUE | | 8049K| 92M| 308M| 45750 (9)| 00:09:10 | | 2 | TABLE ACCESS FULL| DIST_TST | 8049K| 92M| | 5918(10)| 00:01:12 |
Note
- dynamic sampling used for this statement
Statistics
49 recursive calls 1 db block gets 47196 consistent gets 11429 physical reads 1680880 redo size 347 bytes sent via SQL*Net to client 246 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
SQL>
SQL> select ename
2 from dist_tst
3 group by ename;
ENAME
NANCY9
NANCY10
NANCY2
NANCY6
NANCY4
NANCY7
NANCY5
NANCY3
NANCY1
NANCY8 10 rows selected.
Elapsed: 00:00:05.79
Execution Plan
Plan hash value: 1228703371
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 8049K| 92M| 9067 (42)| 00:01:49 | | 1 | HASH GROUP BY | | 8049K| 92M| 9067 (42)|00:01:49 |
| 2 | TABLE ACCESS FULL| DIST_TST | 8049K| 92M| 5918 (10)| 00:01:12 |
Note
- dynamic sampling used for this statement
Statistics
4 recursive calls 0 db block gets 23699 consistent gets 11462 physical reads 0 redo size 348 bytes sent via SQL*Net to client 246 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 10 rows processed
SQL> So it may help you, it may not. You'll need to test this on your own data.
David Fitzjarrell Received on Thu Aug 14 2008 - 14:54:50 CDT