Re: Why DISTINCT is costly?

From: Noons <wizofoz2k_at_yahoo.com.au>
Date: 14 Sep 2005 19:35:08 -0700
Message-ID: <1126751708.842284.156850@g43g2000cwa.googlegroups.com>

Robert Klemme wrote:

> So we can leave that out for the moment. Now I'm still wondering where
> exactly the difference is in processing these two selects. It seems there
> is a cost difference attached to SORT GROUP BY vs. SORT UNIQUE but why?
> Is it just that the latter benefits filled caches? Or is there some kind
> of algorithmic difference between SORT GROUP BY and SORT UNIQUE?

There must be some algorithmic difference. Without access to source code, I can't say for sure. But I can test indirectly.

Consider this, for 9.2.0.6:

SQL> create table t1 as select * from all_indexes; Table created.
Elapsed: 00:00:00.25

(now create a few duplicates so we can test this)

SQL> insert into t1 select * from t1;
746 rows created.
Elapsed: 00:00:00.01
SQL> /
1492 rows created.
Elapsed: 00:00:00.02
etcetc...
SQL> /
47744 rows created.
Elapsed: 00:00:00.81
(now put an index there)
SQL> create index i_t1 on
t1(owner,index_name,index_type,table_owner, table_name,table_type,uniqueness,
compression,prefix_length,tablespace_name); Index created.
Elapsed: 00:00:09.96
SQL> analyze table t1 compute statistics; Table analyzed.
Elapsed: 00:00:03.37
SQL> truncate table plan_table;
Table truncated.
Elapsed: 00:00:00.03
SQL> explain plan for
  2 select
  3
owner,index_name,index_type,table_owner,table_name,table_type,uniqueness,   4 compression,prefix_length,tablespace_name   5 from t1
  6 group by
  7
owner,index_name,index_type,table_owner,table_name,table_type,uniqueness,   8 compression,prefix_length,tablespace_name   9 ;
Explained.
Elapsed: 00:00:00.03
SQL> select * from table(dbms_xplan.display); Press Return to continue...
PLAN_TABLE_OUTPUT

|   0 | SELECT STATEMENT     |             | 95488 |  6807K|  1209 |
|   1 |  SORT GROUP BY NOSORT|             | 95488 |  6807K|  1209 |
|   2 |   INDEX FULL SCAN    | I_T1        | 95488 |  6807K|  1209 |
--------------------------------------------------------------------

Note: cpu costing is off
10 rows selected.
Elapsed: 00:00:02.13
SQL> truncate table plan_table;
Table truncated.
Elapsed: 00:00:00.02
SQL> explain plan for
  2 select
  3 distinct
  4
owner,index_name,index_type,table_owner,table_name,table_type,uniqueness,   5 compression,prefix_length,tablespace_name   6 from t1
  7 ;
Explained.
Elapsed: 00:00:00.01
SQL> select * from table(dbms_xplan.display); Press Return to continue...
PLAN_TABLE_OUTPUT

| 0 | SELECT STATEMENT | | 95488 | 6807K| | 1344 |
| 1 | SORT UNIQUE | | 95488 | 6807K| 18M| 1344 |
| 2 | INDEX FAST FULL SCAN| I_T1 | 95488 | 6807K| | 185 |

Note: cpu costing is off
10 rows selected.
Elapsed: 00:00:01.12

There must be some different algorithm at play here. My guess is DISTINCT considers each column at a time whereas group by concatenates all the columns into the equivalent of a raw column and then acts on that. But I have no way of proving that.

--
Cheers
Nuno Souto
in sunny Sydney, Australia
wizofaus_at_gmaildotcomdotnospam

Received on Wed Sep 14 2005 - 21:35:08 CDT