Re: Composite index question

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Sun, 13 Jan 2008 15:32:34 -0800 (PST)
Message-ID: <777262d8-01e0-4405-a50c-28aceba7c232@c23g2000hsa.googlegroups.com>


On Jan 13, 3:34 pm, nickli2..._at_gmail.com wrote:

> Hi,
>
>   I have a question on how composite index should be used. As an
> example, my queries often select two leading columns, column_1 and
> column_2 of a table. Should I create a composite index using column_1
> and column_2,  or should I create two separated indexes for column_1
> and column_2? What are the differences between these two approaches
> and performance implications?
>
>   Thanks in advance.
>
>   Nick

I don't think that you have provided enough information for a reasonably accurate answer to be provided. There are too many possible exceptions to any type of broad indexing rule for this situation.

Assume the following table exists:
CREATE TABLE T1(

  COLUMN_1 VARCHAR2(10) NOT NULL,
  COLUMN_2 VARCHAR2(10) NOT NULL,
  COLUMN_3 VARCHAR2(20),
  COLUMN_4 VARCHAR2(20),
  COLUMN_5 VARCHAR2(20),
  COLUMN_6 VARCHAR2(20),
  COLUMN_7 VARCHAR2(20),
  COLUMN_8 VARCHAR2(20),
  COLUMN_9 VARCHAR2(20),
  COLUMN_10 VARCHAR2(20));

If your queries typically have a WHERE clause like this: WHERE
  COLUMN_1='my_val'
  AND COLUMN_2='my_val2'

It probably makes sense to create an index on (COLUMN_1,COLUMN_2). The same index could potentially satisfy a WHERE clause that specifies only one of the two columns by using an index skip scan.

---

If your queries typically have a WHERE clause like this: WHERE
  COLUMN_1='my_val'

It probably does not make sense to include COLUMN_2 in the index, as doing so will likely increase the clustering factor of the index, making it appear in cost determination to be a more expensive data access method. Of course, if you are only selecting COLUMN_1 and COLUMN_2, Oracle may only need to visit the index to retrieve the rows, and not visit the table.

The best answer is to test. One such test on Oracle 10.2.0.3: INSERT INTO
  T1
SELECT

  DBMS_RANDOM.STRING('A',10),
  DBMS_RANDOM.STRING('A',5),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15)

FROM
  {a table with at least 100,000 rows}
WHERE
  ROWNUM<=100000;

COMMIT; Now, let's create three indexes, the first of which will be a unique index on the two columns:
CREATE UNIQUE INDEX T1_IND1 ON T1(COLUMN_1,COLUMN_2); CREATE INDEX T1_IND2 ON T1(COLUMN_1);
CREATE INDEX T1_IND3 ON T1(COLUMN_2); Let's make certain that the table and index stats are current: EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE); A simple select on only the two columns: SELECT
  COLUMN_1,
  COLUMN_2
FROM
  T1;

The DBMS_XPLAN:



| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |

| 1 | INDEX FAST FULL SCAN| T1_IND1 | 1 | 100K| 100K| 00:00:00.30 | 1375 |

In the above, Oracle only needed to visit the index, and avoided the visit to the table.

--
SELECT
  COLUMN_1,
  COLUMN_2,
  COLUMN_3
FROM
  T1
WHERE
  COLUMN_1<'BBB';


------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time | Buffers |
------------------------------------------------------------------------------------
|* 1 | TABLE ACCESS FULL| T1 | 1 | 1747 | 1912 | 00:00:00.02 | 2149 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COLUMN_1"<'BBB') In the above, Oracle determined that the best access method is a full table scan, even though there were two indexes on COLUMN_1 that could have been used. -- SELECT COLUMN_1, COLUMN_2, COLUMN_3 FROM T1 WHERE COLUMN_1<'BBB' AND COLUMN_2<'BBB';
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 31 | 34 |00:00:00.01 | 43 | |* 2 | INDEX RANGE SCAN | T1_IND1 | 1 | 31 | 34 |00:00:00.01 | 9 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COLUMN_1"<'BBB' AND "COLUMN_2"<'BBB') filter("COLUMN_2"<'BBB') In the above, Oracle determined that the number of rows that would be returned would be very small (0.031% of the rows), so the index based access appeared to be the best option. -- SELECT COLUMN_1, COLUMN_2, COLUMN_3 FROM T1 WHERE COLUMN_2<'BBB';
------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time | Buffers |
------------------------------------------------------------------------------------
|* 1 | TABLE ACCESS FULL| T1 | 1 | 1747 | 2080 | 00:00:00.02 | 2148 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COLUMN_2"<'BBB') In the above, Oracle predicted that 1747 rows (1.75%) of the rows in the table would be returned, and still performed a full table scan, rather than using one of the two available indexes. -- SELECT COLUMN_2, COUNT(*) FROM T1 GROUP BY COLUMN_2 HAVING COUNT(*)>1;
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------
|* 1 | FILTER | | 1 | | 10 | 00:00:00.89 | 244 | | 2 | HASH GROUP BY | | 1 | 5000 | 99990 | 00:00:00.66 | 244 | | 3 | INDEX FAST FULL SCAN| T1_IND3 | 1 | 100K| 100K| 00:00:00.20 | 244 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(COUNT(*)>1) In the above, Oracle was able to return the results by only visiting the index on COLUMN_2. -- Let's repeat the above, this time forcing Oracle to use the composite index: SELECT /*+ INDEX(T1,T1_IND1) */ COLUMN_2, COUNT(*) FROM T1 GROUP BY COLUMN_2 HAVING COUNT(*)>1;
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A- Time | Buffers |
---------------------------------------------------------------------------------------
|* 1 | FILTER | | 1 | | 10 | 00:00:00.90 | 377 | | 2 | HASH GROUP BY | | 1 | 5000 | 99990 | 00:00:00.66 | 377 | | 3 | INDEX FULL SCAN| T1_IND1 | 1 | 100K| 100K| 00:00:00.20 | 377 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(COUNT(*)>1) In this case, Oracle required a little more time to return the result, and was still able to satisfy the query using only the composite index on COLUMN_1,COLUMN_2. -- Let's try another query that cannot be satisfied using just an index, as it includes COLUMN_3 (this query specifies the 10 matching COLUMN_2 values returned by the previous query): SELECT COLUMN_1, COLUMN_2, COLUMN_3 FROM T1 WHERE COLUMN_2 IN ( 'WJTai', 'lADDY', 'XyjEy', 'YbVSm', 'oGbDL', 'vAVOV', 'FBuTH', 'Omkfw', 'mGjHX', 'xywzx');
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------
| 1 | INLIST ITERATOR | | 1 | | 20 |00:00:00.01 | 40 | | 2 | TABLE ACCESS BY INDEX ROWID| T1 | 10 | 10 | 20 |00:00:00.01 | 40 | |* 3 | INDEX RANGE SCAN | T1_IND3 | 10 | 10 | 20 |00:00:00.01 | 20 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(("COLUMN_2"='FBuTH' OR "COLUMN_2"='Omkfw' OR "COLUMN_2"='WJTai' OR "COLUMN_2"='XyjEy' OR "COLUMN_2"='YbVSm' OR "COLUMN_2"='lADDY' OR "COLUMN_2"='mGjHX' OR "COLUMN_2"='oGbDL' OR "COLUMN_2"='vAVOV' OR "COLUMN_2"='xywzx')) Oracle quickly returned the 20 matching rows by probing the T1_IND3 index 10 times. -- Let's try the same using the composite index: SELECT /*+ INDEX(T1,T1_IND1) */ COLUMN_1, COLUMN_2, COLUMN_3 FROM T1 WHERE COLUMN_2 IN ( 'WJTai', 'lADDY', 'XyjEy', 'YbVSm', 'oGbDL', 'vAVOV', 'FBuTH', 'Omkfw', 'mGjHX', 'xywzx');
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 10 | 20 |00:00:00.03 | 397 | |* 2 | INDEX FULL SCAN | T1_IND1 | 1 | 10 | 20 |00:00:00.03 | 377 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("COLUMN_2"='FBuTH' OR "COLUMN_2"='Omkfw' OR "COLUMN_2"='WJTai' OR "COLUMN_2"='XyjEy' OR "COLUMN_2"='YbVSm' OR "COLUMN_2"='lADDY' OR "COLUMN_2"='mGjHX' OR "COLUMN_2"='oGbDL' OR "COLUMN_2"='vAVOV' OR "COLUMN_2"='xywzx')) The above required three times as long to execute, and the index lookup changed from an index range scan to an index full scan. -- Let's try an inline view to restrict the query results to 100 rows: SELECT T1.COLUMN_1, T1.COLUMN_2, T1.COLUMN_3, T1.COLUMN_4 FROM (SELECT COLUMN_1, COLUMN_2, COLUMN_3 FROM T1 WHERE ROWNUM<=100) V_T1, T1 WHERE V_T1.COLUMN_1=T1.COLUMN_1 AND V_T1.COLUMN_2=T1.COLUMN_2;
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------
| 1 | NESTED LOOPS | | 1 | 100 | 100 |00:00:00.01 | 209 | | 2 | VIEW | | 1 | 100 | 100 |00:00:00.01 | 7 | |* 3 | COUNT STOPKEY | | 1 | | 100 |00:00:00.01 | 7 | | 4 | INDEX FAST FULL SCAN | T1_IND1 | 1 | 100K| 100 |00:00:00.01 | 7 | | 5 | TABLE ACCESS BY INDEX ROWID| T1 | 100 | 1 | 100 |00:00:00.01 | 202 | |* 6 | INDEX UNIQUE SCAN | T1_IND1 | 100 | 1 | 100 |00:00:00.01 | 102 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(ROWNUM<=100) 6 - access("V_T1"."COLUMN_1"="T1"."COLUMN_1" AND "V_T1"."COLUMN_2"="T1"."COLUMN_2") In the above, the inline view was satisfied using an INDEX FAST FULL SCAN on the T1_IND1 index, without visiting the table, even though COLUMN_3 is included in the inline view - a smart optimization by Oracle. -- Let's try again, this time also matching COLUMN_3 from the inline view: SELECT T1.COLUMN_1, T1.COLUMN_2, T1.COLUMN_3, T1.COLUMN_4 FROM (SELECT COLUMN_1, COLUMN_2, COLUMN_3 FROM T1 WHERE ROWNUM<=100) V_T1, T1 WHERE V_T1.COLUMN_1=T1.COLUMN_1 AND V_T1.COLUMN_2=T1.COLUMN_2 AND V_T1.COLUMN_3=T1.COLUMN_3;
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------
| 1 | NESTED LOOPS | | 1 | 1 | 100 |00:00:00.01 | 208 | | 2 | VIEW | | 1 | 100 | 100 |00:00:00.01 | 6 | |* 3 | COUNT STOPKEY | | 1 | | 100 |00:00:00.01 | 6 | | 4 | TABLE ACCESS FULL | T1 | 1 | 100K| 100 |00:00:00.01 | 6 | |* 5 | TABLE ACCESS BY INDEX ROWID| T1 | 100 | 1 | 100 |00:00:00.01 | 202 | |* 6 | INDEX UNIQUE SCAN | T1_IND1 | 100 | 1 | 100 |00:00:00.01 | 102 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(ROWNUM<=100) 5 - filter("V_T1"."COLUMN_3"="T1"."COLUMN_3") 6 - access("V_T1"."COLUMN_1"="T1"."COLUMN_1" AND "V_T1"."COLUMN_2"="T1"."COLUMN_2") In the above, Oracle determined that the T1_IND1 index alone could not satisfy the inline view, as the value of COLUMN_3 is needed in the WHERE clause - Oracle performed a full table scan (stopping after reading 100 rows) in the inline view, rather than use an index. This query will likely return different rows than the previous query. -- Let's remove the COLUMN_1 line from the WHERE clause: SELECT T1.COLUMN_1, T1.COLUMN_2, T1.COLUMN_3, T1.COLUMN_4 FROM (SELECT COLUMN_1, COLUMN_2, COLUMN_3 FROM T1 WHERE ROWNUM<=100) V_T1, T1 WHERE V_T1.COLUMN_2=T1.COLUMN_2 AND V_T1.COLUMN_3=T1.COLUMN_3;
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A- Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------
|* 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 1 | 100 |00:00:00.01 | 113 | | 2 | NESTED LOOPS | | 1 | 100 | 201 |00:00:00.01 | 110 | | 3 | VIEW | | 1 | 100 | 100 |00:00:00.01 | 6 | |* 4 | COUNT STOPKEY | | 1 | | 100 |00:00:00.01 | 6 | | 5 | TABLE ACCESS FULL | T1 | 1 | 100K| 100 |00:00:00.01 | 6 | |* 6 | INDEX RANGE SCAN | T1_IND3 | 100 | 1 | 100 |00:00:00.01 | 104 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("V_T1"."COLUMN_3"="T1"."COLUMN_3") 4 - filter(ROWNUM<=100) 6 - access("V_T1"."COLUMN_2"="T1"."COLUMN_2") In the above, Oracle performed a full table scan (stopping after reading 100 rows) in the inline view, and this time used the index on COLUMN_2 to find the matching rows in T1. A look at a 10053 trace file for your queries, as well as experimentation with DBMS_XPLAN might help you make the right decision, rather than relying on broad rules of thumb. Charles Hooper IT Manager/Oracle DBA K&M Machine-Fabricating, Inc.
Received on Sun Jan 13 2008 - 17:32:34 CST

Original text of this message