Re: Composite index question

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: Sun, 13 Jan 2008 15:32:34 -0800 (PST)
Message-ID: <777262d8-01e0-4405-a50c-28aceba7c232@c23g2000hsa.googlegroups.com>

On Jan 13, 3:34�pm, nickli2..._at_gmail.com wrote:

> Hi,
>
> � I have a question on how composite index should be used. As an
> example, my queries often select two leading columns, column_1 and
> column_2 of a table. Should I create a composite index using column_1
> and column_2, �or should I create two separated indexes for column_1
> and column_2? What are the differences between these two approaches
> and performance implications?
>
> � Thanks in advance.
>
> � Nick

I don't think that you have provided enough information for a reasonably accurate answer to be provided. There are too many possible exceptions to any type of broad indexing rule for this situation.

Assume the following table exists:
CREATE TABLE T1(

  COLUMN_1 VARCHAR2(10) NOT NULL,
  COLUMN_2 VARCHAR2(10) NOT NULL,
  COLUMN_3 VARCHAR2(20),
  COLUMN_4 VARCHAR2(20),
  COLUMN_5 VARCHAR2(20),
  COLUMN_6 VARCHAR2(20),
  COLUMN_7 VARCHAR2(20),
  COLUMN_8 VARCHAR2(20),
  COLUMN_9 VARCHAR2(20),
  COLUMN_10 VARCHAR2(20));

If your queries typically have a WHERE clause like this: WHERE
COLUMN_1='my_val'
AND COLUMN_2='my_val2'

It probably makes sense to create an index on (COLUMN_1,COLUMN_2). The same index could potentially satisfy a WHERE clause that specifies only one of the two columns by using an index skip scan.

---

If your queries typically have a WHERE clause like this: WHERE
COLUMN_1='my_val'

It probably does not make sense to include COLUMN_2 in the index, as doing so will likely increase the clustering factor of the index, making it appear in cost determination to be a more expensive data access method. Of course, if you are only selecting COLUMN_1 and COLUMN_2, Oracle may only need to visit the index to retrieve the rows, and not visit the table.

The best answer is to test. One such test on Oracle 10.2.0.3: INSERT INTO
T1
SELECT

  DBMS_RANDOM.STRING('A',10),
  DBMS_RANDOM.STRING('A',5),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15),
  DBMS_RANDOM.STRING('A',15)

FROM
{a table with at least 100,000 rows}
WHERE
ROWNUM<=100000;

COMMIT; Now, let's create three indexes, the first of which will be a unique index on the two columns:
CREATE UNIQUE INDEX T1_IND1 ON T1(COLUMN_1,COLUMN_2); CREATE INDEX T1_IND2 ON T1(COLUMN_1);
CREATE INDEX T1_IND3 ON T1(COLUMN_2); Let's make certain that the table and index stats are current: EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE); A simple select on only the two columns: SELECT
  COLUMN_1,
  COLUMN_2
FROM
  T1;

The DBMS_XPLAN:

| 1 | INDEX FAST FULL SCAN| T1_IND1 | 1 | 100K| 100K| 00:00:00.30 | 1375 |

In the above, Oracle only needed to visit the index, and avoided the visit to the table.

--
SELECT
  COLUMN_1,
  COLUMN_2,
  COLUMN_3
FROM
  T1
WHERE
  COLUMN_1<'BBB';



------------------------------------------------------------------------------------

| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-
Time   | Buffers |


------------------------------------------------------------------------------------

|*  1 |  TABLE ACCESS FULL| T1   |      1 |   1747 |   1912 |
00:00:00.02 |    2149 |


------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   1 - filter("COLUMN_1"<'BBB')

In the above, Oracle determined that the best access method is a full
table scan, even though there were two indexes on COLUMN_1 that could
have been used.

--
SELECT
  COLUMN_1,
  COLUMN_2,
  COLUMN_3
FROM
  T1
WHERE
  COLUMN_1<'BBB'
  AND COLUMN_2<'BBB';



-------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name    | Starts | E-Rows | A-
Rows |   A-Time   | Buffers |


-------------------------------------------------------------------------------------------------

|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |      1 |     31 |
34 |00:00:00.01 |      43 |
|*  2 |   INDEX RANGE SCAN          | T1_IND1 |      1 |     31 |
34 |00:00:00.01 |       9 |


-------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   2 - access("COLUMN_1"<'BBB' AND "COLUMN_2"<'BBB')
       filter("COLUMN_2"<'BBB')

In the above, Oracle determined that the number of rows that would be
returned would be very small (0.031% of the rows), so the index based
access appeared to be the best option.

--
SELECT
  COLUMN_1,
  COLUMN_2,
  COLUMN_3
FROM
  T1
WHERE
  COLUMN_2<'BBB';



------------------------------------------------------------------------------------

| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-
Time   | Buffers |


------------------------------------------------------------------------------------

|*  1 |  TABLE ACCESS FULL| T1   |      1 |   1747 |   2080 |
00:00:00.02 |    2148 |


------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   1 - filter("COLUMN_2"<'BBB')

In the above, Oracle predicted that 1747 rows (1.75%) of the rows in
the table would be returned, and still performed a full table scan,
rather than using one of the two available indexes.

--
SELECT
  COLUMN_2,
  COUNT(*)
FROM
  T1
GROUP BY
  COLUMN_2
HAVING
  COUNT(*)>1;



--------------------------------------------------------------------------------------------

| Id  | Operation              | Name    | Starts | E-Rows | A-Rows
|   A-Time   | Buffers |


--------------------------------------------------------------------------------------------

|*  1 |  FILTER                |         |      1 |        |     10 |
00:00:00.89 |     244 |
|   2 |   HASH GROUP BY        |         |      1 |   5000 |  99990 |
00:00:00.66 |     244 |
|   3 |    INDEX FAST FULL SCAN| T1_IND3 |      1 |    100K|    100K|
00:00:00.20 |     244 |


--------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   1 - filter(COUNT(*)>1)

In the above, Oracle was able to return the results by only visiting
the index on COLUMN_2.

--
Let's repeat the above, this time forcing Oracle to use the composite
index:
SELECT /*+ INDEX(T1,T1_IND1) */
  COLUMN_2,
  COUNT(*)
FROM
  T1
GROUP BY
  COLUMN_2
HAVING
  COUNT(*)>1;



---------------------------------------------------------------------------------------

| Id  | Operation         | Name    | Starts | E-Rows | A-Rows |   A-
Time   | Buffers |


---------------------------------------------------------------------------------------

|*  1 |  FILTER           |         |      1 |        |     10 |
00:00:00.90 |     377 |
|   2 |   HASH GROUP BY   |         |      1 |   5000 |  99990 |
00:00:00.66 |     377 |
|   3 |    INDEX FULL SCAN| T1_IND1 |      1 |    100K|    100K|
00:00:00.20 |     377 |


---------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   1 - filter(COUNT(*)>1)

In this case, Oracle required a little more time to return the result,
and was still able to satisfy the query using only the composite index
on COLUMN_1,COLUMN_2.

--
Let's try another query that cannot be satisfied using just an index,
as it includes COLUMN_3 (this query specifies the 10 matching COLUMN_2
values returned by the previous query):
SELECT
  COLUMN_1,
  COLUMN_2,
  COLUMN_3
FROM
  T1
WHERE
  COLUMN_2 IN (
    'WJTai',
    'lADDY',
    'XyjEy',
    'YbVSm',
    'oGbDL',
    'vAVOV',
    'FBuTH',
    'Omkfw',
    'mGjHX',
    'xywzx');



--------------------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Starts | E-Rows | A-
Rows |   A-Time   | Buffers |


--------------------------------------------------------------------------------------------------

|   1 |  INLIST ITERATOR             |         |      1 |        |
20 |00:00:00.01 |      40 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1      |     10 |     10 |
20 |00:00:00.01 |      40 |
|*  3 |    INDEX RANGE SCAN          | T1_IND3 |     10 |     10 |
20 |00:00:00.01 |      20 |


--------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   3 - access(("COLUMN_2"='FBuTH' OR "COLUMN_2"='Omkfw' OR
"COLUMN_2"='WJTai' OR
              "COLUMN_2"='XyjEy' OR "COLUMN_2"='YbVSm' OR
"COLUMN_2"='lADDY' OR "COLUMN_2"='mGjHX' OR
              "COLUMN_2"='oGbDL' OR "COLUMN_2"='vAVOV' OR
"COLUMN_2"='xywzx'))

Oracle quickly returned the 20 matching rows by probing the T1_IND3
index 10 times.

--
Let's try the same using the composite index:
SELECT /*+ INDEX(T1,T1_IND1) */
  COLUMN_1,
  COLUMN_2,
  COLUMN_3
FROM
  T1
WHERE
  COLUMN_2 IN (
    'WJTai',
    'lADDY',
    'XyjEy',
    'YbVSm',
    'oGbDL',
    'vAVOV',
    'FBuTH',
    'Omkfw',
    'mGjHX',
    'xywzx');



-------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name    | Starts | E-Rows | A-
Rows |   A-Time   | Buffers |


-------------------------------------------------------------------------------------------------

|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |      1 |     10 |
20 |00:00:00.03 |     397 |
|*  2 |   INDEX FULL SCAN           | T1_IND1 |      1 |     10 |
20 |00:00:00.03 |     377 |


-------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   2 - filter(("COLUMN_2"='FBuTH' OR "COLUMN_2"='Omkfw' OR
"COLUMN_2"='WJTai' OR
              "COLUMN_2"='XyjEy' OR "COLUMN_2"='YbVSm' OR
"COLUMN_2"='lADDY' OR "COLUMN_2"='mGjHX' OR
              "COLUMN_2"='oGbDL' OR "COLUMN_2"='vAVOV' OR
"COLUMN_2"='xywzx'))

The above required three times as long to execute, and the index
lookup changed from an index range scan to an index full scan.

--
Let's try an inline view to restrict the query results to 100 rows:
SELECT
  T1.COLUMN_1,
  T1.COLUMN_2,
  T1.COLUMN_3,
  T1.COLUMN_4
FROM
  (SELECT
    COLUMN_1,
    COLUMN_2,
    COLUMN_3
  FROM
    T1
  WHERE
    ROWNUM<=100) V_T1,
  T1
WHERE
  V_T1.COLUMN_1=T1.COLUMN_1
  AND V_T1.COLUMN_2=T1.COLUMN_2;



--------------------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Starts | E-Rows | A-
Rows |   A-Time   | Buffers |


--------------------------------------------------------------------------------------------------

|   1 |  NESTED LOOPS                |         |      1 |    100 |
100 |00:00:00.01 |     209 |
|   2 |   VIEW                       |         |      1 |    100 |
100 |00:00:00.01 |       7 |
|*  3 |    COUNT STOPKEY             |         |      1 |        |
100 |00:00:00.01 |       7 |
|   4 |     INDEX FAST FULL SCAN     | T1_IND1 |      1 |    100K|
100 |00:00:00.01 |       7 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1      |    100 |      1 |
100 |00:00:00.01 |     202 |
|*  6 |    INDEX UNIQUE SCAN         | T1_IND1 |    100 |      1 |
100 |00:00:00.01 |     102 |


--------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   3 - filter(ROWNUM<=100)
   6 - access("V_T1"."COLUMN_1"="T1"."COLUMN_1" AND
"V_T1"."COLUMN_2"="T1"."COLUMN_2")

In the above, the inline view was satisfied using an INDEX FAST FULL
SCAN on the T1_IND1 index, without visiting the table, even though
COLUMN_3 is included in the inline view - a smart optimization by
Oracle.

--
Let's try again, this time also matching COLUMN_3 from the inline
view:
SELECT
  T1.COLUMN_1,
  T1.COLUMN_2,
  T1.COLUMN_3,
  T1.COLUMN_4
FROM
  (SELECT
    COLUMN_1,
    COLUMN_2,
    COLUMN_3
  FROM
    T1
  WHERE
    ROWNUM<=100) V_T1,
  T1
WHERE
  V_T1.COLUMN_1=T1.COLUMN_1
  AND V_T1.COLUMN_2=T1.COLUMN_2
  AND V_T1.COLUMN_3=T1.COLUMN_3;



--------------------------------------------------------------------------------------------------

| Id  | Operation                    | Name    | Starts | E-Rows | A-
Rows |   A-Time   | Buffers |


--------------------------------------------------------------------------------------------------

|   1 |  NESTED LOOPS                |         |      1 |      1 |
100 |00:00:00.01 |     208 |
|   2 |   VIEW                       |         |      1 |    100 |
100 |00:00:00.01 |       6 |
|*  3 |    COUNT STOPKEY             |         |      1 |        |
100 |00:00:00.01 |       6 |
|   4 |     TABLE ACCESS FULL        | T1      |      1 |    100K|
100 |00:00:00.01 |       6 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T1      |    100 |      1 |
100 |00:00:00.01 |     202 |
|*  6 |    INDEX UNIQUE SCAN         | T1_IND1 |    100 |      1 |
100 |00:00:00.01 |     102 |


--------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   3 - filter(ROWNUM<=100)
   5 - filter("V_T1"."COLUMN_3"="T1"."COLUMN_3")
   6 - access("V_T1"."COLUMN_1"="T1"."COLUMN_1" AND
"V_T1"."COLUMN_2"="T1"."COLUMN_2")

In the above, Oracle determined that the T1_IND1 index alone could not
satisfy the inline view, as the value of COLUMN_3 is needed in the
WHERE clause - Oracle performed a full table scan (stopping after
reading 100 rows) in the inline view, rather than use an index.  This
query will likely return different rows than the previous query.

--
Let's remove the COLUMN_1 line from the WHERE clause:
SELECT
  T1.COLUMN_1,
  T1.COLUMN_2,
  T1.COLUMN_3,
  T1.COLUMN_4
FROM
  (SELECT
    COLUMN_1,
    COLUMN_2,
    COLUMN_3
  FROM
    T1
  WHERE
    ROWNUM<=100) V_T1,
  T1
WHERE
  V_T1.COLUMN_2=T1.COLUMN_2
  AND V_T1.COLUMN_3=T1.COLUMN_3;



-------------------------------------------------------------------------------------------------

| Id  | Operation                   | Name    | Starts | E-Rows | A-
Rows |   A-Time   | Buffers |


-------------------------------------------------------------------------------------------------

|*  1 |  TABLE ACCESS BY INDEX ROWID| T1      |      1 |      1 |
100 |00:00:00.01 |     113 |
|   2 |   NESTED LOOPS              |         |      1 |    100 |
201 |00:00:00.01 |     110 |
|   3 |    VIEW                     |         |      1 |    100 |
100 |00:00:00.01 |       6 |
|*  4 |     COUNT STOPKEY           |         |      1 |        |
100 |00:00:00.01 |       6 |
|   5 |      TABLE ACCESS FULL      | T1      |      1 |    100K|
100 |00:00:00.01 |       6 |
|*  6 |    INDEX RANGE SCAN         | T1_IND3 |    100 |      1 |
100 |00:00:00.01 |     104 |


-------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):


---------------------------------------------------

   1 - filter("V_T1"."COLUMN_3"="T1"."COLUMN_3")
   4 - filter(ROWNUM<=100)
   6 - access("V_T1"."COLUMN_2"="T1"."COLUMN_2")

In the above, Oracle performed a full table scan (stopping after
reading 100 rows) in the inline view, and this time used the index on
COLUMN_2 to find the matching rows in T1.

A look at a 10053 trace file for your queries, as well as
experimentation with DBMS_XPLAN might help you make the right
decision, rather than relying on broad rules of thumb.

Charles Hooper
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc.

Received on Sun Jan 13 2008 - 17:32:34 CST

This message: [ Message body ]
Next message: Andreas Thiele: "Re: Installation of Oracle with unixODBC"
Previous message: Mark D Powell: "Re: Composite index question"
In reply to: nickli2000_at_gmail.com: "Composite index question"
Next in thread: Charles Hooper: "Re: Composite index question"
Reply: Charles Hooper: "Re: Composite index question"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message