Re: Help with sql query speed. Explain plan included

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 19 Mar 2007 11:45:42 -0700
Message-ID: <1174329942.458784.307930@b75g2000hsg.googlegroups.com>

On Mar 18, 11:11 pm, "fitzjarr..._at_cox.net" <fitzjarr..._at_cox.net> wrote:
> On Mar 18, 8:36 pm, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> > The OP is accessing the same table twice in the query with two
> > aliases: "test1 t0,test1 tx". The t0 alias is being accessed by the
> > RESIDENTID column: "t0.residentid=1486674". To access those rows, and
> > index on just the RESIDENTID should be helpful. I think that we both
> > agree that such an index will not help this portion of the WHERE
> > clause "t1.residentid!=1486674" The one index on the table (IDXE) is
> > a composite index covering all three columns. What I was suggesting
> > is to test performance with this one three column index split into two
> > indexes. The two column composite index will hopefully be able to
> > satisfy the "t0.questionnumber=tx.questionnumber and
> > t0.answer=tx.answer" portion of the query without requiring a full
> > tablescan of table TEST1.
>
> > Thinking about it a bit more, I think that I understand what you are
> > saying - the number of blocks in the two column index may be the same
> > or greater than that of the three column table. His query would have
> > to check ((3,000,000 - 80) / (3,000,000) * 100)% of the rows in the
> > table, and that looks to be quite a bit greater than 15% to 25%. It
> > could be interesting to see if Oracle could optimize this
> > situation... if each question has an even distribution between the
> > 100 possible answers.
>
> > Charles Hooper
> > PC Support Specialist
> > K&M Machine-Fabricating, Inc.
> The distribution is not even, nor is it guaranteed to be complete
> within the range from 1 to 100 for any given question, and this was
> stated at the outset by the OP:
>
> 4. There are 80 questions on each census form.
> 5. Each question on the census form has an answer that is between 1
> and 100.
> 6. The country residents do not have to answer every question.
> 7. If they do not answer a question, a row in the table is not
> created.
>
> I do think the composite index would provide some needed help to this
> query, and the index on QUESTIONNUMBER and ANSWER should be
> implemented. I also agree that the lone index on this table should be
> broken into two separate pieces. My thoughts are that the single-
> column index on RESIDENTID, although present, would likely not be used
> to eliminate the table scan, however I do believe the two-column index
> would reduce the number of returned rows and thus improve the overall
> speed of the query.
>
> David Fitzjarrell>
>

I just completed some rather extensive testing of this problem on Oracle 10.2.0.3 running on Windows Vista. The results were not quite was I expected. 10.2.0.3, at least on this platform, seems to ignore hints.

CREATE TABLE T1 (
  RESIDENTID NUMBER(12),
  QUESTIONNUMBER NUMBER(12),
  ANSWER NUMBER(12)); Add the first 1,000,000 responses (replace LABOR_TICKET with any table that has at least a million rows):
INSERT INTO
  T1
SELECT
  C1.COUNTER1,
  C2.COUNTER2,
  TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
  (SELECT
    ROWNUM COUNTER1
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=1000000) C1,
  (SELECT
    ROWNUM COUNTER2
  FROM
    DUAL
  CONNECT BY
    LEVEL<=80) C2;

80000000 rows created.

Add the second 1,000,000 responses:
INSERT INTO
  T1
SELECT
  C1.COUNTER1+1000000,
  C2.COUNTER2,
  TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
  (SELECT
    ROWNUM COUNTER1
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=1000000) C1,
  (SELECT
    ROWNUM COUNTER2
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=80) C2;

80000000 rows created.

Add the third 1,000,000 responses:
INSERT INTO
  T1
SELECT
  C1.COUNTER1+2000000,
  C2.COUNTER2,
  TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
  (SELECT
    ROWNUM COUNTER1
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=1000000) C1,
  (SELECT
    ROWNUM COUNTER2
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=80) C2;

80000000 rows created.

Remove a random number of questions from each person - those with responses less than 1:
DELETE FROM
T1
WHERE
ANSWER<=0;

13592343 rows deleted.

COMMIT; Set up a large SORT_AREA_SIZE:
ALTER SYSTEM SET SORT_AREA_SIZE=20971520 DEFERRED; ALTER SESSION SET SORT_AREA_SIZE=20971520; Create the original index:
CREATE UNIQUE INDEX T1_IDXE ON T1(RESIDENTID,QUESTIONNUMBER,ANSWER); EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE); ALTER SYSTEM FLUSH SHARED_POOL; Set up a 10053 trace, Vista seems to have problems returning actuals using DBMS_STATS:
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST10'; ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1'; select tx.residentid
from t1 t0,t1 tx
  where t0.residentid!=tx.residentid
   and t0.questionnumber=tx.questionnumber    and t0.answer=tx.answer
   and t0.residentid=1486674
    group by tx.residentid
     having count(*)>10;

SELECT
*
FROM
TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));

Plan Table

+-----------------------------------+
| Id  | Operation             | Name    | Rows  | Bytes | Cost  |
Time      |
----------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT      |         |       |       |  217K
|           |
| 1   |  FILTER               |         |       |       |
|           |
| 2   |   SORT GROUP BY       |         |  102K | 2246K |  217K |
00:57:46 |
| 3   |    HASH JOIN          |         | 2041K |   44M |  210K |
00:55:43 |
| 4   |     INDEX RANGE SCAN  | T1_IDXE |    76 |   836 |     4 |
00:00:01 |
| 5   |     TABLE ACCESS FULL | T1      |  216M | 2376M |  208K |

00:54:13 |

+-----------------------------------+

Predicate Information:

1 - filter(COUNT(*)>10)

3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
"T0"."ANSWER"="TX"."ANSWER")
3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
4 - access("T0"."RESIDENTID"=1486674)
5 - filter("TX"."RESIDENTID"<>1486674)

Try the other query that was suggested in this group: SELECT residentId, count(1)
FROM t1
WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer

                                         FROM t1
                                        WHERE residentId = 1486674 )

GROUP BY residentId
HAVING COUNT(1) > 10; SELECT
*
FROM
TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));

Plan Table

+-----------------------------------+
| Id  | Operation               | Name    | Rows  | Bytes | Cost  |
Time      |
------------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT        |         |       |       |  208K
|           |
| 1   |  FILTER                 |         |       |       |
|           |
| 2   |   SORT GROUP BY         |         |  1384 |   30K |  208K |
00:54:11 |
| 3   |    HASH JOIN RIGHT SEMI |         |   27K |  595K |  208K |
00:54:11 |
| 4   |     INDEX RANGE SCAN    | T1_IDXE |    76 |   836 |     4 |
00:00:01 |
| 5   |     TABLE ACCESS FULL   | T1      |  216M | 2376M |  206K |

00:54:40 |

+-----------------------------------+

Predicate Information:

1 - filter(COUNT(*)>10)
3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER")
4 - access("RESIDENTID"=1486674)

Make some small changes to the table definition: ALTER TABLE T1 MODIFY (
  QUESTIONNUMBER NUMBER(12) NOT NULL,
  ANSWER NUMBER(12) NOT NULL); Create the two indexes that I suggested: CREATE INDEX T1_INDEX1 ON T1(RESIDENTID); CREATE INDEX T1_INDEX2 ON T1(QUESTIONNUMBER,ANSWER); EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE); ALTER SYSTEM FLUSH SHARED_POOL; ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST100'; ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1'; select tx.residentid
from t1 t0,t1 tx
  where t0.residentid!=tx.residentid
   and t0.questionnumber=tx.questionnumber    and t0.answer=tx.answer
   and t0.residentid=1486674
    group by tx.residentid
     having count(*)>10;

Plan Table

+-----------------------------------+
| Id  | Operation             | Name    | Rows  | Bytes | Cost  |
Time      |
----------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT      |         |       |       |  217K
|           |
| 1   |  FILTER               |         |       |       |
|           |
| 2   |   SORT GROUP BY       |         |  102K | 2245K |  217K |
00:57:46 |
| 3   |    HASH JOIN          |         | 2041K |   44M |  210K |
00:55:43 |
| 4   |     INDEX RANGE SCAN  | T1_IDXE |    76 |   836 |     4 |
00:00:01 |
| 5   |     TABLE ACCESS FULL | T1      |  216M | 2376M |  208K |

00:54:13 |

+-----------------------------------+

Predicate Information:

1 - filter(COUNT(*)>10)

3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
"T0"."ANSWER"="TX"."ANSWER")
3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
4 - access("T0"."RESIDENTID"=1486674)
5 - filter("TX"."RESIDENTID"<>1486674)

Plan stayed the same - clustering factor was lower for the T1_IDEX index.

Provide a hint:
select tx.residentid /*+ INDEX(T0 T1_INDEX1) INDEX(TX T1_INDEX2) */ from t1 t0,t1 tx
  where t0.residentid!=tx.residentid
   and t0.questionnumber=tx.questionnumber    and t0.answer=tx.answer
   and t0.residentid=1486674
    group by tx.residentid
     having count(*)>10

Plan Table

+-----------------------------------+
| Id  | Operation             | Name    | Rows  | Bytes | Cost  |
Time      |
----------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT      |         |       |       |  217K
|           |
| 1   |  FILTER               |         |       |       |
|           |
| 2   |   SORT GROUP BY       |         |  102K | 2245K |  217K |
00:57:46 |
| 3   |    HASH JOIN          |         | 2041K |   44M |  210K |
00:55:43 |
| 4   |     INDEX RANGE SCAN  | T1_IDXE |    76 |   836 |     4 |
00:00:01 |
| 5   |     TABLE ACCESS FULL | T1      |  216M | 2376M |  208K |

00:54:13 |

+-----------------------------------+

Predicate Information:

1 - filter(COUNT(*)>10)

3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
"T0"."ANSWER"="TX"."ANSWER")
3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
4 - access("T0"."RESIDENTID"=1486674)
5 - filter("TX"."RESIDENTID"<>1486674)

Plan still using the original index.

DROP INDEX T1_IDXE; select tx.residentid /*+ INDEX(t0 T1_INDEX1) INDEX(tx T1_INDEX2) */ from t1 t0,t1 tx
  where t0.residentid!=tx.residentid
   and t0.questionnumber=tx.questionnumber    and t0.answer=tx.answer
   and t0.residentid=1486674
    group by tx.residentid
     having count(*)>10;

Plan Table

+-----------------------------------+
| Id  | Operation                       | Name     | Rows  | Bytes |
Cost  | Time      |
---------------------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT                |          |       |       |
218K |           |
| 1   |  FILTER                         |          |       |
|       |           |
| 2   |   SORT GROUP BY                 |          |  102K | 2245K |
218K |  00:57:47 |
| 3   |    HASH JOIN                    |          | 2041K |   44M |
210K |  00:55:45 |
| 4   |     TABLE ACCESS BY INDEX ROWID | T1       |    76 |   836
|    80 |  00:00:02 |
| 5   |      INDEX RANGE SCAN           | T1_INDEX1|    76 |
|     4 |  00:00:01 |
| 6   |     TABLE ACCESS FULL           | T1       |  216M | 2376M |

208K | 00:54:13 |

+-----------------------------------+

Predicate Information:

1 - filter(COUNT(*)>10)

3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
"T0"."ANSWER"="TX"."ANSWER")
3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
5 - access("T0"."RESIDENTID"=1486674)
6 - filter("TX"."RESIDENTID"<>1486674)

Try the other query that was suggested:
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST101'; SELECT residentId, count(1)
FROM t1
WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer

                                         FROM t1
                                        WHERE residentId = 1486674 )

GROUP BY residentId
HAVING COUNT(1) > 10; SELECT
*
FROM
TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS')); Plan Table

+-----------------------------------+
| Id  | Operation                       | Name     | Rows  | Bytes |
Cost  | Time      |
---------------------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT                |          |       |       |
208K |           |
| 1   |  FILTER                         |          |       |
|       |           |
| 2   |   SORT GROUP BY                 |          |  1384 |   30K |
208K |  00:54:12 |
| 3   |    HASH JOIN RIGHT SEMI         |          |   27K |  595K |
208K |  00:54:12 |
| 4   |     TABLE ACCESS BY INDEX ROWID | T1       |    76 |   836
|    80 |  00:00:02 |
| 5   |      INDEX RANGE SCAN           | T1_INDEX1|    76 |
|     4 |  00:00:01 |
| 6   |     TABLE ACCESS FULL           | T1       |  216M | 2376M |

206K | 00:54:40 |

+-----------------------------------+

Predicate Information:

1 - filter(COUNT(*)>10)
3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER")
5 - access("RESIDENTID"=1486674)

For some reason, this query returned one row where none were returned before.

At this point, I am not sure what the solution may be. Your results may vary on other platforms.

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Mon Mar 19 2007 - 13:45:42 CDT