Re: Help with sql query speed. Explain plan included

From: Valentin Minzatu <valentinminzatu_at_yahoo.com>
Date: 19 Mar 2007 12:48:49 -0700
Message-ID: <1174333729.644046.32200@e65g2000hsc.googlegroups.com>

On Mar 19, 2:45 pm, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> On Mar 18, 11:11 pm, "fitzjarr..._at_cox.net" <fitzjarr..._at_cox.net>
> wrote:
>
> > On Mar 18, 8:36 pm, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> > > The OP is accessing the same table twice in the query with two
> > > aliases: "test1 t0,test1 tx". The t0 alias is being accessed by the
> > > RESIDENTID column: "t0.residentid=1486674". To access those rows, and
> > > index on just the RESIDENTID should be helpful. I think that we both
> > > agree that such an index will not help this portion of the WHERE
> > > clause "t1.residentid!=1486674" The one index on the table (IDXE) is
> > > a composite index covering all three columns. What I was suggesting
> > > is to test performance with this one three column index split into two
> > > indexes. The two column composite index will hopefully be able to
> > > satisfy the "t0.questionnumber=tx.questionnumber and
> > > t0.answer=tx.answer" portion of the query without requiring a full
> > > tablescan of table TEST1.
>
> > > Thinking about it a bit more, I think that I understand what you are
> > > saying - the number of blocks in the two column index may be the same
> > > or greater than that of the three column table. His query would have
> > > to check ((3,000,000 - 80) / (3,000,000) * 100)% of the rows in the
> > > table, and that looks to be quite a bit greater than 15% to 25%. It
> > > could be interesting to see if Oracle could optimize this
> > > situation... if each question has an even distribution between the
> > > 100 possible answers.
>
> > > Charles Hooper
> > > PC Support Specialist
> > > K&M Machine-Fabricating, Inc.
> > The distribution is not even, nor is it guaranteed to be complete
> > within the range from 1 to 100 for any given question, and this was
> > stated at the outset by the OP:
>
> > 4. There are 80 questions on each census form.
> > 5. Each question on the census form has an answer that is between 1
> > and 100.
> > 6. The country residents do not have to answer every question.
> > 7. If they do not answer a question, a row in the table is not
> > created.
>
> > I do think the composite index would provide some needed help to this
> > query, and the index on QUESTIONNUMBER and ANSWER should be
> > implemented. I also agree that the lone index on this table should be
> > broken into two separate pieces. My thoughts are that the single-
> > column index on RESIDENTID, although present, would likely not be used
> > to eliminate the table scan, however I do believe the two-column index
> > would reduce the number of returned rows and thus improve the overall
> > speed of the query.
>
> > David Fitzjarrell>
>
> I just completed some rather extensive testing of this problem on
> Oracle 10.2.0.3 running on Windows Vista. The results were not quite
> was I expected. 10.2.0.3, at least on this platform, seems to ignore
> hints.
>
> CREATE TABLE T1 (
> RESIDENTID NUMBER(12),
> QUESTIONNUMBER NUMBER(12),
> ANSWER NUMBER(12));
>
> Add the first 1,000,000 responses (replace LABOR_TICKET with any table
> that has at least a million rows):
> INSERT INTO
> T1
> SELECT
> C1.COUNTER1,
> C2.COUNTER2,
> TRUNC(DBMS_RANDOM.VALUE(-5,101))
> FROM
> (SELECT
> ROWNUM COUNTER1
> FROM
> LABOR_TICKET
> WHERE
> ROWNUM<=1000000) C1,
> (SELECT
> ROWNUM COUNTER2
> FROM
> DUAL
> CONNECT BY
> LEVEL<=80) C2;
>
> 80000000 rows created.
>
> Add the second 1,000,000 responses:
> INSERT INTO
> T1
> SELECT
> C1.COUNTER1+1000000,
> C2.COUNTER2,
> TRUNC(DBMS_RANDOM.VALUE(-5,101))
> FROM
> (SELECT
> ROWNUM COUNTER1
> FROM
> LABOR_TICKET
> WHERE
> ROWNUM<=1000000) C1,
> (SELECT
> ROWNUM COUNTER2
> FROM
> LABOR_TICKET
> WHERE
> ROWNUM<=80) C2;
>
> 80000000 rows created.
>
> Add the third 1,000,000 responses:
> INSERT INTO
> T1
> SELECT
> C1.COUNTER1+2000000,
> C2.COUNTER2,
> TRUNC(DBMS_RANDOM.VALUE(-5,101))
> FROM
> (SELECT
> ROWNUM COUNTER1
> FROM
> LABOR_TICKET
> WHERE
> ROWNUM<=1000000) C1,
> (SELECT
> ROWNUM COUNTER2
> FROM
> LABOR_TICKET
> WHERE
> ROWNUM<=80) C2;
>
> 80000000 rows created.
>
> Remove a random number of questions from each person - those with
> responses less than 1:
> DELETE FROM
> T1
> WHERE
> ANSWER<=0;
>
> 13592343 rows deleted.
>
> COMMIT;
>
> Set up a large SORT_AREA_SIZE:
> ALTER SYSTEM SET SORT_AREA_SIZE=20971520 DEFERRED;
> ALTER SESSION SET SORT_AREA_SIZE=20971520;
>
> Create the original index:
> CREATE UNIQUE INDEX T1_IDXE ON T1(RESIDENTID,QUESTIONNUMBER,ANSWER);
>
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE);
>
> ALTER SYSTEM FLUSH SHARED_POOL;
>
> Set up a 10053 trace, Vista seems to have problems returning actuals
> using DBMS_STATS:
> ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST10';
> ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
>
> select tx.residentid
> from t1 t0,t1 tx
> where t0.residentid!=tx.residentid
> and t0.questionnumber=tx.questionnumber
> and t0.answer=tx.answer
> and t0.residentid=1486674
> group by tx.residentid
> having count(*)>10;
>
> SELECT
> *
> FROM
> TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));
>
> ============
> Plan Table
> ============
> ----------------------------------------
> +-----------------------------------+
> | Id | Operation | Name | Rows | Bytes | Cost |
> Time |
> ----------------------------------------
> +-----------------------------------+
> | 0 | SELECT STATEMENT | | | | 217K
> | |
> | 1 | FILTER | | | |
> | |
> | 2 | SORT GROUP BY | | 102K | 2246K | 217K |
> 00:57:46 |
> | 3 | HASH JOIN | | 2041K | 44M | 210K |
> 00:55:43 |
> | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 |
> 00:00:01 |
> | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K |
> 00:54:13 |
> ----------------------------------------
> +-----------------------------------+
> Predicate Information:
> ----------------------
> 1 - filter(COUNT(*)>10)
> 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
> "T0"."ANSWER"="TX"."ANSWER")
> 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
> 4 - access("T0"."RESIDENTID"=1486674)
> 5 - filter("TX"."RESIDENTID"<>1486674)
>
> Try the other query that was suggested in this group:
> SELECT residentId, count(1)
> FROM t1
> WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer
> FROM t1
> WHERE residentId = 1486674 )
> GROUP BY residentId
> HAVING COUNT(1) > 10;
>
> SELECT
> *
> FROM
> TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));
>
> ============
> Plan Table
> ============
> ------------------------------------------
> +-----------------------------------+
> | Id | Operation | Name | Rows | Bytes | Cost |
> Time |
> ------------------------------------------
> +-----------------------------------+
> | 0 | SELECT STATEMENT | | | | 208K
> | |
> | 1 | FILTER | | | |
> | |
> | 2 | SORT GROUP BY | | 1384 | 30K | 208K |
> 00:54:11 |
> | 3 | HASH JOIN RIGHT SEMI | | 27K | 595K | 208K |
> 00:54:11 |
> | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 |
> 00:00:01 |
> | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 206K |
> 00:54:40 |
> ------------------------------------------
> +-----------------------------------+
> Predicate Information:
> ----------------------
> 1 - filter(COUNT(*)>10)
> 3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER")
> 4 - access("RESIDENTID"=1486674)
>
> Make some small changes to the table definition:
> ALTER TABLE T1 MODIFY (
> QUESTIONNUMBER NUMBER(12) NOT NULL,
> ANSWER NUMBER(12) NOT NULL);
>
> Create the two indexes that I suggested:
> CREATE INDEX T1_INDEX1 ON T1(RESIDENTID);
> CREATE INDEX T1_INDEX2 ON T1(QUESTIONNUMBER,ANSWER);
>
> EXEC
> DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE);
>
> ALTER SYSTEM FLUSH SHARED_POOL;
>
> ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST100';
> ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
>
> select tx.residentid
> from t1 t0,t1 tx
> where t0.residentid!=tx.residentid
> and t0.questionnumber=tx.questionnumber
> and t0.answer=tx.answer
> and t0.residentid=1486674
> group by tx.residentid
> having count(*)>10;
>
> Plan Table
> ============
> ----------------------------------------
> +-----------------------------------+
> | Id | Operation | Name | Rows | Bytes | Cost |
> Time |
> ----------------------------------------
> +-----------------------------------+
> | 0 | SELECT STATEMENT | | | | 217K
> | |
> | 1 | FILTER | | | |
> | |
> | 2 | SORT GROUP BY | | 102K | 2245K | 217K |
> 00:57:46 |
> | 3 | HASH JOIN | | 2041K | 44M | 210K |
> 00:55:43 |
> | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 |
> 00:00:01 |
> | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K |
> 00:54:13 |
> ----------------------------------------
> +-----------------------------------+
> Predicate Information:
> ----------------------
> 1 - filter(COUNT(*)>10)
> 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
> "T0"."ANSWER"="TX"."ANSWER")
> 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
> 4 - access("T0"."RESIDENTID"=1486674)
> 5 - filter("TX"."RESIDENTID"<>1486674)
>
> Plan stayed the same - clustering factor was lower for the T1_IDEX
> index.
>
> Provide a hint:
> select tx.residentid /*+ INDEX(T0 T1_INDEX1) INDEX(TX T1_INDEX2) */
> from t1 t0,t1 tx
> where t0.residentid!=tx.residentid
> and t0.questionnumber=tx.questionnumber
> and t0.answer=tx.answer
> and t0.residentid=1486674
> group by tx.residentid
> having count(*)>10
>
> Plan Table
> ============
> ----------------------------------------
> +-----------------------------------+
> | Id | Operation | Name | Rows | Bytes | Cost |
> Time |
> ----------------------------------------
> +-----------------------------------+
> | 0 | SELECT STATEMENT | | | | 217K
> | |
> | 1 | FILTER | | | |
> | |
> | 2 | SORT GROUP BY | | 102K | 2245K | 217K |
> 00:57:46 |
> | 3 | HASH JOIN | | 2041K | 44M | 210K |
> 00:55:43 |
> | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 |
> 00:00:01 |
> | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K |
> 00:54:13 |
> ----------------------------------------
> +-----------------------------------+
> Predicate Information:
> ----------------------
> 1 - filter(COUNT(*)>10)
> 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
> "T0"."ANSWER"="TX"."ANSWER")
> 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
> 4 - access("T0"."RESIDENTID"=1486674)
> 5 - filter("TX"."RESIDENTID"<>1486674)
>
> Plan still using the original index.
>
> DROP INDEX T1_IDXE;
>
> select tx.residentid /*+ INDEX(t0 T1_INDEX1) INDEX(tx T1_INDEX2) */
> from t1 t0,t1 tx
> where t0.residentid!=tx.residentid
> and t0.questionnumber=tx.questionnumber
> and t0.answer=tx.answer
> and t0.residentid=1486674
> group by tx.residentid
> having count(*)>10;
>
> Plan Table
> ============
> ---------------------------------------------------
> +-----------------------------------+
> | Id | Operation | Name | Rows | Bytes |
> Cost | Time |
> ---------------------------------------------------
> +-----------------------------------+
> | 0 | SELECT STATEMENT | | | |
> 218K | |
> | 1 | FILTER | | |
> | | |
> | 2 | SORT GROUP BY | | 102K | 2245K |
> 218K | 00:57:47 |
> | 3 | HASH JOIN | | 2041K | 44M |
> 210K | 00:55:45 |
> | 4 | TABLE ACCESS BY INDEX ROWID | T1 | 76 | 836
> | 80 | 00:00:02 |
> | 5 | INDEX RANGE SCAN | T1_INDEX1| 76 |
> | 4 | 00:00:01 |
> | 6 | TABLE ACCESS FULL | T1 | 216M | 2376M |
> 208K | 00:54:13 |
> ---------------------------------------------------
> +-----------------------------------+
> Predicate Information:
> ----------------------
> 1 - filter(COUNT(*)>10)
> 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
> "T0"."ANSWER"="TX"."ANSWER")
> 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
> 5 - access("T0"."RESIDENTID"=1486674)
> 6 - filter("TX"."RESIDENTID"<>1486674)
>
> Try the other query that was suggested:
> ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST101';
> SELECT residentId, count(1)
> FROM t1
> WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer
> FROM t1
> WHERE residentId = 1486674 )
> GROUP BY residentId
> HAVING COUNT(1) > 10;
>
> SELECT
> *
> FROM
> TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));
>
> Plan Table
> ============
> ---------------------------------------------------
> +-----------------------------------+
> | Id | Operation | Name | Rows | Bytes |
> Cost | Time |
> ---------------------------------------------------
> +-----------------------------------+
> | 0 | SELECT STATEMENT | | | |
> 208K | |
> | 1 | FILTER | | |
> | | |
> | 2 | SORT GROUP BY | | 1384 | 30K |
> 208K | 00:54:12 |
> | 3 | HASH JOIN RIGHT SEMI | | 27K | 595K |
> 208K | 00:54:12 |
> | 4 | TABLE ACCESS BY INDEX ROWID | T1 | 76 | 836
> | 80 | 00:00:02 |
> | 5 | INDEX RANGE SCAN | T1_INDEX1| 76 |
> | 4 | 00:00:01 |
> | 6 | TABLE ACCESS FULL | T1 | 216M | 2376M |
> 206K | 00:54:40 |
> ---------------------------------------------------
> +-----------------------------------+
> Predicate Information:
> ----------------------
> 1 - filter(COUNT(*)>10)
> 3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER")
> 5 - access("RESIDENTID"=1486674)
>
> For some reason, this query returned one row where none were returned
> before.
>
> At this point, I am not sure what the solution may be. Your results
> may vary on other platforms.
>
> Charles Hooper
> PC Support Specialist
> K&M Machine-Fabricating, Inc.

Charles,

I think that "residentId" should be included in the index, as the last column: "questionNumber" + "answer" + "residentId" so that the table read is no longer required. Another index should be created if it's not already in place on "residentId" + "questionNumber" + "answer". Can you try this on your sandbox and let us know what the result is? Received on Mon Mar 19 2007 - 14:48:49 CDT