Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
![]() |
![]() |
Home -> Community -> Usenet -> c.d.o.server -> Re: Help with sql query speed. Explain plan included
On Mar 18, 11:11 pm, "fitzjarr..._at_cox.net" <fitzjarr..._at_cox.net>
wrote:
> On Mar 18, 8:36 pm, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> > The OP is accessing the same table twice in the query with two
> > aliases: "test1 t0,test1 tx". The t0 alias is being accessed by the
> > RESIDENTID column: "t0.residentid=1486674". To access those rows, and
> > index on just the RESIDENTID should be helpful. I think that we both
> > agree that such an index will not help this portion of the WHERE
> > clause "t1.residentid!=1486674" The one index on the table (IDXE) is
> > a composite index covering all three columns. What I was suggesting
> > is to test performance with this one three column index split into two
> > indexes. The two column composite index will hopefully be able to
> > satisfy the "t0.questionnumber=tx.questionnumber and
> > t0.answer=tx.answer" portion of the query without requiring a full
> > tablescan of table TEST1.
>
> > Thinking about it a bit more, I think that I understand what you are
> > saying - the number of blocks in the two column index may be the same
> > or greater than that of the three column table. His query would have
> > to check ((3,000,000 - 80) / (3,000,000) * 100)% of the rows in the
> > table, and that looks to be quite a bit greater than 15% to 25%. It
> > could be interesting to see if Oracle could optimize this
> > situation... if each question has an even distribution between the
> > 100 possible answers.
>
> > Charles Hooper
> > PC Support Specialist
> > K&M Machine-Fabricating, Inc.
> The distribution is not even, nor is it guaranteed to be complete
> within the range from 1 to 100 for any given question, and this was
> stated at the outset by the OP:
>
> 4. There are 80 questions on each census form.
> 5. Each question on the census form has an answer that is between 1
> and 100.
> 6. The country residents do not have to answer every question.
> 7. If they do not answer a question, a row in the table is not
> created.
>
> I do think the composite index would provide some needed help to this
> query, and the index on QUESTIONNUMBER and ANSWER should be
> implemented. I also agree that the lone index on this table should be
> broken into two separate pieces. My thoughts are that the single-
> column index on RESIDENTID, although present, would likely not be used
> to eliminate the table scan, however I do believe the two-column index
> would reduce the number of returned rows and thus improve the overall
> speed of the query.
>
> David Fitzjarrell>
>
I just completed some rather extensive testing of this problem on Oracle 10.2.0.3 running on Windows Vista. The results were not quite was I expected. 10.2.0.3, at least on this platform, seems to ignore hints.
CREATE TABLE T1 (
RESIDENTID NUMBER(12),
QUESTIONNUMBER NUMBER(12),
ANSWER NUMBER(12));
Add the first 1,000,000 responses (replace LABOR_TICKET with any table
that has at least a million rows):
INSERT INTO
T1
SELECT
C1.COUNTER1,
C2.COUNTER2,
TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
(SELECT
ROWNUM COUNTER1
FROM
LABOR_TICKET
WHERE
ROWNUM<=1000000) C1,
(SELECT
ROWNUM COUNTER2
FROM
DUAL
CONNECT BY
LEVEL<=80) C2;
80000000 rows created.
Add the second 1,000,000 responses:
INSERT INTO
T1
SELECT
C1.COUNTER1+1000000,
C2.COUNTER2,
TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
(SELECT
ROWNUM COUNTER1
FROM
LABOR_TICKET
WHERE
ROWNUM<=1000000) C1,
(SELECT
ROWNUM COUNTER2
FROM
LABOR_TICKET
WHERE
ROWNUM<=80) C2;
80000000 rows created.
Add the third 1,000,000 responses:
INSERT INTO
T1
SELECT
C1.COUNTER1+2000000,
C2.COUNTER2,
TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
(SELECT
ROWNUM COUNTER1
FROM
LABOR_TICKET
WHERE
ROWNUM<=1000000) C1,
(SELECT
ROWNUM COUNTER2
FROM
LABOR_TICKET
WHERE
ROWNUM<=80) C2;
80000000 rows created.
Remove a random number of questions from each person - those with
responses less than 1:
DELETE FROM
T1
WHERE
ANSWER<=0;
13592343 rows deleted.
COMMIT;
Set up a large SORT_AREA_SIZE:
ALTER SYSTEM SET SORT_AREA_SIZE=20971520 DEFERRED;
ALTER SESSION SET SORT_AREA_SIZE=20971520;
Create the original index:
CREATE UNIQUE INDEX T1_IDXE ON T1(RESIDENTID,QUESTIONNUMBER,ANSWER);
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE);
ALTER SYSTEM FLUSH SHARED_POOL;
Set up a 10053 trace, Vista seems to have problems returning actuals
using DBMS_STATS:
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST10';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
select tx.residentid
from t1 t0,t1 tx
where t0.residentid!=tx.residentid
and t0.questionnumber=tx.questionnumber
and t0.answer=tx.answer
and t0.residentid=1486674
group by tx.residentid
having count(*)>10;
SELECT
*
FROM
TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));
+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 217K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 102K | 2246K | 217K | 00:57:46 | | 3 | HASH JOIN | | 2041K | 44M | 210K | 00:55:43 | | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 | 00:00:01 | | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K |00:54:13 |
+-----------------------------------+
3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND "T0"."ANSWER"="TX"."ANSWER") 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID") 4 - access("T0"."RESIDENTID"=1486674) 5 - filter("TX"."RESIDENTID"<>1486674)
Try the other query that was suggested in this group:
SELECT residentId, count(1)
FROM t1
WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer
FROM t1 WHERE residentId = 1486674 )GROUP BY residentId
+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------ +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 208K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 1384 | 30K | 208K | 00:54:11 | | 3 | HASH JOIN RIGHT SEMI | | 27K | 595K | 208K | 00:54:11 | | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 | 00:00:01 | | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 206K |00:54:40 |
+-----------------------------------+
1 - filter(COUNT(*)>10) 3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER") 4 - access("RESIDENTID"=1486674)
Make some small changes to the table definition:
ALTER TABLE T1 MODIFY (
QUESTIONNUMBER NUMBER(12) NOT NULL,
ANSWER NUMBER(12) NOT NULL);
Create the two indexes that I suggested:
CREATE INDEX T1_INDEX1 ON T1(RESIDENTID);
CREATE INDEX T1_INDEX2 ON T1(QUESTIONNUMBER,ANSWER);
EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE);
ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST100';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
select tx.residentid
from t1 t0,t1 tx
where t0.residentid!=tx.residentid
and t0.questionnumber=tx.questionnumber
and t0.answer=tx.answer
and t0.residentid=1486674
group by tx.residentid
having count(*)>10;
Plan Table
+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 217K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 102K | 2245K | 217K | 00:57:46 | | 3 | HASH JOIN | | 2041K | 44M | 210K | 00:55:43 | | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 | 00:00:01 | | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K |00:54:13 |
+-----------------------------------+
3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND "T0"."ANSWER"="TX"."ANSWER") 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID") 4 - access("T0"."RESIDENTID"=1486674) 5 - filter("TX"."RESIDENTID"<>1486674)
Plan stayed the same - clustering factor was lower for the T1_IDEX index.
Provide a hint:
select tx.residentid /*+ INDEX(T0 T1_INDEX1) INDEX(TX T1_INDEX2) */
from t1 t0,t1 tx
where t0.residentid!=tx.residentid
and t0.questionnumber=tx.questionnumber
and t0.answer=tx.answer
and t0.residentid=1486674
group by tx.residentid
having count(*)>10
Plan Table
+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 217K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 102K | 2245K | 217K | 00:57:46 | | 3 | HASH JOIN | | 2041K | 44M | 210K | 00:55:43 | | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 | 00:00:01 | | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K |00:54:13 |
+-----------------------------------+
3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND "T0"."ANSWER"="TX"."ANSWER") 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID") 4 - access("T0"."RESIDENTID"=1486674) 5 - filter("TX"."RESIDENTID"<>1486674)
Plan still using the original index.
DROP INDEX T1_IDXE;
select tx.residentid /*+ INDEX(t0 T1_INDEX1) INDEX(tx T1_INDEX2) */
from t1 t0,t1 tx
where t0.residentid!=tx.residentid
and t0.questionnumber=tx.questionnumber
and t0.answer=tx.answer
and t0.residentid=1486674
group by tx.residentid
having count(*)>10;
Plan Table
+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | --------------------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 218K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 102K | 2245K | 218K | 00:57:47 | | 3 | HASH JOIN | | 2041K | 44M | 210K | 00:55:45 | | 4 | TABLE ACCESS BY INDEX ROWID | T1 | 76 | 836 | 80 | 00:00:02 | | 5 | INDEX RANGE SCAN | T1_INDEX1| 76 | | 4 | 00:00:01 | | 6 | TABLE ACCESS FULL | T1 | 216M | 2376M |208K | 00:54:13 |
+-----------------------------------+
3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND "T0"."ANSWER"="TX"."ANSWER") 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID") 5 - access("T0"."RESIDENTID"=1486674) 6 - filter("TX"."RESIDENTID"<>1486674)
Try the other query that was suggested:
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST101';
SELECT residentId, count(1)
FROM t1
WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer
FROM t1 WHERE residentId = 1486674 )GROUP BY residentId
+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | --------------------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 208K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 1384 | 30K | 208K | 00:54:12 | | 3 | HASH JOIN RIGHT SEMI | | 27K | 595K | 208K | 00:54:12 | | 4 | TABLE ACCESS BY INDEX ROWID | T1 | 76 | 836 | 80 | 00:00:02 | | 5 | INDEX RANGE SCAN | T1_INDEX1| 76 | | 4 | 00:00:01 | | 6 | TABLE ACCESS FULL | T1 | 216M | 2376M |206K | 00:54:40 |
+-----------------------------------+
1 - filter(COUNT(*)>10) 3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER") 5 - access("RESIDENTID"=1486674)
For some reason, this query returned one row where none were returned before.
At this point, I am not sure what the solution may be. Your results may vary on other platforms.
Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.
Received on Mon Mar 19 2007 - 13:45:42 CDT
![]() |
![]() |