Path: text.usenetserver.com!out01a.usenetserver.com!news.usenetserver.com!in02.usenetserver.com!news.usenetserver.com!postnews.google.com!b75g2000hsg.googlegroups.com!not-for-mail From: "Charles Hooper" Newsgroups: comp.databases.oracle.server Subject: Re: Help with sql query speed. Explain plan included Date: 19 Mar 2007 11:45:42 -0700 Organization: http://groups.google.com Lines: 450 Message-ID: <1174329942.458784.307930@b75g2000hsg.googlegroups.com> References: <1173964477.511820.149350@n76g2000hsh.googlegroups.com> <1173973783.877883.51310@y66g2000hsf.googlegroups.com> <1174234129.055233.12440@o5g2000hsb.googlegroups.com> <1174234439.202646.138400@n59g2000hsh.googlegroups.com> <1174239550.893300.203650@l75g2000hse.googlegroups.com> <1174240687.027809.235250@o5g2000hsb.googlegroups.com> <1174244188.656948.74240@y66g2000hsf.googlegroups.com> <1174263792.000667.81260@l75g2000hse.googlegroups.com> <1174268213.054837.207190@o5g2000hsb.googlegroups.com> <1174273917.561699.204480@y80g2000hsf.googlegroups.com> NNTP-Posting-Host: 65.118.7.2 Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Trace: posting.google.com 1174329944 17960 127.0.0.1 (19 Mar 2007 18:45:44 GMT) X-Complaints-To: groups-abuse@google.com NNTP-Posting-Date: Mon, 19 Mar 2007 18:45:44 +0000 (UTC) In-Reply-To: <1174273917.561699.204480@y80g2000hsf.googlegroups.com> User-Agent: G2/1.0 X-HTTP-UserAgent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506),gzip(gfe),gzip(gfe) Complaints-To: groups-abuse@google.com Injection-Info: b75g2000hsg.googlegroups.com; posting-host=65.118.7.2; posting-account=ytcoAwwAAADhCs0M3G1mFO5tqSfx4ge9 Xref: usenetserver.com comp.databases.oracle.server:424385 X-Received-Date: Mon, 19 Mar 2007 13:45:44 EST (text.usenetserver.com) On Mar 18, 11:11 pm, "fitzjarr...@cox.net" wrote: > On Mar 18, 8:36 pm, "Charles Hooper" wrote: > > The OP is accessing the same table twice in the query with two > > aliases: "test1 t0,test1 tx". The t0 alias is being accessed by the > > RESIDENTID column: "t0.residentid=1486674". To access those rows, and > > index on just the RESIDENTID should be helpful. I think that we both > > agree that such an index will not help this portion of the WHERE > > clause "t1.residentid!=1486674" The one index on the table (IDXE) is > > a composite index covering all three columns. What I was suggesting > > is to test performance with this one three column index split into two > > indexes. The two column composite index will hopefully be able to > > satisfy the "t0.questionnumber=tx.questionnumber and > > t0.answer=tx.answer" portion of the query without requiring a full > > tablescan of table TEST1. > > > Thinking about it a bit more, I think that I understand what you are > > saying - the number of blocks in the two column index may be the same > > or greater than that of the three column table. His query would have > > to check ((3,000,000 - 80) / (3,000,000) * 100)% of the rows in the > > table, and that looks to be quite a bit greater than 15% to 25%. It > > could be interesting to see if Oracle could optimize this > > situation... if each question has an even distribution between the > > 100 possible answers. > > > Charles Hooper > > PC Support Specialist > > K&M Machine-Fabricating, Inc. > The distribution is not even, nor is it guaranteed to be complete > within the range from 1 to 100 for any given question, and this was > stated at the outset by the OP: > > 4. There are 80 questions on each census form. > 5. Each question on the census form has an answer that is between 1 > and 100. > 6. The country residents do not have to answer every question. > 7. If they do not answer a question, a row in the table is not > created. > > I do think the composite index would provide some needed help to this > query, and the index on QUESTIONNUMBER and ANSWER should be > implemented. I also agree that the lone index on this table should be > broken into two separate pieces. My thoughts are that the single- > column index on RESIDENTID, although present, would likely not be used > to eliminate the table scan, however I do believe the two-column index > would reduce the number of returned rows and thus improve the overall > speed of the query. > > David Fitzjarrell> > I just completed some rather extensive testing of this problem on Oracle 10.2.0.3 running on Windows Vista. The results were not quite was I expected. 10.2.0.3, at least on this platform, seems to ignore hints. CREATE TABLE T1 ( RESIDENTID NUMBER(12), QUESTIONNUMBER NUMBER(12), ANSWER NUMBER(12)); Add the first 1,000,000 responses (replace LABOR_TICKET with any table that has at least a million rows): INSERT INTO T1 SELECT C1.COUNTER1, C2.COUNTER2, TRUNC(DBMS_RANDOM.VALUE(-5,101)) FROM (SELECT ROWNUM COUNTER1 FROM LABOR_TICKET WHERE ROWNUM<=1000000) C1, (SELECT ROWNUM COUNTER2 FROM DUAL CONNECT BY LEVEL<=80) C2; 80000000 rows created. Add the second 1,000,000 responses: INSERT INTO T1 SELECT C1.COUNTER1+1000000, C2.COUNTER2, TRUNC(DBMS_RANDOM.VALUE(-5,101)) FROM (SELECT ROWNUM COUNTER1 FROM LABOR_TICKET WHERE ROWNUM<=1000000) C1, (SELECT ROWNUM COUNTER2 FROM LABOR_TICKET WHERE ROWNUM<=80) C2; 80000000 rows created. Add the third 1,000,000 responses: INSERT INTO T1 SELECT C1.COUNTER1+2000000, C2.COUNTER2, TRUNC(DBMS_RANDOM.VALUE(-5,101)) FROM (SELECT ROWNUM COUNTER1 FROM LABOR_TICKET WHERE ROWNUM<=1000000) C1, (SELECT ROWNUM COUNTER2 FROM LABOR_TICKET WHERE ROWNUM<=80) C2; 80000000 rows created. Remove a random number of questions from each person - those with responses less than 1: DELETE FROM T1 WHERE ANSWER<=0; 13592343 rows deleted. COMMIT; Set up a large SORT_AREA_SIZE: ALTER SYSTEM SET SORT_AREA_SIZE=20971520 DEFERRED; ALTER SESSION SET SORT_AREA_SIZE=20971520; Create the original index: CREATE UNIQUE INDEX T1_IDXE ON T1(RESIDENTID,QUESTIONNUMBER,ANSWER); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE); ALTER SYSTEM FLUSH SHARED_POOL; Set up a 10053 trace, Vista seems to have problems returning actuals using DBMS_STATS: ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST10'; ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1'; select tx.residentid from t1 t0,t1 tx where t0.residentid!=tx.residentid and t0.questionnumber=tx.questionnumber and t0.answer=tx.answer and t0.residentid=1486674 group by tx.residentid having count(*)>10; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS')); ============ Plan Table ============ ---------------------------------------- +-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 217K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 102K | 2246K | 217K | 00:57:46 | | 3 | HASH JOIN | | 2041K | 44M | 210K | 00:55:43 | | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 | 00:00:01 | | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K | 00:54:13 | ---------------------------------------- +-----------------------------------+ Predicate Information: ---------------------- 1 - filter(COUNT(*)>10) 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND "T0"."ANSWER"="TX"."ANSWER") 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID") 4 - access("T0"."RESIDENTID"=1486674) 5 - filter("TX"."RESIDENTID"<>1486674) Try the other query that was suggested in this group: SELECT residentId, count(1) FROM t1 WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer FROM t1 WHERE residentId = 1486674 ) GROUP BY residentId HAVING COUNT(1) > 10; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS')); ============ Plan Table ============ ------------------------------------------ +-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------ +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 208K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 1384 | 30K | 208K | 00:54:11 | | 3 | HASH JOIN RIGHT SEMI | | 27K | 595K | 208K | 00:54:11 | | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 | 00:00:01 | | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 206K | 00:54:40 | ------------------------------------------ +-----------------------------------+ Predicate Information: ---------------------- 1 - filter(COUNT(*)>10) 3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER") 4 - access("RESIDENTID"=1486674) Make some small changes to the table definition: ALTER TABLE T1 MODIFY ( QUESTIONNUMBER NUMBER(12) NOT NULL, ANSWER NUMBER(12) NOT NULL); Create the two indexes that I suggested: CREATE INDEX T1_INDEX1 ON T1(RESIDENTID); CREATE INDEX T1_INDEX2 ON T1(QUESTIONNUMBER,ANSWER); EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE); ALTER SYSTEM FLUSH SHARED_POOL; ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST100'; ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1'; select tx.residentid from t1 t0,t1 tx where t0.residentid!=tx.residentid and t0.questionnumber=tx.questionnumber and t0.answer=tx.answer and t0.residentid=1486674 group by tx.residentid having count(*)>10; Plan Table ============ ---------------------------------------- +-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 217K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 102K | 2245K | 217K | 00:57:46 | | 3 | HASH JOIN | | 2041K | 44M | 210K | 00:55:43 | | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 | 00:00:01 | | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K | 00:54:13 | ---------------------------------------- +-----------------------------------+ Predicate Information: ---------------------- 1 - filter(COUNT(*)>10) 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND "T0"."ANSWER"="TX"."ANSWER") 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID") 4 - access("T0"."RESIDENTID"=1486674) 5 - filter("TX"."RESIDENTID"<>1486674) Plan stayed the same - clustering factor was lower for the T1_IDEX index. Provide a hint: select tx.residentid /*+ INDEX(T0 T1_INDEX1) INDEX(TX T1_INDEX2) */ from t1 t0,t1 tx where t0.residentid!=tx.residentid and t0.questionnumber=tx.questionnumber and t0.answer=tx.answer and t0.residentid=1486674 group by tx.residentid having count(*)>10 Plan Table ============ ---------------------------------------- +-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 217K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 102K | 2245K | 217K | 00:57:46 | | 3 | HASH JOIN | | 2041K | 44M | 210K | 00:55:43 | | 4 | INDEX RANGE SCAN | T1_IDXE | 76 | 836 | 4 | 00:00:01 | | 5 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K | 00:54:13 | ---------------------------------------- +-----------------------------------+ Predicate Information: ---------------------- 1 - filter(COUNT(*)>10) 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND "T0"."ANSWER"="TX"."ANSWER") 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID") 4 - access("T0"."RESIDENTID"=1486674) 5 - filter("TX"."RESIDENTID"<>1486674) Plan still using the original index. DROP INDEX T1_IDXE; select tx.residentid /*+ INDEX(t0 T1_INDEX1) INDEX(tx T1_INDEX2) */ from t1 t0,t1 tx where t0.residentid!=tx.residentid and t0.questionnumber=tx.questionnumber and t0.answer=tx.answer and t0.residentid=1486674 group by tx.residentid having count(*)>10; Plan Table ============ --------------------------------------------------- +-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | --------------------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 218K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 102K | 2245K | 218K | 00:57:47 | | 3 | HASH JOIN | | 2041K | 44M | 210K | 00:55:45 | | 4 | TABLE ACCESS BY INDEX ROWID | T1 | 76 | 836 | 80 | 00:00:02 | | 5 | INDEX RANGE SCAN | T1_INDEX1| 76 | | 4 | 00:00:01 | | 6 | TABLE ACCESS FULL | T1 | 216M | 2376M | 208K | 00:54:13 | --------------------------------------------------- +-----------------------------------+ Predicate Information: ---------------------- 1 - filter(COUNT(*)>10) 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND "T0"."ANSWER"="TX"."ANSWER") 3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID") 5 - access("T0"."RESIDENTID"=1486674) 6 - filter("TX"."RESIDENTID"<>1486674) Try the other query that was suggested: ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST101'; SELECT residentId, count(1) FROM t1 WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer FROM t1 WHERE residentId = 1486674 ) GROUP BY residentId HAVING COUNT(1) > 10; SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS')); Plan Table ============ --------------------------------------------------- +-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | --------------------------------------------------- +-----------------------------------+ | 0 | SELECT STATEMENT | | | | 208K | | | 1 | FILTER | | | | | | | 2 | SORT GROUP BY | | 1384 | 30K | 208K | 00:54:12 | | 3 | HASH JOIN RIGHT SEMI | | 27K | 595K | 208K | 00:54:12 | | 4 | TABLE ACCESS BY INDEX ROWID | T1 | 76 | 836 | 80 | 00:00:02 | | 5 | INDEX RANGE SCAN | T1_INDEX1| 76 | | 4 | 00:00:01 | | 6 | TABLE ACCESS FULL | T1 | 216M | 2376M | 206K | 00:54:40 | --------------------------------------------------- +-----------------------------------+ Predicate Information: ---------------------- 1 - filter(COUNT(*)>10) 3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER") 5 - access("RESIDENTID"=1486674) For some reason, this query returned one row where none were returned before. At this point, I am not sure what the solution may be. Your results may vary on other platforms. Charles Hooper PC Support Specialist K&M Machine-Fabricating, Inc.