Path: text.usenetserver.com!out01a.usenetserver.com!news.usenetserver.com!in02.usenetserver.com!news.usenetserver.com!postnews.google.com!b75g2000hsg.googlegroups.com!not-for-mail
From: "Charles Hooper" <hooperc2000@yahoo.com>
Newsgroups: comp.databases.oracle.server
Subject: Re: Help with sql query speed. Explain plan included
Date: 19 Mar 2007 11:45:42 -0700
Organization: http://groups.google.com
Lines: 450
Message-ID: <1174329942.458784.307930@b75g2000hsg.googlegroups.com>
References: <1173964477.511820.149350@n76g2000hsh.googlegroups.com>
   <1173973783.877883.51310@y66g2000hsf.googlegroups.com>
   <1174234129.055233.12440@o5g2000hsb.googlegroups.com>
   <1174234439.202646.138400@n59g2000hsh.googlegroups.com>
   <1174239550.893300.203650@l75g2000hse.googlegroups.com>
   <1174240687.027809.235250@o5g2000hsb.googlegroups.com>
   <1174244188.656948.74240@y66g2000hsf.googlegroups.com>
   <1174263792.000667.81260@l75g2000hse.googlegroups.com>
   <1174268213.054837.207190@o5g2000hsb.googlegroups.com>
   <1174273917.561699.204480@y80g2000hsf.googlegroups.com>
NNTP-Posting-Host: 65.118.7.2
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Trace: posting.google.com 1174329944 17960 127.0.0.1 (19 Mar 2007 18:45:44 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Mon, 19 Mar 2007 18:45:44 +0000 (UTC)
In-Reply-To: <1174273917.561699.204480@y80g2000hsf.googlegroups.com>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506),gzip(gfe),gzip(gfe)
Complaints-To: groups-abuse@google.com
Injection-Info: b75g2000hsg.googlegroups.com; posting-host=65.118.7.2;
   posting-account=ytcoAwwAAADhCs0M3G1mFO5tqSfx4ge9
Xref: usenetserver.com comp.databases.oracle.server:424385
X-Received-Date: Mon, 19 Mar 2007 13:45:44 EST (text.usenetserver.com)

On Mar 18, 11:11 pm, "fitzjarr...@cox.net" <fitzjarr...@cox.net>
wrote:
> On Mar 18, 8:36 pm, "Charles Hooper" <hooperc2...@yahoo.com> wrote:
> > The OP is accessing the same table twice in the query with two
> > aliases: "test1 t0,test1 tx".  The t0 alias is being accessed by the
> > RESIDENTID column: "t0.residentid=1486674".  To access those rows, and
> > index on just the RESIDENTID should be helpful.  I think that we both
> > agree that such an index will not help this portion of the WHERE
> > clause "t1.residentid!=1486674"  The one index on the table (IDXE) is
> > a composite index covering all three columns.  What I was suggesting
> > is to test performance with this one three column index split into two
> > indexes.  The two column composite index will hopefully be able to
> > satisfy the "t0.questionnumber=tx.questionnumber and
> > t0.answer=tx.answer" portion of the query without requiring a full
> > tablescan of table TEST1.
>
> > Thinking about it a bit more, I think that I understand what you are
> > saying - the number of blocks in the two column index may be the same
> > or greater than that of the three column table.  His query would have
> > to check ((3,000,000 - 80) / (3,000,000) * 100)% of the rows in the
> > table, and that looks to be quite a bit greater than 15% to 25%.  It
> > could be interesting to see if Oracle could optimize this
> > situation...  if each question has an even distribution between the
> > 100 possible answers.
>
> > Charles Hooper
> > PC Support Specialist
> > K&M Machine-Fabricating, Inc.
> The distribution is not even, nor is it guaranteed to be complete
> within the range from 1 to 100 for any given question, and this was
> stated at the outset by the OP:
>
> 4. There are 80 questions on each census form.
> 5. Each question on the census form has an answer that is between 1
> and 100.
> 6. The country residents do not have to answer every question.
> 7. If they do not answer a question, a row in the table is not
> created.
>
> I do think the composite index would provide some needed help to this
> query, and the index on QUESTIONNUMBER and ANSWER should be
> implemented.  I also agree that the lone index on this table should be
> broken into two separate pieces.  My thoughts are that the single-
> column index on RESIDENTID, although present, would likely not be used
> to eliminate the table scan, however I do believe the two-column index
> would reduce the number of returned rows and thus improve the overall
> speed of the query.
>
> David Fitzjarrell>
>

I just completed some rather extensive testing of this problem on
Oracle 10.2.0.3 running on Windows Vista.  The results were not quite
was I expected.   10.2.0.3, at least on this platform, seems to ignore
hints.

CREATE TABLE T1 (
  RESIDENTID NUMBER(12),
  QUESTIONNUMBER NUMBER(12),
  ANSWER NUMBER(12));

Add the first 1,000,000 responses (replace LABOR_TICKET with any table
that has at least a million rows):
INSERT INTO
  T1
SELECT
  C1.COUNTER1,
  C2.COUNTER2,
  TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
  (SELECT
    ROWNUM COUNTER1
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=1000000) C1,
  (SELECT
    ROWNUM COUNTER2
  FROM
    DUAL
  CONNECT BY
    LEVEL<=80) C2;

80000000 rows created.

Add the second 1,000,000 responses:
INSERT INTO
  T1
SELECT
  C1.COUNTER1+1000000,
  C2.COUNTER2,
  TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
  (SELECT
    ROWNUM COUNTER1
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=1000000) C1,
  (SELECT
    ROWNUM COUNTER2
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=80) C2;

80000000 rows created.

Add the third 1,000,000 responses:
INSERT INTO
  T1
SELECT
  C1.COUNTER1+2000000,
  C2.COUNTER2,
  TRUNC(DBMS_RANDOM.VALUE(-5,101))
FROM
  (SELECT
    ROWNUM COUNTER1
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=1000000) C1,
  (SELECT
    ROWNUM COUNTER2
  FROM
    LABOR_TICKET
  WHERE
    ROWNUM<=80) C2;

80000000 rows created.

Remove a random number of questions from each person - those with
responses less than 1:
DELETE FROM
  T1
WHERE
  ANSWER<=0;

13592343 rows deleted.

COMMIT;

Set up a large SORT_AREA_SIZE:
ALTER SYSTEM SET SORT_AREA_SIZE=20971520 DEFERRED;
ALTER SESSION SET SORT_AREA_SIZE=20971520;

Create the original index:
CREATE UNIQUE INDEX T1_IDXE ON T1(RESIDENTID,QUESTIONNUMBER,ANSWER);

EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE);

ALTER SYSTEM FLUSH SHARED_POOL;

Set up a 10053 trace, Vista seems to have problems returning actuals
using DBMS_STATS:
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST10';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

select tx.residentid
 from t1 t0,t1 tx
  where  t0.residentid!=tx.residentid
   and  t0.questionnumber=tx.questionnumber
   and t0.answer=tx.answer
   and t0.residentid=1486674
    group by tx.residentid
     having count(*)>10;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));

============
Plan Table
============
----------------------------------------
+-----------------------------------+
| Id  | Operation             | Name    | Rows  | Bytes | Cost  |
Time      |
----------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT      |         |       |       |  217K
|           |
| 1   |  FILTER               |         |       |       |
|           |
| 2   |   SORT GROUP BY       |         |  102K | 2246K |  217K |
00:57:46 |
| 3   |    HASH JOIN          |         | 2041K |   44M |  210K |
00:55:43 |
| 4   |     INDEX RANGE SCAN  | T1_IDXE |    76 |   836 |     4 |
00:00:01 |
| 5   |     TABLE ACCESS FULL | T1      |  216M | 2376M |  208K |
00:54:13 |
----------------------------------------
+-----------------------------------+
Predicate Information:
----------------------
1 - filter(COUNT(*)>10)
3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
"T0"."ANSWER"="TX"."ANSWER")
3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
4 - access("T0"."RESIDENTID"=1486674)
5 - filter("TX"."RESIDENTID"<>1486674)

Try the other query that was suggested in this group:
SELECT residentId, count(1)
  FROM t1
 WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer
                                         FROM t1
                                        WHERE residentId = 1486674 )
 GROUP BY residentId
 HAVING COUNT(1) > 10;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));

============
Plan Table
============
------------------------------------------
+-----------------------------------+
| Id  | Operation               | Name    | Rows  | Bytes | Cost  |
Time      |
------------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT        |         |       |       |  208K
|           |
| 1   |  FILTER                 |         |       |       |
|           |
| 2   |   SORT GROUP BY         |         |  1384 |   30K |  208K |
00:54:11 |
| 3   |    HASH JOIN RIGHT SEMI |         |   27K |  595K |  208K |
00:54:11 |
| 4   |     INDEX RANGE SCAN    | T1_IDXE |    76 |   836 |     4 |
00:00:01 |
| 5   |     TABLE ACCESS FULL   | T1      |  216M | 2376M |  206K |
00:54:40 |
------------------------------------------
+-----------------------------------+
Predicate Information:
----------------------
1 - filter(COUNT(*)>10)
3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER")
4 - access("RESIDENTID"=1486674)

Make some small changes to the table definition:
ALTER TABLE T1 MODIFY (
  QUESTIONNUMBER NUMBER(12) NOT NULL,
  ANSWER NUMBER(12) NOT NULL);

Create the two indexes that I suggested:
CREATE INDEX T1_INDEX1 ON T1(RESIDENTID);
CREATE INDEX T1_INDEX2 ON T1(QUESTIONNUMBER,ANSWER);

EXEC
DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'TABLE_OWNER',TABNAME=>'T1',CASCADE=>TRUE);

ALTER SYSTEM FLUSH SHARED_POOL;

ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST100';
ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';

select tx.residentid
 from t1 t0,t1 tx
  where  t0.residentid!=tx.residentid
   and  t0.questionnumber=tx.questionnumber
   and t0.answer=tx.answer
   and t0.residentid=1486674
    group by tx.residentid
     having count(*)>10;

Plan Table
============
----------------------------------------
+-----------------------------------+
| Id  | Operation             | Name    | Rows  | Bytes | Cost  |
Time      |
----------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT      |         |       |       |  217K
|           |
| 1   |  FILTER               |         |       |       |
|           |
| 2   |   SORT GROUP BY       |         |  102K | 2245K |  217K |
00:57:46 |
| 3   |    HASH JOIN          |         | 2041K |   44M |  210K |
00:55:43 |
| 4   |     INDEX RANGE SCAN  | T1_IDXE |    76 |   836 |     4 |
00:00:01 |
| 5   |     TABLE ACCESS FULL | T1      |  216M | 2376M |  208K |
00:54:13 |
----------------------------------------
+-----------------------------------+
Predicate Information:
----------------------
1 - filter(COUNT(*)>10)
3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
"T0"."ANSWER"="TX"."ANSWER")
3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
4 - access("T0"."RESIDENTID"=1486674)
5 - filter("TX"."RESIDENTID"<>1486674)

Plan stayed the same - clustering factor was lower for the T1_IDEX
index.

Provide a hint:
select tx.residentid /*+ INDEX(T0 T1_INDEX1) INDEX(TX T1_INDEX2) */
 from t1 t0,t1 tx
  where  t0.residentid!=tx.residentid
   and  t0.questionnumber=tx.questionnumber
   and t0.answer=tx.answer
   and t0.residentid=1486674
    group by tx.residentid
     having count(*)>10

Plan Table
============
----------------------------------------
+-----------------------------------+
| Id  | Operation             | Name    | Rows  | Bytes | Cost  |
Time      |
----------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT      |         |       |       |  217K
|           |
| 1   |  FILTER               |         |       |       |
|           |
| 2   |   SORT GROUP BY       |         |  102K | 2245K |  217K |
00:57:46 |
| 3   |    HASH JOIN          |         | 2041K |   44M |  210K |
00:55:43 |
| 4   |     INDEX RANGE SCAN  | T1_IDXE |    76 |   836 |     4 |
00:00:01 |
| 5   |     TABLE ACCESS FULL | T1      |  216M | 2376M |  208K |
00:54:13 |
----------------------------------------
+-----------------------------------+
Predicate Information:
----------------------
1 - filter(COUNT(*)>10)
3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
"T0"."ANSWER"="TX"."ANSWER")
3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
4 - access("T0"."RESIDENTID"=1486674)
5 - filter("TX"."RESIDENTID"<>1486674)

Plan still using the original index.

DROP INDEX T1_IDXE;

select tx.residentid /*+ INDEX(t0 T1_INDEX1) INDEX(tx T1_INDEX2) */
 from t1 t0,t1 tx
  where  t0.residentid!=tx.residentid
   and  t0.questionnumber=tx.questionnumber
   and t0.answer=tx.answer
   and t0.residentid=1486674
    group by tx.residentid
     having count(*)>10;

Plan Table
============
---------------------------------------------------
+-----------------------------------+
| Id  | Operation                       | Name     | Rows  | Bytes |
Cost  | Time      |
---------------------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT                |          |       |       |
218K |           |
| 1   |  FILTER                         |          |       |
|       |           |
| 2   |   SORT GROUP BY                 |          |  102K | 2245K |
218K |  00:57:47 |
| 3   |    HASH JOIN                    |          | 2041K |   44M |
210K |  00:55:45 |
| 4   |     TABLE ACCESS BY INDEX ROWID | T1       |    76 |   836
|    80 |  00:00:02 |
| 5   |      INDEX RANGE SCAN           | T1_INDEX1|    76 |
|     4 |  00:00:01 |
| 6   |     TABLE ACCESS FULL           | T1       |  216M | 2376M |
208K |  00:54:13 |
---------------------------------------------------
+-----------------------------------+
Predicate Information:
----------------------
1 - filter(COUNT(*)>10)
3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
"T0"."ANSWER"="TX"."ANSWER")
3 - filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
5 - access("T0"."RESIDENTID"=1486674)
6 - filter("TX"."RESIDENTID"<>1486674)

Try the other query that was suggested:
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'TEST101';
SELECT residentId, count(1)
  FROM t1
 WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer
                                         FROM t1
                                        WHERE residentId = 1486674 )
 GROUP BY residentId
 HAVING COUNT(1) > 10;

SELECT
  *
FROM
  TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL,NULL,'ALLSTATS'));

Plan Table
============
---------------------------------------------------
+-----------------------------------+
| Id  | Operation                       | Name     | Rows  | Bytes |
Cost  | Time      |
---------------------------------------------------
+-----------------------------------+
| 0   | SELECT STATEMENT                |          |       |       |
208K |           |
| 1   |  FILTER                         |          |       |
|       |           |
| 2   |   SORT GROUP BY                 |          |  1384 |   30K |
208K |  00:54:12 |
| 3   |    HASH JOIN RIGHT SEMI         |          |   27K |  595K |
208K |  00:54:12 |
| 4   |     TABLE ACCESS BY INDEX ROWID | T1       |    76 |   836
|    80 |  00:00:02 |
| 5   |      INDEX RANGE SCAN           | T1_INDEX1|    76 |
|     4 |  00:00:01 |
| 6   |     TABLE ACCESS FULL           | T1       |  216M | 2376M |
206K |  00:54:40 |
---------------------------------------------------
+-----------------------------------+
Predicate Information:
----------------------
1 - filter(COUNT(*)>10)
3 - access("QUESTIONNUMBER"="QUESTIONNUMBER" AND "ANSWER"="ANSWER")
5 - access("RESIDENTID"=1486674)

For some reason, this query returned one row where none were returned
before.

At this point, I am not sure what the solution may be.  Your results
may vary on other platforms.

Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.