Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Help with sql query speed. Explain plan included

Re: Help with sql query speed. Explain plan included

From: <fitzjarrell_at_cox.net>
Date: 15 Mar 2007 08:49:43 -0700
Message-ID: <1173973783.877883.51310@y66g2000hsf.googlegroups.com>


On Mar 15, 8:14 am, JL19..._at_HOTMAIL.CO.UK wrote:
> Hello,
>
> I have a query about how to improve the speed of our sql and am asking
> for advice.
>
> My real system is complicated and relates to a call center, automated
> options chosen, and time on line using an automated telephone system.
> Rather than having to describe all the complicated bits, I have made
> up a situation using census data instead because we can all understand
> it.
>
> I have itemized the points of the made up situation. Here they are.
>
> 1. The data is census information.
> 2. There are 3 million people in the country.
> 3. Each person has to fill out a census form.
> 4. There are 80 questions on each census form.
> 5. Each question on the census form has an answer that is between 1
> and 100.
> 6. The country residents do not have to answer every question.
> 7. If they do not answer a question, a row in the table is not
> created.
>
> The information is stored in an Oracle database with the following
> structure.
>
> residentID questionNumber answer
> 1 1 10
> 1 2 45
> 1 3 67
> 1 4 55
> 2 1 15
> 2 2 56
> 2 3 12
>
> etc
>
> We need a query that that reads "give us the resident ID of any person
> that has answered more than 10 questions the same as the resident with
> ID number 1486674"
>
> Here is our sql.
>
> select tx.residentid
> from test1 t0,test1 tx
> where t0.residentid!=tx.residentid
> and t0.questionnumber=tx.questionnumber
> and t0.answer=tx.answer
> and t0.residentid=1486674
> group by tx.residentid
> having count(*)>10;
>
> The problem with this query is that it only returns 177 rows yet takes
> "Elapsed: 00:01:10.75" which is about 70 seconds.
>
> The only thing that is important to us is query speed. If the data
> needs restructuring to get speed then this is OK. But firstly is there
> a better SQL to this without restructuring of data.
>
> The plan statistics are calculated on 100% of rows in all tables with
> cascade at 2am every morning. They are up to date.
>
> Here is my plan. Index idxe has been created with "create unique index
> idxe on test1(residentid,questionnumber,answer)". Just in case it
> matters, there are no other table indexes but can easily add them if
> it improves performance.
>
> We use 10.2 on a Solaris V245.
>
> Thank you for helping in advance.
> John
>
> --------------------------------------------------------------------------------------
> | Id | Operation | Name | Rows | Bytes |TempSpc| Cost
> (%CPU)| Time |
> --------------------------------------------------------------------------------------
> | 0 | SELECT STATEMENT | | 33322 | 715K| | 26691
> (7)| 00:05:21 |
> |* 1 | FILTER | | | |
> | | |
> | 2 | HASH GROUP BY | | 33322 | 715K| 21M| 26691
> (7)| 00:05:21 |
> |* 3 | HASH JOIN | | 666K| 13M| | 24919
> (7)| 00:05:00 |
> |* 4 | INDEX RANGE SCAN | IDXE | 22 | 242 | | 3
> (0)| 00:00:01 |
> |* 5 | TABLE ACCESS FULL| TEST1 | 37M| 398M| | 24428
> (6)| 00:04:54 |
>
> PLAN_TABLE_OUTPUT
> ---------------------------------------------------
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 1 - filter(COUNT(*)>10)
> 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
> "T0"."ANSWER"="TX"."ANSWER")
> filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
> 4 - access("T0"."RESIDENTID"=1486674)
> 5 - filter("TX"."RESIDENT"<>1486674)
>
> 22 rows selected.

What is causing you 'grief' is this:

 where t0.residentid!=tx.residentid

An index on residentid won't help here as you're asking for every record EXCEPT one; even with an index present given that condition you really can't avoid a full table scan and you don't want to. But I think that's not really the issue here, I think this might be a situation where your sort area may be much too small for such a task and you're swapping out to disk instead of doing such sorts in memory (group by queries tend to order the data by the grouped columns, though I wouldn't recommend using this behaviour to ensure ordered output). I would have your DBA check if you're sorting to disk; if so (and I believe you are) the sort_area_size should be increased. Once that's done try your query again; you may find it runs in less time.

David Fitzjarrell Received on Thu Mar 15 2007 - 10:49:43 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US