Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: Help with sql query speed. Explain plan included
On Mar 16, 7:25 am, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> On Mar 15, 5:10 pm, "Valentin Minzatu" <valentinminz..._at_yahoo.com>
> wrote:
>
>
>
>
>
> > On Mar 15, 1:53 pm, "Charles Hooper" <hooperc2..._at_yahoo.com> wrote:
> > > On Mar 15, 9:14 am, JL19..._at_HOTMAIL.CO.UK wrote:
>
> > > > Hello,
>
> > > > I have a query about how to improve the speed of our sql and am asking
> > > > for advice.
>
> > > > My real system is complicated and relates to a call center, automated
> > > > options chosen, and time on line using an automated telephone system.
> > > > Rather than having to describe all the complicated bits, I have made
> > > > up a situation using census data instead because we can all understand
> > > > it.
>
> > > > I have itemized the points of the made up situation. Here they are.
>
> > > > 1. The data is census information.
> > > > 2. There are 3 million people in the country.
> > > > 3. Each person has to fill out a census form.
> > > > 4. There are 80 questions on each census form.
> > > > 5. Each question on the census form has an answer that is between 1
> > > > and 100.
> > > > 6. The country residents do not have to answer every question.
> > > > 7. If they do not answer a question, a row in the table is not
> > > > created.
>
> > > > The information is stored in an Oracle database with the following
> > > > structure.
>
> > > > residentID questionNumber answer
> > > > 1 1 10
> > > > 1 2 45
> > > > 1 3 67
> > > > 1 4 55
> > > > 2 1 15
> > > > 2 2 56
> > > > 2 3 12
>
> > > > etc
>
> > > > We need a query that that reads "give us the resident ID of any person
> > > > that has answered more than 10 questions the same as the resident with
> > > > ID number 1486674"
>
> > > > Here is our sql.
>
> > > > select tx.residentid
> > > > from test1 t0,test1 tx
> > > > where t0.residentid!=tx.residentid
> > > > and t0.questionnumber=tx.questionnumber
> > > > and t0.answer=tx.answer
> > > > and t0.residentid=1486674
> > > > group by tx.residentid
> > > > having count(*)>10;
>
> > > > The problem with this query is that it only returns 177 rows yet takes
> > > > "Elapsed: 00:01:10.75" which is about 70 seconds.
>
> > > > The only thing that is important to us is query speed. If the data
> > > > needs restructuring to get speed then this is OK. But firstly is there
> > > > a better SQL to this without restructuring of data.
>
> > > > The plan statistics are calculated on 100% of rows in all tables with
> > > > cascade at 2am every morning. They are up to date.
>
> > > > Here is my plan. Index idxe has been created with "create unique index
> > > > idxe on test1(residentid,questionnumber,answer)". Just in case it
> > > > matters, there are no other table indexes but can easily add them if
> > > > it improves performance.
>
> > > > We use 10.2 on a Solaris V245.
>
> > > > Thank you for helping in advance.
> > > > John
>
> > > > ---------------------------------------------------------------------------ญญญญ-----------
> > > > | Id | Operation | Name | Rows | Bytes |TempSpc| Cost
> > > > (%CPU)| Time |
> > > > ---------------------------------------------------------------------------ญญญญ-----------
> > > > | 0 | SELECT STATEMENT | | 33322 | 715K| | 26691
> > > > (7)| 00:05:21 |
> > > > |* 1 | FILTER | | | |
> > > > | | |
> > > > | 2 | HASH GROUP BY | | 33322 | 715K| 21M| 26691
> > > > (7)| 00:05:21 |
> > > > |* 3 | HASH JOIN | | 666K| 13M| | 24919
> > > > (7)| 00:05:00 |
> > > > |* 4 | INDEX RANGE SCAN | IDXE | 22 | 242 | | 3
> > > > (0)| 00:00:01 |
> > > > |* 5 | TABLE ACCESS FULL| TEST1 | 37M| 398M| | 24428
> > > > (6)| 00:04:54 |
>
> > > > PLAN_TABLE_OUTPUT
> > > > ---------------------------------------------------
> > > > Predicate Information (identified by operation id):
> > > > ---------------------------------------------------
>
> > > > 1 - filter(COUNT(*)>10)
> > > > 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
> > > > "T0"."ANSWER"="TX"."ANSWER")
> > > > filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
> > > > 4 - access("T0"."RESIDENTID"=1486674)
> > > > 5 - filter("TX"."RESIDENT"<>1486674)
>
> > > > 22 rows selected.
>
> > > I agree with much of what David Fitzjarrell stated regarding the
> > > source of the problem and how possibly to correct the problem. You
> > > are running Oracle 10g R2, which is performing a hash group by, so
> > > there is no sort required by the GROUP BY. David mentioned the
> > > SORT_AREA_SIZE parameter - while there is no sort indicated by the
> > > explain plan, the HASH_AREA_SIZE by default is set to twice the
> > > SORT_AREA_SIZE, so the parameter does have some relevance to your
> > > case. The index on the RESIDENTID column is preventing two full
> > > tablescans, but still allows one to occur, which results in 22 rows
> > > being hash joined to 37,000,000 rows (average of 11 bytes per row).
> > > The resulting hash join returns 666,000 rows and requires 13MB, and
> > > during the GROUP BY operation 21MB is spilled to the temporary
> > > tablespace to return 33,322 rows.
>
> > > Most of the time is likely spent in returning the 37,000,000 rows
> > > (likely from disk) and for performing the multi-pass hash join and
> > > hash group by. You can set event 10104 generate a trace file to help
> > > determine the effective HASH_AREA_SIZE needed to prevent the hashing
> > > operations from spilling to disk. Your primary key index on the table
> > > is likely RESIDENTID and QUESTIONNUMBER, and it appears that you
> > > created another index on the RESIDENTID column. I believe that
> > > someone else suggested a composite index on the QUESTIONNUMBER and
> > > ANSWER columns - that appears to be a very good idea to reduce the
> > > 37,000,000 rows that are entering the hash join (only 2% of the rows
> > > would be passed on to the hash join).
>
> > > Charles Hooper
> > > PC Support Specialist
> > > K&M Machine-Fabricating, Inc.- Hide quoted text -
>
> > > - Show quoted text -
>
> > It may be that the index OP already has in place (residentID
> > +questionNumber+answer ) is the best as table access will be
> > avoided ... I would test both options and see which one performs
> > better with his data. It may also help to create an MV for
> > SELECT questionNumber, answer
> > FROM testTable
> > WHERE residentId = 1486674
>
> I overlooked that the OP created an index on RESIDENTID,
> QUESTIONNUMBER, and ANSWER columns. It is true that the index helped
> avoid a table access of up to 80 rows, but those 100 rows are very
> small compared to the 37,000,000 rows that are being returned from the
> TEST1 table. Also true that Oracle _could_ use an index skip scan, if
> forced to do so, to make use of the IDXE index to avoid retrieving the
> 37,000,000 rows. As David Fitzjarrell pointed out, the OP is asking
> Oracle to compare with all 3,000,000 people except one, so that would
> likely be a very inefficient access method considering that the
> average row size appears to be about 11 bytes (many rows retrieved per
> each 8KB block read).
>
> I would suggest comparing the performance with one index on the
> RESIDENTID column and a second, composite index on the QUESTIONNUMBER
> and ANSWER columns. I am not sure that I understand the materialized
> view suggestion - should the OP create 3,000,000 materialized views -
> one for each possible value of RESIDENTID?
>
> Charles Hooper
> PC Support Specialist
> K&M Machine-Fabricating, Inc.- Hide quoted text -
>
> - Show quoted text -
I've already suggested a different query - see above - which could
take advantage of an MV, sorry for not giving the full explanation in
one single step. So, the query can be written as:
SELECT residentId, count(1)
FROM testTable
WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer
FROM testTable WHERE residentId = 1486674 )GROUP BY residentId
Now, the index on all 3 columns could help, as all the OP requires is residentId, so it may/should be faster to read only from index as opposed to table access, but again that depends on his data - i.e. I do not have the means to test it for him. Received on Fri Mar 16 2007 - 09:30:44 CDT