Re: Help with sql query speed. Explain plan included

From: Valentin Minzatu <valentinminzatu_at_yahoo.com>
Date: 15 Mar 2007 08:06:29 -0700
Message-ID: <1173971189.487654.71760@n59g2000hsh.googlegroups.com>

On Mar 15, 10:52 am, JL19..._at_HOTMAIL.CO.UK wrote:
> On Mar 15, 2:39 pm, "Valentin Minzatu" <valentinminz..._at_yahoo.com>
> wrote:
>
>
>
>
>
> > On Mar 15, 9:14 am, JL19..._at_HOTMAIL.CO.UK wrote:
>
> > > Hello,
>
> > > I have a query about how to improve the speed of our sql and am asking
> > > for advice.
>
> > > My real system is complicated and relates to a call center, automated
> > > options chosen, and time on line using an automated telephone system.
> > > Rather than having to describe all the complicated bits, I have made
> > > up a situation using census data instead because we can all understand
> > > it.
>
> > > I have itemized the points of the made up situation. Here they are.
>
> > > 1. The data is census information.
> > > 2. There are 3 million people in the country.
> > > 3. Each person has to fill out a census form.
> > > 4. There are 80 questions on each census form.
> > > 5. Each question on the census form has an answer that is between 1
> > > and 100.
> > > 6. The country residents do not have to answer every question.
> > > 7. If they do not answer a question, a row in the table is not
> > > created.
>
> > > The information is stored in an Oracle database with the following
> > > structure.
>
> > > residentID questionNumber answer
> > > 1 1 10
> > > 1 2 45
> > > 1 3 67
> > > 1 4 55
> > > 2 1 15
> > > 2 2 56
> > > 2 3 12
>
> > > etc
>
> > > We need a query that that reads "give us the resident ID of any person
> > > that has answered more than 10 questions the same as the resident with
> > > ID number 1486674"
>
> > > Here is our sql.
>
> > > select tx.residentid
> > > from test1 t0,test1 tx
> > > where t0.residentid!=tx.residentid
> > > and t0.questionnumber=tx.questionnumber
> > > and t0.answer=tx.answer
> > > and t0.residentid=1486674
> > > group by tx.residentid
> > > having count(*)>10;
>
> > > The problem with this query is that it only returns 177 rows yet takes
> > > "Elapsed: 00:01:10.75" which is about 70 seconds.
>
> > > The only thing that is important to us is query speed. If the data
> > > needs restructuring to get speed then this is OK. But firstly is there
> > > a better SQL to this without restructuring of data.
>
> > > The plan statistics are calculated on 100% of rows in all tables with
> > > cascade at 2am every morning. They are up to date.
>
> > > Here is my plan. Index idxe has been created with "create unique index
> > > idxe on test1(residentid,questionnumber,answer)". Just in case it
> > > matters, there are no other table indexes but can easily add them if
> > > it improves performance.
>
> > > We use 10.2 on a Solaris V245.
>
> > > Thank you for helping in advance.
> > > John
>
> > > --------------------------------------------------------------------------------------
> > > | Id | Operation | Name | Rows | Bytes |TempSpc| Cost
> > > (%CPU)| Time |
> > > --------------------------------------------------------------------------------------
> > > | 0 | SELECT STATEMENT | | 33322 | 715K| | 26691
> > > (7)| 00:05:21 |
> > > |* 1 | FILTER | | | |
> > > | | |
> > > | 2 | HASH GROUP BY | | 33322 | 715K| 21M| 26691
> > > (7)| 00:05:21 |
> > > |* 3 | HASH JOIN | | 666K| 13M| | 24919
> > > (7)| 00:05:00 |
> > > |* 4 | INDEX RANGE SCAN | IDXE | 22 | 242 | | 3
> > > (0)| 00:00:01 |
> > > |* 5 | TABLE ACCESS FULL| TEST1 | 37M| 398M| | 24428
> > > (6)| 00:04:54 |
>
> > > PLAN_TABLE_OUTPUT
> > > ---------------------------------------------------
> > > Predicate Information (identified by operation id):
> > > ---------------------------------------------------
>
> > > 1 - filter(COUNT(*)>10)
> > > 3 - access("T0"."QUESTIONNUMBER"="TX"."QUESTIONNUMBER" AND
> > > "T0"."ANSWER"="TX"."ANSWER")
> > > filter("T0"."RESIDENTID"<>"TX"."RESIDENTID")
> > > 4 - access("T0"."RESIDENTID"=1486674)
> > > 5 - filter("TX"."RESIDENT"<>1486674)
>
> > > 22 rows selected.
>
> > You can try this:
> > SELECT DISTINCT residentId
> > FROM testTable
> > WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer
> > FROM testTable
> > WHERE residentId = 1486674 )
> > AND residentId != 1486674;
> > Also, an index on questionNumber+answer could help, depending on the
> > size of the table and the frequency of the SQL.
>
> Thanks for the help Valentin but this query doesn't check that 10 or
> more questions are the same.
>
> Here is my problem. I have included it again below.
>
> > > We need a query that that reads "give us the resident ID of any person
> > > that has answered more than 10 questions the same as the resident with
> > > ID number 1486674"
>
> Thanks for your suggestion though.
>
> John- Hide quoted text -
>
> - Show quoted text -

Hi John,

I forgot to add that part. Please see below: SELECT residentId, count(1)
FROM testTable
WHERE ( questionNumber, answer ) IN ( SELECT questionNumber, answer

                                         FROM testTable
                                        WHERE residentId = 1486674 )

GROUP BY residentId
HAVING COUNT(1) > 10; Received on Thu Mar 15 2007 - 10:06:29 CDT