Re: Why is this query slow?

From: joel garry <joel-garry_at_home.com>
Date: Fri, 12 Mar 2010 10:32:45 -0800 (PST)
Message-ID: <62c6b97a-ef65-4c9d-ae38-d4b6dcc37e01_at_t34g2000prm.googlegroups.com>



On Mar 12, 5:46 am, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> On Mar 12, 5:34 am, vsevolod afanassiev
>
> <vsevolod.afanass..._at_gmail.com> wrote:
> > I have been doing SQL query tuning for a while and in most cases this
> > is relatively straightforward excercise. For example: if a query
> > selects only a few rows from large table then it should be accessed
> > through index, if a query selects more than 10% of the rows the table
> > should be accessed through full table scan. In many cases I was able
> > to reduce number of buffer gets from something like 200,000,000 to
> > less than 1000 per execution by creating indexes and adding a few
> > hints.
>
> The 10% rule is not suitable in all cases.  Last year I posted to this
> group a test case that showed a full table scan was faster when
> selecting 0.06% of the rows in a 100,000,000 row table than was an
> index access when the index had a high clustering factor.  However,
> 10% to 20% is probably an OK tipping point in most cases.

That is stated so much better than some other people who insist you can't use any rule of thumb at all.

>
> > The difficulty I have now is that the query is very simple and as far
> > as I can see there is nothing wrong
> > with plan. Still it is slow.
>
> > The query is
>
> > select * from transactions
> > where tx_date between (date1 and date2) and
> > status = 'AAA' and
> > ID not in
> > (select ID from transaction where tx_date between (date3 and date4)
> > and status = 'AAA')
>
> Columns that are not explicitly defined as NOT NULL might be causing
> performance problems for you when the optimizer is not able to
> transform the SQL statement into a more efficient form.  See this blog
> article:http://hoopercharles.wordpress.com/2010/01/10/the-effects-of-potentia...
> (Notice the change in the execution plan in the above link)
>
> I would be inclined to re-write the SQL statement to look something
> like this:
> select
>   T.*
> from
>   transactions T,
>   (select DISTINCT
>     ID
>   from
>     temp_table) TT
> where
>   T.tx_date between date1 and date2
>   and T.status = 'AAA'
>   AND t.ID=TT.ID(+)
>   AND TT.ID IS NULL;
>

I habitually avoid DISTINCT as it has so often meant unnecessary sorting; but of course, that is a bad habit from the RBO days, it should be proven, or at least attempted, for each appropriate circumstance, as here it would not be unnecessary anyways. Thanks for making us think Charles! I never would have thought of the outer join combined with is null, this may have practical implications for me for things I do similar to the OP. As they are mostly time-pressured oneoffs,  I wasn't bothering with hours of performance tuning for tens of minutes of processing, just living with it. But this gives a glimmer of hope.

jg

--
_at_home.com is bogus.
http://www.itworld.com/security/100320/security-industry-faces-attacks-it-cannot-stop
Received on Fri Mar 12 2010 - 12:32:45 CST

Original text of this message