Re: Efficiency and usage of IN operator

From: BPMargolin <bpmargo_at_ibm.net>
Date: Sun, 30 May 1999 19:44:53 -0400
Message-ID: <3751cd8a@news3.us.ibm.net>

Alexander,

A few thoughts, though possibly not the answer you are seeking.

First the quote is from the SQL Server 6.x documentation. The SQL Server 7.0 Books Online does not appear to have a similar statement.

It is incorrect to assume "the query will return _fewer_ rows". If it did that then SQL Server would not be returning a correct result set.

Predicates with large numbers of values in the IN clause, or equivalently, with large numbers of OR clauses are going to be inefficient because, ***if*** an index is used, more rows are going to have to read to both in the index and then in the data. I emphasize the ***if*** because, it is possible that SQL Server will simply decide that, if the number of conditions is inordinately large, fewer total I/O's will result from simply doing a table scan.

SQL Server uses a fairly sophisticated cost-based optimizer, both in SS6.x and SS7.0 (not surprisingly the SS7.0 is superior to the SS6.x). While I would not say, because it isn't true, that the SQL Server optimizer will always come up with a better query plan than a human, it is true more often than not.

Bottom line, as a standard rule (and I have never known of an exception to this rule), given a choice between IN and multiple OR's, I always go for the IN, simply because it takes less typing without effecting the query plan one way or the other.

Alexander Staubo <earlybird_at_mop.no> wrote in message news:MPG.11bbcae494dced73989681_at_news.mop.no...
> [I'm using MS SQL Server, although I'm interested in information about
> servers (eg. Oracle, DB2) as well.]
>
> Given a statement such as...
>
> select ids from customers
> where parent_ids in (35, 203, 495, 502, 945)
>
> ...how efficient is such a query compared to say, using a separate OR
> test for each value ("parent_id = 35 or parent_id = 203 or ...")?
>
> According to the Transact SQL documentation, "If there are 16 or more
> values in an IN predicate, the values are sorted and binary search is
> used to evaluate the predicate. This can result in limiting the overall
> number of comparisons performed".
>
> This last sentence is ambiguous: Does it mean the IN predicate becomes
> more _efficient_ because a binary search is used, or does it mean the
> query will return _fewer_ rows? From this description it sounds like 16
> is a hard limit on the number of viable comparisons for an IN predicate.
>
> I'm concerned about speed more than anything here. Reducing the number of
> index comparisons is important.
>
> --
> Alexander Staubo http://www.mop.no/~alex/
> "It has taken the planet Earth 4.5 billion years to discover it is
> 4.5 billion years old." --George Wald
Received on Sun May 30 1999 - 18:44:53 CDT