Re: SQL Optimization

From: Magnus Lonnroth <mloennro_at_se.oracle.com>
Date: Thu, 11 Nov 1993 11:37:58 GMT
Message-ID: <MLOENNRO.93Nov11113759_at_demo1.se.oracle.com>


>>>>> Regarding Re: SQL Optimization; bainbridge_i_at_perc03_at_bgers.co.uk (Ian Bainbridge) adds:

  IB> In article <657_at_ncrcetc.WichitaKS.NCR.COM>, gkeim_at_ncrcetc.WichitaKS.NCR.COM (Gail Keim) writes:
>From: gkeim_at_ncrcetc.WichitaKS.NCR.COM (Gail Keim)
>Newsgroups: comp.databases.oracle,ncr.databases.oracle
>Subject: SQL Optimization
>
>I am trying to optimize the following SQL statement:
>
> select * from part, prod_family
> where part.prod_family = prod_family.prod_family
> and prod_family.description like :Z
> and part.param_1 between :X and :Y
>
>The difficulty arises because PART.PARAM_1 is a varchar column,
>and depending on the type of parameter passed, it may be inter-
>preted as character or numeric (as in the above example). When
>Oracle parses this statement, it does the to_number conversion
>on PARAM_1 first, and fails with Oracle error 01722, even though
>the clause PART.PROD_FAMILY = PROD_FAMILY.PROD_FAMILY should
>filter out only numeric values for PARAM_1.

Yes, and this is probably bad news. When comparing character and numeric data, Oracle will always convert character-data to numeric if an implicit conversion is required. If your comparing an indexed character column to a numeric type, you'll never be using the index. In your example, Oracle has no way of knowing the datatype of bind-variables :X and :Y at parse-time, so it assumes the best (they're probably same type as column), which will give you a value error during execute. Also, Oracle has no way of knowing if bind-value :Z will be preceded with a wildcard % or not. If it isn't, Oracle can use an index, but if it is, an index is not of much use. Again, Oracle assumes the best, and chooses to consider a non-unique index-range-scan for this predicate. This is bad news if your user passes some %-preceded string as a parameter, since scanning the index is usually extra overhead. Of course, you can also turn this around and consider it as a feature for tricking Oracle into using an index for LIKE '%...' predicates.

When Oracle constructs an execution-plan for this statement, it must decide which table to start with. For rule-based optimization, it will choose the table with the highest-ranking where-predicate (unless one table can only be accessed by full-table-scans, in which case it will be chosen). In Oracle7, cost-based optimization will include index and table statistics to help pick the best plan. In Oracle7, you can also pass "hints" to the optimizer. For example, you can force the optimizer to construct a plan, where the tables are joined in the same order as they are listed in the FROM-clause. Please read more about optimizer-hints in the generic manual about application development (can't recall it's exact name).

To sum things up: You should consider *not* using bind-variables, and construct the whole WHERE-clause dynamically instead. This will cause a reparse every time your statement is executed, but you will fix both problems mentioned above. You probably know how this *should* be performed (just imagine yourself with a part and prod_family card-file), which would you start with ? Verify that oracle is doing the same by using sql_trace and tkprof.

>
>I can make this work by doing a subquery:
>
> select * from part
> where part.part_no in
> (select part_no from prod_family
> where prod_family.description like :Z)
> and part.param_1 between :X and :Y
>
>but with a considerable performance hit (from about 1 sec.
>to 3 minutes).
>
>Any ideas? Please send comments to gail.keim_at_WichitaKS.NCR.COM.
>
>Thanks!
>

  IB> One or two thoughts spring to mind :-

  IB> Try putting the line part.prod_family = prod_family.prod_family LAST   IB> in the where clause

This is a myth. It is also bad programming style. I've visited customers who have written all there SQL backwards due to this. There was once (in rdbms v5) a rule stating that if multiple WHERE-predicates referring to multiple indices on the same table ranked equally, Oracle had no way of choosing one, so it picked the index, which was referenced last in the WHERE clause. In rdbms v6, this was first changed to choosing the index which was referenced first, and in later releases of v6, it was changed again to choose the index that was created first (or was it last?). Anyways, don't rely on it! Make your SQL readable instead. It's much better practice to disable unwanted indices by adding 0 or concatenating with '', as in: WHERE INDEX_COLUMN + 0 = CONSTANT.

  IB> Never use IN on a subselect. Always use where EXISTS instead. The performance   IB> improvements from this alone may solve your problem.

This is also a myth. In most cases, especially where large tables are involved, using IN is much faster. This is because EXISTS usually forces you to write a correlated subquery - one that is executed once for every candidate row in the outer query-block. This is particularly true for DML statements:

delete from big_table
where index_column in ( select ... from other table where ... )

is much faster than

delete from big_table
where exists ( select null from other table

               where join_column = big_table.index_column )

because the first example will quickly locate rows in big_table by using an index, but the second example will scan the whole table, and for each and every row, perform subquery.

Here's another example:

update big_table
set column = ( select column from feeding_table

               where index_column = big_table.index_column ) where index_column in ( select index_column from feeding table

                        where index_column > 0 )

is much faster than

update big_table
set column = ( same as above )
where exists ( select null from feeding_table

               where index_column = big_table.index_column )

for the same reasons as above. Even if the feeding_table is very large, the penalty for not using an index to select rows in big_table for updating is often much greater (compared to creating a temporary segment with index_columns. Please note that the opposite can be true for small tables!

  IB> My understanding of this (somebody correct me if I am wrong)

  IB>  IN  - EVERY value is checked before a true or false is returned EVEN if the 
  IB>        first value found matches the selection.
  IB>        This is particularly relevant is the subselect hits a large table.

  IB> EXISTS - returns true as soon as a matching value is found   IB>

  IB> Hope this helps or at least gives you a pointer to another direction

  IB> Ian

  IB> -- 
  IB> +--------------------------+-----------------------------+---------------------+
  IB> | Ian Bainbridge           |Working Philosophy:          |Sensible Advice :    |
  IB> | bainbridge_i_at_bgers.co.uk |                             |                     |
  IB> | British Gas ERS,         |Just trying to get to Friday |Don't Eat The Yellow |
  IB> | Newcastle Upon Tyne, UK  |with as little harrasment as |Snow  - Frank Zappa? |
  IB> | Phone: 091-216-0202      |possible                     |                     |
  IB> +--------------------------+-----------------------------+---------------------+
--

Magnus Lonnroth
Tech.Sales & Consultant
Oracle Sweden
Mail: mloennro_at_oracle.com
Received on Thu Nov 11 1993 - 12:37:58 CET

Original text of this message