RE: strategies for optimizing 'like' operations (shortened to list limit)

From: Mark W. Farnham <>
Date: Fri, 7 Mar 2008 15:52:24 -0500
Message-ID: <02ca01c88095$25e55c90$>

From: Mark W. Farnham []  

I see from the thread you've rejected the notion of Oracle Text (nee Context).  

Okay, so there are a few problems with the general form of this query:  

First, you write a.keyword and b.keyword, implying table aliases and a join down the pike. I *hope* what you meant was <keyword column name a>, <keyword column name b> as formal parameters in the form of the query you're trying to represent and that we're talking about several columns from a single table where you want the row back if you get a hit on any of them (ergo the "or"). But OH! I see you wrote that you do mean you have several tables. HMM. You'll definitely be better off at least paring down the sizes for the cartesian product. So for each row source, you'll want a virtual table (or inline view or whatever other name is in vogue today) of the form:  

(select <column_list> from a, b, ., z


        (upper(a.<searchcol<a1>>) like '%' || upper(:bm) || '%' or

         upper(a.<searchcol<a2>>) like '%' || upper(:bn) || '%' or




         upper(a.<searchcol<az>>) like '%' || upper(:bz) || '%')

       and a.<joincol> = b.<joincol> [for all join predicates that reduce the intersection for your joined tables, any of which (tables) should be omitted from each particular inline view if they do not reduce the returned set]


with a similarly constructed inline view for each table having at least one search column,

then finally anding together the join predicates amongst the tables. You'll still have to scan the bits of the intermediate result set and possibly have internal results sets bigger than you'd like, but at least you'll avoid some really horrible combinations.  

In some cases you make have to force production of the intermediate result sets with the "rownum" trick. I think what I wrote reasonably decodes into a formal query only one way, but I could be wrong.  

Second, putting the upper on the keyword is going to make it impossible to use the index unless you've got a functional index defined or the form is such that a fast full scan of some index gets what you need.  

Third, you're still not going to use the index (except possibly the whole index), because you have a leading wildcard. Think about that a moment... where could you look in a phone book if you don't have the first letter of the string in hand?  

Sigh, you might be better off to grep (or perl) for blocks containing the string and pass back a list of block ids to search with the sql. Of course you'd have to decode the block address and generate either a temporary table to list them or generate a literal in list and make that part of the query. Oracle folks will have kittens if you're directly scanning database files in that manner, and it would be a lot of work to get it right.  

That's the end of my thoughts on the matter, at least for now and probably forever on a pro bono basis. Arbitrary case text search efficiently in sql is hard, ergo the product to do it. Tossing in joins gives you a great chance to go hugely wrong, and if any of your columns are potentially very long, each row can get pretty expensive.  

Sorry for the bad news. I hope someone on the list knows of a breakthrough of which I am ignorant.  




Received on Fri Mar 07 2008 - 14:52:24 CST

Original text of this message