Re: INDEX possible for reverse wildcards?

From: Christopher Browne <cbbrowne_at_acm.org>
Date: 5 Jun 2004 04:52:32 GMT
Message-ID: <2id1sfFivo2nU2_at_uni-berlin.de>


After takin a swig o' Arrakan spice grog, ctcgag_at_hotmail.com belched out:
> "Mikito Harakiri" <mikharakiri_at_iahu.com> wrote:
>> <ctcgag_at_hotmail.com> wrote in message
>> news:20040604183510.277$gk_at_newsreader.com...
>> > Well, you would only need to look at index entries that start with 'w',
>> > '%', or '_'. And if the first character is a 'w' or a '_', you only
>> > need to look at ones where the second is 'w', '%', or '_'.
>>
>> And if the first character happens to be '%', then?
>
> Then you check the rest of it to see if it matches. (What did you expect?)

I think you're missing the point of the question.

The question is whether or not one can construct an index that makes it useful to search parts of the interior of a string.

Supposing we have the search criteria:

 select * from some_table where name like 'Bro%'; -- 1.

That could be (character set dependent) transformed into the query:

 select * from some_table where name >= 'Bro' and name <= 'Brp'; -- 2.

which wouldmake good use of a btree index on NAME to cut down the number of entries searched to a dull roar.

In contrast that sort of transformation cannot be done with

 select * from some_table where name like '%owne'; -- 3.

Unless there's some Rather Clever sort of index involved, the best you can usually do with that is to do a sequential scan across the table, looking for matches.

There is actually a pretty easy way of getting a suitable Clever Index for that scenario. Some databases support "functional indices," and you might set up an index based on some function of the name. Let's say:

 create index reverse_name on some_table(reverse(name));

In that case, the clever thing would be to transform Query #3 into...

  select * from some_table where reverse(name) >= 'enwo'     reverse(name) <= 'enwp';

which could use the "reverse_name" index to do this quickly.

Regrettably, that falls apart if we propose a query like #4...

  select * from some_table where name like '%rown%';

Neither of the "obvious" b-tree indices can help that query.

That being said, there do exist indexing schemes specific to text search that might even be helpful for queries based on extensive wildcarding and/or regular expressions. Patricia Tries or Arrays can be used for such purposes, allowing searches to in effect start in the "interior" of the search criterion.

See _Information Retrieval_ by Frakes and Baeza-Yates [Prentice Hall] for more details.

This is the sort of extension that "text-oriented" database systems add in.

-- 
output = reverse("gro.mca" "_at_" "enworbbc")
http://cbbrowne.com/info/advocacy.html
Rules of the  Evil  Overlord #171.  "I  will not  locate  a base  in a
volcano, cave, or  any other location  where it would be  ridiculously
easy to bypass security by rappelling down from above."
<http://www.eviloverlord.com/>
Received on Sat Jun 05 2004 - 06:52:32 CEST

Original text of this message