Oracle FAQ Your Portal to the Oracle Knowledge Grid

Home -> Community -> Usenet -> comp.databases.theory -> Re: implementing a index for wildcard text search

Re: implementing a index for wildcard text search

From: Mike Preece <>
Date: 10 Mar 2004 19:16:28 -0800
Message-ID: <>

"Achim Domma" <> wrote in message news:<c2kfef$9qm$02$>...
> Hi,
> we have implemented our own, specialized data storage. We need now the
> possibility to search on strings with wildcards. Can somebody give me a
> starting point on how to implement such an index?
> regards,
> Achim

I wrote something like this in the late eighties. It was implemented at a couple of sites. Where an index was required there was an index file, a throwaway file, and a controlling parameter record. Each string (name & address in one implementation and full product description in another) was broken down into all possible left-to-right substrings ("THE" gave "T", "H", "E", "TH", "TE", "HE" & "THE", for example). This code (subroutine) was called on every update and as required for index lookup. On each update, an attempt was first made to find each substring in the throwaway file - if found the index-update subroutine call for that substring simply exited (returned). Next the index file item was read and the item-id was added to the associated list of item-ids (primary keys). If the resulting string exceeded the maxlen parameter setting for that index file the index item for the substring was deleted and an entry was added to the throwaway file for that substring instead. This meant that meaningless strings (such as "THE") were automatically discarded - thrown away. If the index file became too big over time you simply changed the maxlen parameter. The throwaway list was self-regulatory - depending on the application. In addition to index items for each substring, a soundex equivalent was also maintained in the same way. Worked well. Users liked it a lot. I think I still have the source code on 1/4" tape somewhere.

Mike. Received on Wed Mar 10 2004 - 21:16:28 CST

Original text of this message