Re: implementing an index for wildcard text search

From: Jerome H. Gitomer <jgitomer_at_erols.com>
Date: Tue, 09 Mar 2004 23:53:20 -0500
Message-ID: <404e9f37$0$3093$61fed72c_at_news.rcn.com>


Achim Domma wrote:
> Hi,
>
> we have implemented our own, specialized data storage. We now need the
> possibility to search strings with wildcards. Can somebody give me a
> starting point on how to implement such an index?
>
> regards,
> Achim
>
>
Okay,

   The technology I am about to describe is over 40 years old, but when it was proposed, storage was so expensive that it was not practical until recently. Two steps are required: first, build the index used for your string searches; second, build your search engine.

   Fully invert your database; that is, build an index covering every word except the most common words in the language you are using. (In English, for example, there are approximately 55 words, such as "a", "the", "I", "and" and "or", that are excluded from the index.) This will more than double the size of your database.
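   A minimal sketch of the inversion step, assuming each record can be read as a plain string keyed by a record id. The stop-word list below is a hypothetical, deliberately short subset (the post mentions about 55 words but does not list them), and the tokenizer is a simple lower-case alphanumeric split chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list; the post mentions roughly 55
# such words for English but does not list them all.
STOP_WORDS = {"a", "an", "the", "i", "and", "or", "of", "to", "in", "is"}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Map every non-stop word to the set of record ids that contain it.

    `documents` is assumed to be a dict of record id -> text, standing in for
    whatever record format the custom storage actually uses.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:
                index[word].add(doc_id)
    return dict(index)

docs = {
    1: "The index and the data file",
    2: "A wildcard search over the data",
}
index = build_inverted_index(docs)
print(sorted(index["data"]))  # [1, 2]
print("the" in index)         # False
```

   Each posting set grows with every record a word appears in, which is where the extra storage the post warns about comes from.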

   Build a regular expression engine or, better yet, find one that already works, download the source, modify it to suit your needs, and compile it.
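   The post does not say exactly how the regular expression engine and the index fit together. One possible arrangement, sketched here under that assumption, is to convert the user's wildcard pattern into a regular expression, run it against the indexed words rather than the raw records, and union the record ids of every word that matches. The index contents are hypothetical, and Python's standard `re` and `fnmatch` modules stand in for "an engine that already works".

```python
import re
import fnmatch

# A small, hypothetical inverted index (word -> set of record ids), standing in
# for the one built over the real database.
INDEX = {
    "index": {1},
    "indexing": {3},
    "data": {1, 2},
    "database": {2, 3},
    "wildcard": {2},
}

def wildcard_search(pattern, index):
    """Return the ids of records containing any indexed word matching `pattern`.

    `pattern` uses shell-style wildcards (* = any run of characters,
    ? = exactly one character). fnmatch.translate turns it into a regular
    expression anchored at the end, so re.match checks the whole word.
    """
    regex = re.compile(fnmatch.translate(pattern.lower()))
    hits = set()
    # Scan the index vocabulary, not the raw text: it is usually far smaller.
    for word, postings in index.items():
        if regex.match(word):
            hits |= postings
    return hits

print(sorted(wildcard_search("data*", INDEX)))  # [1, 2, 3]
print(sorted(wildcard_search("ind?x", INDEX)))  # [1]
```

   Scanning the whole vocabulary is linear in the number of distinct words; when a pattern starts with literal characters, keeping the vocabulary sorted lets you binary-search to the matching prefix range first.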

HTH
Jerry
Received on Wed Mar 10 2004 - 05:53:20 CET
