Re: High Speed Text Searching Algorithms...

From: R124c4u2 <r124c4u2_at_aol.com>
Date: 2000/07/23
Message-ID: <20000723094921.04037.00000499_at_ng-cq1.aol.com>#1/1


Adam McKee writes:

>For each word "hit", also store the position of that word within the
>document (i.e. it is the n'th word). So if you are searching for "Two
>Words", you would find the documents that contain both words (using
>intersection method you described), then look at word hits for both words
>within those documents, and see if the 2nd word has position (n + 1) (and
>therefore occurs immediately after 1st word). This can be done in haste.

By George, I think he's got it! This would seem to be sufficient to explain the near-miraculous things you can do with Deja. The index files created must be truly humongous, but I see no better alternative. Does anyone have a better idea?
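To make the quoted scheme concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how Deja actually works: the names build_index and phrase_search are made up for this example, and it assumes words are split on whitespace and compared case-insensitively.

```python
from collections import defaultdict


def build_index(docs):
    """Build a positional index: word -> {doc_id: [positions]}.

    A position n means the word is the n'th word of the document
    (counting from 0). `docs` maps a document id to its text.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted ids of documents containing the words of `phrase` consecutively."""
    words = phrase.lower().split()
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]

    # Step 1: intersect the document sets, so only documents
    # containing every word of the phrase remain.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)

    # Step 2: within each candidate, look for a position n of the first
    # word such that word k of the phrase occurs at position n + k.
    hits = []
    for doc_id in sorted(candidates):
        later = [set(p[doc_id]) for p in postings[1:]]
        if any(
            all(start + k + 1 in positions for k, positions in enumerate(later))
            for start in postings[0][doc_id]
        ):
            hits.append(doc_id)
    return hits
```

With three toy documents "two words here", "words two" and "two big words", a search for "Two Words" should match only the first: the other two contain both words but not at adjacent positions.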

Note that this requires putting even prepositions and articles in the index, something I certainly would not do if it were simply a keyword search as opposed to a 'string' search. And removing a document would be a nightmare, since its entries are scattered under every word it contains. For something like Deja I would simply leave the index alone and handle document removal by some other mechanism.
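One possible "other mechanism", sketched here purely as an assumption and not as anything Deja is known to do, is to keep a separate set of removed document ids and filter search results against it, leaving the index entries untouched. The class name TombstonedIndex and its methods are hypothetical, and the sketch assumes document ids are never reused.

```python
class TombstonedIndex:
    """Keyword index where removal only marks a document as deleted."""

    def __init__(self):
        self.postings = {}    # word -> set of doc ids; never shrinks
        self.deleted = set()  # ids of removed ("abandoned") documents

    def add(self, doc_id, text):
        # Every word is indexed, including articles and prepositions.
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(doc_id)

    def remove(self, doc_id):
        # No index entry is touched; the id is just recorded as deleted.
        self.deleted.add(doc_id)

    def search(self, word):
        # Deleted documents are filtered out at query time.
        return sorted(self.postings.get(word.lower(), set()) - self.deleted)
```

The cost is that the index never shrinks: entries for removed documents stay on disk and are skipped on every query.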

Is there any database anyone knows of where actual removal, as opposed to abandonment, of old entries is a requirement?

Received on Sun Jul 23 2000 - 00:00:00 CEST
