Re: fast wildcard searching on millions of strings

From: Kai Großjohann <Kai.Grossjohann_at_CS.Uni-Dortmund.DE>
Date: Sat, 11 Aug 2001 23:25:05 +0200
Message-ID: <vaf66buz3v2.fsf_at_lucy.cs.uni-dortmund.de>


rsit_at_ucsd.edu (rsit) writes:

> Hey I was wondering if some of you guys could help me on a problem I
> am having. I am working on a searching engine that will be primarily
> doing wildcard searching on a set at least 33 million URLs. The
> problem I am having is figuring out a underlying architecture that
> would support wildcard searching on this set in hopefully less than
> one second.

Are you going to search in the URLs themselves or in the documents behind them? If it's the documents, then I think wildcard searching is not the way to go. Rather, you want stemming or maybe phonetic similarity search. That is really an Information Retrieval problem.
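
To make "phonetic similarity" a bit more concrete, here is a minimal sketch of American Soundex, one classic phonetic code. It is only an illustration: the function name `soundex` and the choice of Soundex over other phonetic schemes are my assumptions, and only the letters a-z are considered.

```python
# Build the letter-to-digit table used by American Soundex.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code of word (hypothetical helper)."""
    # Keep only the ASCII letters a-z; everything else is ignored.
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous letter's code.
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate equal codes; vowels do.
        if ch not in "hw":
            prev = code
    # Pad with zeros and cut to the usual length of four.
    return (result + "000")[:4]
```

Two words with the same code, such as "Robert" and "Rupert", would then be treated as matching each other, which is a rather different notion of similarity from what a wildcard gives you.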

There is also OpenText, which uses the PAT algorithm for regexp searching in large corpora.
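
If it is the URLs themselves, the general idea of indexing every suffix so that a substring lookup becomes a binary search can be sketched as follows. This is not OpenText's code, just a toy illustration under my own assumptions: the names `build_suffix_index`, `find_substring` and `wildcard_search` are made up, `*` is the only wildcard, and all suffixes are stored as full strings, which a real index would replace with offsets.

```python
import bisect
import re


def build_suffix_index(strings):
    """Return a sorted list of (suffix, string_id) pairs, one per suffix."""
    entries = []
    for sid, s in enumerate(strings):
        for start in range(len(s)):
            entries.append((s[start:], sid))
    entries.sort()
    return entries


def find_substring(index, strings, needle):
    """Return the strings that contain needle, in their original order."""
    if not needle:
        return list(strings)
    # All suffixes starting with needle are contiguous in the sorted index.
    lo = bisect.bisect_left(index, (needle, -1))
    hits = set()
    for suffix, sid in index[lo:]:
        if not suffix.startswith(needle):
            break
        hits.add(sid)
    return [strings[i] for i in sorted(hits)]


def wildcard_search(index, strings, pattern):
    """Match pattern against whole strings; '*' matches any run of characters."""
    pieces = pattern.split("*")
    regex = re.compile(".*".join(re.escape(p) for p in pieces), re.DOTALL)
    literals = [p for p in pieces if p]
    if literals:
        # Narrow the candidates with the longest literal piece, then verify.
        candidates = find_substring(index, strings, max(literals, key=len))
    else:
        candidates = list(strings)
    return [s for s in candidates if regex.fullmatch(s)]


urls = [
    "http://www.ucsd.edu/index.html",
    "http://www.cs.uni-dortmund.de/",
    "ftp://ftp.ucsd.edu/pub/",
]
index = build_suffix_index(urls)
matches = wildcard_search(index, urls, "*ucsd.edu*")
```

Storing whole suffixes like this takes space quadratic in the length of each string, so for 33 million URLs you would need a much more compact structure. The sketch only shows why the lookup itself does not have to scan the whole set.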

kai

-- 
~/.signature: No such file or directory
Received on Sat Aug 11 2001 - 23:25:05 CEST
