Re: fast wildcard searching on millions of strings
Date: Sat, 11 Aug 2001 23:25:05 +0200
Message-ID: <vaf66buz3v2.fsf_at_lucy.cs.uni-dortmund.de>
rsit_at_ucsd.edu (rsit) writes:
> Hey I was wondering if some of you guys could help me on a problem I
> am having. I am working on a searching engine that will be primarily
> doing wildcard searching on a set of at least 33 million URLs. The
> problem I am having is figuring out an underlying architecture that
> would support wildcard searching on this set in hopefully less than
> one second.
Are you going to search in the URLs or in the documents that are behind the URLs? If it's the documents, then I think wildcard searching is not the way to go. Rather, you want stemming or maybe phonetic similarity search. Sounds like Information Retrieval.
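To make "phonetic similarity" concrete, here is a minimal sketch of American Soundex, one well-known phonetic code. It is only an illustration of the idea, not a recommendation of this particular scheme; the function name `soundex` is my own, and a real IR system would likely combine something like this with stemming.

```python
def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for a word (sketch)."""
    # Consonant groups that share a digit; vowels, y, h and w have no digit.
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]  # pad or truncate to letter + three digits


# Words that sound alike map to the same key, so they can share an index entry.
print(soundex("Robert"), soundex("Rupert"))
```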
There is OpenText, which uses the PAT algorithm for regexp searching in large corpora.
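As far as I understand it, the PAT family of indices is built on sorted suffixes of the text (what is nowadays usually called a suffix array). The following is a toy sketch of that idea applied to URLs, so that a query like `*ucsd*` becomes a binary search for a prefix among the sorted suffixes. It is not OpenText's implementation; the names `build_suffix_index` and `find_substring` are made up for illustration, and storing every suffix as its own string is far too wasteful for 33 million URLs (a real index would store offsets into one shared buffer).

```python
from bisect import bisect_left


def build_suffix_index(urls):
    """Return a sorted list of (suffix, url_id) pairs for all URLs (toy version)."""
    return sorted(
        (url[i:], uid)
        for uid, url in enumerate(urls)
        for i in range(len(url))
    )


def find_substring(entries, urls, needle):
    """Return the URLs containing needle, i.e. the matches for *needle*."""
    # All suffixes that start with needle form one contiguous run in sorted
    # order; binary search finds where that run begins.
    start = bisect_left(entries, (needle, -1))
    hits = set()
    for suffix, uid in entries[start:]:
        if not suffix.startswith(needle):
            break  # left the run of matching suffixes
        hits.add(uid)
    return sorted(urls[uid] for uid in hits)


urls = ["http://example.com/a", "http://ucsd.edu/cs", "ftp://example.org"]
index = build_suffix_index(urls)
print(find_substring(index, urls, "example"))
```

The lookup itself is logarithmic in the number of suffixes plus the number of matches, which is why this kind of index is attractive for substring-style wildcards over a large, mostly static set.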
kai
-- 
~/.signature: No such file or directory