Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> comp.databases.theory -> Re: fast wildcard searching on millions of strings

Re: fast wildcard searching on millions of strings

From: Kai Großjohann <Kai.Grossjohann_at_CS.Uni-Dortmund.DE>
Date: Sat, 11 Aug 2001 23:25:05 +0200
Message-ID: <vaf66buz3v2.fsf@lucy.cs.uni-dortmund.de>

rsit_at_ucsd.edu (rsit) writes:

> Hey, I was wondering if some of you guys could help me with a problem I
> am having. I am working on a search engine that will primarily be
> doing wildcard searching on a set of at least 33 million URLs. The
> problem I am having is figuring out an underlying architecture that
> would support wildcard searching on this set in, hopefully, less than
> one second.

Are you going to search in the URLs themselves or in the documents behind the URLs? If it's the documents, then I think wildcard searching is not the way to go. Rather, you want stemming or maybe phonetic similarity search. That sounds like an Information Retrieval problem.
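To make "phonetic similarity" concrete, here is a minimal sketch of Soundex, one classic phonetic code. Nothing above names a specific scheme, so the choice of Soundex and the function name soundex are assumptions made for illustration only.

```python
# Assumption: American Soundex is used purely as an example of a phonetic code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of word ("" if it has no letters)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and unknown letters map to ""
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]  # pad or truncate to 4 characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Words that sound alike get the same code, so an index keyed on the code finds "Rupert" when the query is "Robert". Stemming works on the same principle, with a word stem as the key instead of a phonetic code.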

If it's the URLs, there is OpenText, which uses the PAT algorithm for regexp searching in large corpora.
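As a rough illustration of the idea behind suffix-based indexes of this kind (this is not OpenText's implementation, and all names below are made up for the example), one can sort every suffix of every URL and answer a query like *foo* with a binary search. The sketch stores whole suffix strings for simplicity; at 33 million URLs a real index would store only offsets into the text.

```python
import bisect


def build_index(urls):
    """Return a sorted list of (suffix, url_id) pairs, one per suffix of each URL.

    Hypothetical helper: keeping full suffix strings costs quadratic memory
    per URL, which is acceptable only for a small demonstration.
    """
    entries = []
    for url_id, url in enumerate(urls):
        for start in range(len(url)):
            entries.append((url[start:], url_id))
    entries.sort()
    return entries


def search_substring(index, urls, pattern):
    """Return the URLs containing pattern, in their original order."""
    # (pattern,) sorts before any (suffix, url_id) whose suffix starts with
    # pattern, so bisect_left lands on the first candidate entry.
    pos = bisect.bisect_left(index, (pattern,))
    hits = set()
    # All suffixes that start with pattern are adjacent in the sorted list.
    while pos < len(index) and index[pos][0].startswith(pattern):
        hits.add(index[pos][1])
        pos += 1
    return [urls[i] for i in sorted(hits)]


urls = [
    "http://example.com/index.html",
    "http://example.org/about",
    "ftp://files.example.net/pub",
]
index = build_index(urls)
print(search_substring(index, urls, "example.o"))
```

Each lookup costs one binary search plus the number of matching suffixes, independent of how many URLs do not match. A pattern with a wildcard in the middle, such as foo*bar, could be handled by searching for the longer literal part and filtering the hits.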

kai

-- 
~/.signature: No such file or directory
Received on Sat Aug 11 2001 - 16:25:05 CDT
