Re: fast wildcard searching on millions of strings

From: rsit <rsit_at_ucsd.edu>
Date: 11 Aug 2001 17:35:26 -0700
Message-ID: <381ddd3b.0108111635.38699f34_at_posting.google.com>


Thanks for your post. It's just a large list of domain names and URLs that I need to search. You can think of it as a bunch of strings with an average length of 18 characters. I am not searching or doing anything with the documents at the URLs. I would like to be able to put wildcards (* or ?) anywhere in the query string.
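
To make the requirement concrete, here is a minimal sketch of one possible approach, not something I have settled on: index every 3-character substring (trigram) of each string, use the literal parts of the query to narrow the candidates, then verify each candidate against the full wildcard pattern. The function names and the sample URLs are made up for illustration, and I am assuming * matches any run of characters and ? matches exactly one.

```python
import re
from collections import defaultdict

def trigrams(s):
    # All 3-character substrings of s (empty set if s is shorter than 3).
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(strings):
    # Map each trigram to the set of positions of strings containing it.
    index = defaultdict(set)
    for i, s in enumerate(strings):
        for g in trigrams(s):
            index[g].add(i)
    return index

def to_regex(pattern):
    # Assumed semantics: '*' = any run of characters, '?' = exactly one.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def search(pattern, strings, index):
    # Literal runs between wildcards must appear verbatim in every match,
    # so each of their trigrams must be present in the matching string.
    grams = set()
    for literal in re.split(r"[*?]", pattern):
        grams |= trigrams(literal)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # No literal run of 3+ characters: fall back to scanning everything.
        candidates = range(len(strings))
    regex = to_regex(pattern)
    return sorted(strings[i] for i in candidates if regex.fullmatch(strings[i]))

urls = ["www.ucsd.edu", "cs.ucsd.edu/index.html",
        "www.example.com", "example.org/docs"]
index = build_index(urls)
print(search("*ucsd.edu*", urls, index))
print(search("www.?xample.com", urls, index))
```

Queries with no literal run of three or more characters get no help from the index and degrade to a full scan, so this alone would not guarantee sub-second answers on 33 million strings.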

I tried to find information on OpenText but haven't found any yet. Do you have any links that could point me in the right direction?

Thanks a lot,
Ryan Sit

Kai.Grossjohann_at_CS.Uni-Dortmund.DE (Kai Großjohann) wrote in message news:<vaf66buz3v2.fsf_at_lucy.cs.uni-dortmund.de>...
> rsit_at_ucsd.edu (rsit) writes:
>
> > Hey I was wondering if some of you guys could help me on a problem I
> > am having. I am working on a search engine that will be primarily
> > doing wildcard searching on a set of at least 33 million URLs. The
> > problem I am having is figuring out an underlying architecture that
> > would support wildcard searching on this set in hopefully less than
> > one second.
>
> Are you going to search in the URLs or in the documents that are
> behind the URLs? If it's the documents, then I think wildcard
> searching is not the way to go. Rather, you want stemming or maybe
> phonetic similarity search. Sounds like Information Retrieval.
>
> There is OpenText which uses the Pat algorithm for regexp searching in
> large corpora.
>
> kai
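
For reference, the Pat approach mentioned above is, as far as I understand it, built on sorted suffixes of the text (PAT trees and PAT arrays, the latter also known as suffix arrays). The following is only a rough sketch of that idea under my own assumptions, not OpenText's implementation: sort every suffix of every string, then binary-search for the suffixes that start with a given literal. The function names and sample data are hypothetical, and the naive sort used here would be far too slow and memory-hungry for 33 million URLs.

```python
def build_suffix_array(strings):
    # One entry per suffix: (string id, offset), sorted by the suffix text.
    entries = [(sid, off)
               for sid, s in enumerate(strings)
               for off in range(len(s))]
    entries.sort(key=lambda e: strings[e[0]][e[1]:])
    return entries

def find_substring(needle, strings, sa):
    # Binary search for the first suffix that is >= needle.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        sid, off = sa[mid]
        if strings[sid][off:] < needle:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes starting with needle are contiguous from that position.
    hits = set()
    while lo < len(sa):
        sid, off = sa[lo]
        if not strings[sid].startswith(needle, off):
            break
        hits.add(sid)
        lo += 1
    return sorted(strings[sid] for sid in hits)

urls = ["www.ucsd.edu", "cs.ucsd.edu/index.html", "www.example.com"]
sa = build_suffix_array(urls)
print(find_substring("ucsd", urls, sa))
```

A wildcard query such as *ucsd*edu* could use a lookup like this for its longest literal piece and then check the surviving strings against the full pattern.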
Received on Sun Aug 12 2001 - 02:35:26 CEST
