fast wildcard searching on millions of strings

From: rsit <rsit_at_ucsd.edu>
Date: 10 Aug 2001 21:35:59 -0700
Message-ID: <381ddd3b.0108102035.41250d61_at_posting.google.com>


Hey, I was wondering if some of you guys could help me with a problem I am having. I am working on a search engine that will primarily be doing wildcard searching over a set of at least 33 million URLs. The problem I am having is figuring out an underlying architecture that would support wildcard searching on this set in, hopefully, less than one second.

Some setups I am experimenting with include agrep/glimpse and a database. Grep searches have been working for smaller sets, but I am unsure they will be efficient on a much larger corpus. I am hoping that indexing with glimpse, and perhaps keeping the data in memory, might help, but I haven't tried it yet. I have also tried MySQL, but its performance is even worse than grep's. I was thinking about trying another, more powerful database, but MySQL's performance has been so poor that I am not sure trying other databases would help.
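To make the indexing idea concrete, here is the rough kind of thing I am imagining (a toy sketch in Python, all names invented; it is not what glimpse actually does, and it only handles '*' wildcards): index every 3-character substring of each URL, answer a query by intersecting the posting lists of the trigrams in its literal parts, then verify the surviving candidates with a real pattern match.

```python
# Toy trigram index for '*'-wildcard queries over a list of URLs.
# All names here are hypothetical -- this is a sketch, not glimpse's algorithm.

import fnmatch
from collections import defaultdict

def trigrams(s):
    """Every 3-character substring of s."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(urls):
    """Map each trigram to the set of URL indices containing it."""
    index = defaultdict(set)
    for pos, url in enumerate(urls):
        for t in trigrams(url):
            index[t].add(pos)
    return index

def wildcard_search(pattern, urls, index):
    """Find URLs matching a pattern like '*.edu*' (only '*' supported)."""
    # The literal runs between '*'s supply the trigrams to intersect.
    candidates = None
    for part in pattern.split('*'):
        for t in trigrams(part):
            posting = index.get(t, set())
            candidates = posting if candidates is None else candidates & posting
    if candidates is None:
        # Pattern had no literal run of 3+ chars: fall back to a full scan.
        candidates = range(len(urls))
    # Verify candidates with an exact match, preserving input order.
    return [urls[i] for i in sorted(candidates)
            if fnmatch.fnmatch(urls[i], pattern)]

urls = ["http://www.ucsd.edu/", "http://www.berkeley.edu/news",
        "http://www.mit.edu/", "http://slashdot.org/"]
idx = build_index(urls)
```

The intersection step is what would (I hope) keep queries under a second: most of the 33 million URLs are never even looked at, and the expensive pattern match only runs on the small candidate set.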

Anyone's advice would be much appreciated. Perhaps someone could give me tips on other paths I could try, or some assurance that one of the above ideas is worth looking into.

Thank you very much,
Ryan Sit Received on Sat Aug 11 2001 - 06:35:59 CEST