Re: High Speed Text Searching Algorithms...

From: smile773 <smile773_at_bigfoot.com>
Date: 2000/07/23
Message-ID: <V4se5.4043$Uo6.202073_at_bgtnsc07-news.ops.worldnet.att.net>#1/1


I find what I am looking for when I find the right search engine. This suggests that classification matters: the lexical properties are less important than what I want to do with the information on the subject. Otherwise it's just an index.

People who are diligent in a particular field could make an outstanding search base for just that field. Finding that search base is what slows down my ability to search.

chicken cook it oriental style <enter> kung foo's recipe search engine

source code database delphi <enter>
community borland code central search engine

apple + chicken <enter>
20 % off book at Amazon.com

Derrick Coetzee <dc_at_moonflare.com> wrote in message news:snk2q5cojjt144_at_corp.supernews.com...
> I know nothing about the reality of the subject myself, but it seems to me
> the best way is probably to assign each page a number, then add the numbers
> of all pages containing a certain word to that word's "page list". In this
> way, you can build up a sorted database of words:
>
> chicken 5 1 2 3 6 7
> apple 4 11 5
> monster 15 4 9 2
>
> Then you can treat these as sets... if they specify two words, you find the
> intersection:
>
> +chicken +apple
> {4 11 5} intersection {5 1 2 3 6 7} = { 5 }
>
> You can also organize each list according to its relevancy to that
> particular word.
>
> However, this idea works very badly for quoted multiword searches, unless
> you put in entries for each set of multiple words, and that'd get quite
> excessive.
> -Derrick Coetzee
> http://www.moonflare.com/
> P.S. I can see how this isn't C++ specific, but at the same time, few other
> languages allow the flexibility needed while still being fast enough... if
> there are any, they or their libraries must be written in some lower-level
> language...
>
> "Paradox" <sbin_at_mindless.com> wrote in message
> news:39769546.B4DE8DC1_at_mindless.com...
> > As an interesting CS project, I'm looking into different text searching
> > algorithms, everything from simple strncmp to things like
> > Knuth-Morris-Pratt and Boyer-Moore algorithms. But I was wondering, I
> > know companies like Google (and probably other high speed internet search
> > engines) have their own implementations of search engines. But it seems
> > as though for text searching, those are extremely fast, probably
> > searching records in the billions in under 1 second. What kind of
> > algorithms do systems like that use? Are there any places online that I
> > can find descriptions of these algorithms?
> >
> > Any recommendations would be greatly appreciated....
> >
> > Thanks
> > Dave
> > sbin_at_mindless.com
> >
>
>
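
For what it's worth, the page-list idea quoted above can be sketched in a few lines of C++. This is only a rough illustration under my own assumptions: the names Index, PageList, add_page and pages_with_both are made up for the example, words are split on whitespace only, and each page list is kept sorted so the standard std::set_intersection can be used for a "+word +word" query. A real engine would surely do much more.

```cpp
#include <algorithm>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// All names here are hypothetical, chosen only for this sketch.
using PageList = std::vector<int>;              // sorted, unique page numbers
using Index = std::map<std::string, PageList>;  // word -> its "page list"

// Record that every whitespace-separated word in `text` occurs on `page`.
void add_page(Index& index, int page, const std::string& text) {
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        PageList& pages = index[word];
        // Insert in sorted position, skipping duplicates.
        auto pos = std::lower_bound(pages.begin(), pages.end(), page);
        if (pos == pages.end() || *pos != page) {
            pages.insert(pos, page);
        }
    }
}

// Pages that contain both words, i.e. the "+a +b" query.
PageList pages_with_both(const Index& index,
                         const std::string& a, const std::string& b) {
    PageList result;
    auto ia = index.find(a);
    auto ib = index.find(b);
    if (ia == index.end() || ib == index.end()) {
        return result;  // an unknown word matches no pages
    }
    // Both lists are sorted, so the intersection takes one linear pass.
    std::set_intersection(ia->second.begin(), ia->second.end(),
                          ib->second.begin(), ib->second.end(),
                          std::back_inserter(result));
    return result;
}
```

If the index is filled so that "chicken" is on pages 1, 2, 3, 5, 6, 7 and "apple" on pages 4, 5, 11, then pages_with_both(index, "chicken", "apple") returns just page 5, the same answer as the set example in the quoted message. Keeping the lists sorted by page number is a choice made for this sketch; ordering them by relevancy instead would need a different way of intersecting.
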
Received on Sun Jul 23 2000 - 00:00:00 CEST
