Keyword searches

From: Simon Richter <geier_at_phobos.fs.tum.de>
Date: Sun, 15 Jul 2001 22:31:50 +0200
Message-ID: <Pine.LNX.4.31.0107152159000.2609-100000_at_phobos.fachschaften.tu-muenchen.de>


Hi,

I'm in the process of hacking up my own Gnutella client. For this, I need a function which returns a set of files that matches a certain search query (which is a set of keywords). Each file I share with the world also has a set of keywords associated with it. Now I need to match this list against the query efficiently.

My current approach would be the "bitmask" method, i.e. to hash these keywords to a fixed-length bitmap by associating certain character combinations with bits in the mask, setting them if a character combination appears in the hashed string. Then if (query_mask & file_mask) == query_mask I examine the file further to see if it really matched.

My questions:

   Simon

File layout:

Part 1: Search tree

struct node {

    long offset_if_zero;
    long offset_if_one;
}

If one of the offsets is zero, then there is no data that matches. If the offset is negative, (-offset)-1 points into part 2 of the file.

Part 2: Lists of nodes with equal hashes

-> Zero-terminated lists of offsets into part three.

Part 3: Data

-- 
GPG public key available from http://phobos.fs.tum.de/pgp/Simon.Richter.asc
 Fingerprint: DC26 EB8D 1F35 4F44 2934  7583 DBB6 F98D 9198 3292
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!
Received on Sun Jul 15 2001 - 22:31:50 CEST

Original text of this message