| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> comp.databases.theory -> Keyword searches
Hi,
I'm in the process of hacking up my own Gnutella client. For this, I need a function which returns a set of files that matches a certain search query (which is a set of keywords). Each file I share with the world also has a set of keywords associated with it. Now I need to match this list against the query efficiently.
My current approach would be the "bitmask" method, i.e. to hash these keywords to a fixed-length bitmap by associating certain character combinations with bits in the mask, setting them if a character combination appears in the hashed string. Then if (query_mask & file_mask) == query_mask I examine the file further to see if it really matched.
My questions:
Simon
File layout:
Part 1: Search tree
struct node {
long offset_if_zero;
long offset_if_one;
}
If one of the offsets is zero, then there is no data that matches. If the offset is negative, (-offset)-1 points into part 2 of the file.
Part 2: Lists of nodes with equal hashes
-> Zero-terminated lists of offsets into part three.
Part 3: Data
-- GPG public key available from http://phobos.fs.tum.de/pgp/Simon.Richter.asc Fingerprint: DC26 EB8D 1F35 4F44 2934 7583 DBB6 F98D 9198 3292 Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!Received on Sun Jul 15 2001 - 15:31:50 CDT
![]() |
![]() |