Re: Search Engine

From: Art Pollard <pollarda_at_lextek.com>
Date: 5 Nov 2002 10:02:09 -0600
Message-ID: <3dc7eaf9$0$76281$45beb828_at_newscene.com>


Stu:

I think you have the right idea to a point ....

What you really need is an "inverted file" instead of a traditional database.
An inverted file basically consists of all the records associated with each keyword in order and adjacent to each other.

For example, "apple", 3, 5, 8, 26, 97, 127, 456, ....

Then when you search it is easy to scan this list for the records that match your query.

A simple way to achieve this is to write out the word to a file followed by the
record number of the record it is in. Do this for each and every word you want to "index." Note: While you are doing this, you will see that they are going
out in record number order. When you are finished you want to sort this file
based on the word as the primary key and the record number as the secondary key.

You will notice that once your sort is complete that your file is now
"inverted"

or in word order rather than record order. (That is why they call it an
"inverted" file.)

You can then further process this file by removing all but the first occurance of each
word giving a list in the form of:

"apple", 3, 5, 8, 26, 97, 127, 456, ....

Now you are ready to begin searching.....

A traditional database has far too many inefficiencies in it to make it appropriate for
doing searches such as you are describing.. At least that is the case if your database
gets to be of any significant size.

-Art

--
Art Pollard
http://www.lextek.com/
Suppliers of High Performance Text Retrieval Engines.
Received on Tue Nov 05 2002 - 17:02:09 CET

Original text of this message