Re: algorithms for selecting on non-key/non-indexed field

From: Marc Tardif <intmktg_at_Gloria.CAM.ORG>
Date: 2000/07/02
Message-ID: <Pine.LNX.4.10.10007021100330.4484-100000_at_Gloria.CAM.ORG>#1/1


Aha, I think I finally understood what you mean. To search non-indexed fields using BDB, your best bet would be to use the recno access method. You'll notice the format for this is like a flat text file where each record is on its own line. Because of this simple structure, you can then easily use a quick algorithm like some Boyer-Moore derivative to search through all the records.

Of course, if you need one indexed field, you'll have to use a btree or hash which will point to a record in the recno table. That has a problem of its own considering the key in a recno table might change after deleting. The solution for this is to never delete a record in the recno table and, instead, keep a fifo of the records which you want removed. Then, when you add new records, check the fifo first to see if some records are available for overwriting and, if not, create a new record.

If that doesn't make sense to you, ask away...

On Wed, 28 Jun 2000, dfdf wrote:

> Hi. I've been interested in databases for quite some time and
> for fun, I'm implementing a simple database in C as a learning
> tool. I'm using berkeley db as the low level file access
> primitives. My question:
>
> If your access method is of type heap, then in order to avoid
> linear scans, you should create seperate, appropriate indexes on
> the fields you will commonly search on. In my experiment, I
> created a heap file of about 1 million rows. Doing a linear
> scan (where each row has about 64 bytes of *my* data) to find a
> particular row based on a non-indexed field takes approx 18
> seconds.
>
> I repeated the experiment in mysql and returning an arbitrary
> row from selection criteria on a non-indexed, non-key column
> only took 2 seconds. I don't understand why, in theory, mysql
> should be doing a linear scan as well.
>
> In practice, mysql is probably doing something clever that I'm
> not. I would like any comments on the clever bits in
> particular. Looking at the source for both mysql and msql, it
> appears that they create in-memory indexes behind the scenes to
> help you out, or possibly gather statistical information on
> field contents to get an approximation when doing the select.
>
> I also understand that the more of the table you suck into
> memory, the better your performance should be. I accounted for
> that, at least according to the bdb specs by upping the cache
> size to approximatly the size of my database.
>
> This is somewhat of a badly formed question, so I understand
> that I may not get any responses!
>
> Shane
>
>
> Got questions? Get answers over the phone at Keen.com.
> Up to 100 minutes free!
> http://www.keen.com
>
>
>
Received on Sun Jul 02 2000 - 00:00:00 CEST

Original text of this message