Re: algorithms for selecting on non-key/non-indexed field

From: dfdf <adamsch1NOadSPAM_at_yahoo.com.invalid>
Date: 2000/07/02
Message-ID: <06596bf8.998cf2a8_at_usw-ex0104-087.remarq.com>#1/1


Marc Tardif <intmktg_at_Gloria.CAM.ORG> wrote:
>Aha, I think I finally understood what you mean. To search nonindexed
>fields using BDB, your best bet would be to use the recno access method.
>You'll notice the format for this is like a flat text file where each
>record is on its own line. Because of this simple structure, you can then
>easily use a quick algorithm like some Boyer-Moore derivative to search
>through all the records.
Ya, this is what I'm talking about =) I was using cursors in BDB and running through the set of records using the recno format. The data I was pulling in was cast to my C structure, and at that point I did my check to see if this was the right record. The problem was that the cursor takes 18 seconds to run through a million entries, which seems slow to me.

Now to comment on your suggestion: you would use a good string-matching algorithm and run through the file looking for a match. But once you've found a match, how do you figure out what record number you're at, given the worst case where you're dealing with arbitrarily sized records? It would seem that I wouldn't want to use BDB cursors at all, just open the file myself and rip through it?
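For reference, this is roughly what my cursor loop looks like; a trimmed-down sketch against a BDB 3.x-style C API, with a made-up fixed-size record layout (struct record and its "color" field are placeholders, not my real schema):

    /* Full scan of a recno database, filtering on a non-indexed field. */
    #include <stdio.h>
    #include <string.h>
    #include <db.h>

    struct record {              /* placeholder fixed-size record */
        char name[32];
        char color[16];
    };

    int scan_for_color(DB *dbp, const char *wanted)
    {
        DBC *cur;
        DBT key, data;
        int ret;

        if ((ret = dbp->cursor(dbp, NULL, &cur, 0)) != 0)
            return ret;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));

        /* Walk every record; this is the O(n) pass that takes ~18s
         * over a million entries. */
        while ((ret = cur->c_get(cur, &key, &data, DB_NEXT)) == 0) {
            struct record *r = (struct record *)data.data;
            if (strcmp(r->color, wanted) == 0)
                printf("match at record %lu: %s\n",
                       (unsigned long)*(db_recno_t *)key.data, r->name);
        }
        cur->c_close(cur);
        return ret == DB_NOTFOUND ? 0 : ret;   /* DB_NOTFOUND ends the scan */
    }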

>
>Of course, if you need one indexed field, you'll have to use a btree or
Exactly. This is what I was thinking of doing when one explicitly creates an index on the table.
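Something like the following is what I had in mind; just a sketch, assuming the same BDB C API, with illustrative names (idx_dbp for the btree index, main_dbp for the recno table) and a unique index (you'd set DB_DUP on the btree before opening it if the indexed values aren't unique):

    #include <string.h>
    #include <db.h>

    /* Add an index entry: field value -> record number. */
    int index_put(DB *idx_dbp, const char *field_value, db_recno_t recno)
    {
        DBT key, data;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = (void *)field_value;
        key.size = strlen(field_value) + 1;
        data.data = &recno;
        data.size = sizeof(recno);
        return idx_dbp->put(idx_dbp, NULL, &key, &data, 0);
    }

    /* Look up a field value in the index, then fetch the record from
     * the recno table by the record number stored in the index. */
    int index_get(DB *idx_dbp, DB *main_dbp, const char *field_value, DBT *out)
    {
        DBT key, data;
        db_recno_t recno;
        int ret;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = (void *)field_value;
        key.size = strlen(field_value) + 1;
        if ((ret = idx_dbp->get(idx_dbp, NULL, &key, &data, 0)) != 0)
            return ret;                     /* DB_NOTFOUND if no match */
        recno = *(db_recno_t *)data.data;

        memset(&key, 0, sizeof(key));
        key.data = &recno;
        key.size = sizeof(recno);
        memset(out, 0, sizeof(*out));
        return main_dbp->get(main_dbp, NULL, &key, out, 0);
    }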

>hash which will point to a record in the recno table. That has a problem
>of its own considering the key in a recno table might change after
>deleting. The solution for this is to never delete a record in the recno
>table and, instead, keep a fifo of the records which you want removed.
>Then, when you add new records, check the fifo first to see if some
>records are available for overwriting and, if not, create a new record.

This would also seem to be a way of reducing gaps in the file after a heavy set of logical deletes (at, say, the application layer). Of course, this would seem less optimal if you had different-sized records.

Now to continue with your last comment about the FIFO algorithm: I thought that record numbers were always constant in BDB? That is, if I had records 1-10 and deleted 6, there would still be records physically numbered 7-10; in other words, BDB wouldn't renumber the subsequent records. I know the docs say a bit about renumbering explicitly, but you have to do that programmatically; I didn't think BDB did it by default.
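Just to make sure I follow the FIFO idea, here's how I picture it; a sketch with a plain in-memory queue and made-up names (nothing here is a BDB-provided API, and recno 0 is used as a "no free slot" sentinel since BDB record numbers start at 1):

    #include <stdlib.h>
    #include <string.h>
    #include <db.h>

    struct free_fifo {            /* FIFO of logically deleted slots */
        db_recno_t *slots;
        size_t head, tail, cap;
    };

    /* Logical delete: remember the slot instead of calling dbp->del(). */
    void fifo_push(struct free_fifo *f, db_recno_t recno)
    {
        if (f->tail == f->cap) {
            f->cap = f->cap ? f->cap * 2 : 64;
            f->slots = realloc(f->slots, f->cap * sizeof(*f->slots));
            if (f->slots == NULL)
                abort();          /* sketch: no real error handling */
        }
        f->slots[f->tail++] = recno;
    }

    /* Returns a freed slot to overwrite, or 0 if none are available. */
    db_recno_t fifo_pop(struct free_fifo *f)
    {
        return (f->head < f->tail) ? f->slots[f->head++] : 0;
    }

    /* On insert: reuse a freed slot if possible, else append. */
    int insert_record(DB *dbp, struct free_fifo *f, DBT *data)
    {
        DBT key;
        db_recno_t recno = fifo_pop(f);

        memset(&key, 0, sizeof(key));
        key.data = &recno;
        key.size = sizeof(recno);
        if (recno != 0)           /* overwrite the freed slot in place */
            return dbp->put(dbp, NULL, &key, data, 0);
        return dbp->put(dbp, NULL, &key, data, DB_APPEND);
    }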

