Re: algorithms for selecting on non-key/non-indexed field

From: Marc Tardif <intmktg_at_Gloria.CAM.ORG>
Date: 2000/07/02
Message-ID: <Pine.LNX.4.10.10007021436160.7884-100000_at_Gloria.CAM.ORG>#1/1


[snip]
> million entries. This seems slow to me. Now to comment on your
> suggestion: You would use a good string-matching algorithm and
> run through the file looking for a match. Once you've found a
> match, you need then to figure out what record number you are
> at, given a worse case situation where you are dealing with
> arbitrary sized records? It would seem that I wouldn't want to
> use BDB Cursors, just open the file myself and rip through it?

Question is, do you really need to know which record number you've reached? It would seem the important thing would be to retrieve the information located at the matched string which would only require to seek forward and backward for the next newlines. Since record numbers are arbitrarily assigned based on the number of records previously entered, there are probably very few situations where you would need this reference. In the event you do need the record number, see my previous post about creating a second index using BDB.

> This would also seem to be a way of reducing gaps in the file
> after a heavy set of logical deletes (at say the application
> layer). Of course this would seem less optimal if you had
> different sized records. Now to continue with the last comment
> you made about the FIFO algorithm: Now see I thought that the
> record numbers where always constant in BDB? In that if I had
> records 1-10, deleted 6, there would still be records physically
> numbered 7-10. In otherwords, BDB wouldn't renumber the
> subsiquent records? I know it has a bit in the docs about
> renumbering explicitly, but you have to programatically do that,
> I didn't think BDB did that by default.

FIFO algorithm? see mkfifo(2), a fifo is more like a file rather than an algorithm.

As for record numbers, here is a brief excerpt from recno(3):   The record number data structure is either variable or   fixed-length records stored in a flat-file format,   accessed by the logical record number. The existence of   record number five implies the existence of records one   through four, and the deletion of record number one causes   record number five to be renumbered to record number four,   as well as the cursor, if positioned after record number   one, to shift down one record.

Hope that clears up the need for a fifo in order to keep the integrity of record numbers in your btree or hash tables index which refer to the recno table.

P.S. I'm having a hard time understanding your questions which are basically sentences which end in a question mark rather than an actual question. Could you please take the time to re-read yourself before posting, some people actually need to understand what you write in order to respond. Received on Sun Jul 02 2000 - 00:00:00 CEST

Original text of this message