Re: algorithms for selecting on non-key/non-indexed field
Date: 2000/07/01
Message-ID: <00822ad8.72ac498c_at_usw-ex0103-086.remarq.com>#1/1
Marc Tardif <intmktg_at_Gloria.CAM.ORG> wrote:
>> If your access method is of type heap, then in order to avoid
>> linear scans, you should create seperate, appropriate indexes
on
>> the fields you will commonly search on. In my experiment, I
>> created a heap file of about 1 million rows. Doing a linear
>> scan (where each row has about 64 bytes of *my* data) to find
a
>> particular row based on a non-indexed field takes approx 18
>> seconds.
>
>There are various ways of representing a heap, such as a tree
or an array
>which lend to very different implementations. As for a linear
scan, be it
>for a straight text file or a database, there are many
algorithms for this
>seemingly simple task: brute force, Knuth-Morris-Pratt, Aho-
Corasick,
>Shift-Or, etc. Furthermore, your first paragraph mentions
Thanks. I'm not well read in theory outside the courses I had
to take in college. I've put these aside and will do some more
research.
Berkeley DB
>which only supports key/data pairs whereas your second
paragraph mentions
>"fields" which are only supported by more advanced relational
databases.
Sorry: To be precise, my "table" consisted of rows that
internally are represented by a C-structure that contains a
couple of integers. One of the ints in the structure was what
BDB was using as an index. My selection criteria was off the
other int (which I referred to as a field), resulting in my
brute force method for searching.
>As you can understand, I am very confused at this point. Be
more precise.
>
>> In practice, mysql is probably doing something clever that I'm
>> not. I would like any comments on the clever bits in
>> particular. Looking at the source for both mysql and msql, it
>> appears that they create in-memory indexes behind the scenes
to
>> help you out, or possibly gather statistical information on
>> field contents to get an approximation when doing the select.
>
>Congrats on being able to understand the code of both mysql and
msql. You
>must've noticed that msql abuses memory usage quite a bit,
making it
>invariably quite fast but only until you start running out of
memory. If
>you want to use Berkeley DB, I'd advise against trying to
simulate
>relational databases (if mysql and msql qualify), but rather to
>concentrate on the task at hand and using the proper access
methods for
>the proper job.
>
>> This is somewhat of a badly formed question, so I understand
>> that I may not get any responses!
>
>Since you're so keen on databases, consider using a linked-list
of
>questions instead of your previous randomness. Start from the
root node
>with a simple question which you are more likely to have
answered. Then,
>progress down your list of questions which could change as you
receive new
>answers... that could qualify as a self-adjusting data
structure, grounds
>for a long thread of discussion.
There ya go. I'll start out with one question. Technically though wouldn't it be a tree instead of a list? =)
>
>
>
Got questions? Get answers over the phone at Keen.com.
Up to 100 minutes free!
http://www.keen.com
Received on Sat Jul 01 2000 - 00:00:00 CEST