algorithms for selecting on non-key/non-indexed field

From: dfdf <adamsch1NOadSPAM_at_yahoo.com.invalid>
Date: 2000/06/28
Message-ID: <11957a80.a2bc2174_at_usw-ex0103-086.remarq.com>#1/1


Hi. I've been interested in databases for quite some time and for fun, I'm implementing a simple database in C as a learning tool. I'm using berkeley db as the low level file access primitives. My question:

If your access method is of type heap, then in order to avoid linear scans, you should create seperate, appropriate indexes on the fields you will commonly search on. In my experiment, I created a heap file of about 1 million rows. Doing a linear scan (where each row has about 64 bytes of *my* data) to find a particular row based on a non-indexed field takes approx 18 seconds.

I repeated the experiment in mysql and returning an arbitrary row from selection criteria on a non-indexed, non-key column only took 2 seconds. I don't understand why, in theory, mysql should be doing a linear scan as well.

In practice, mysql is probably doing something clever that I'm not. I would like any comments on the clever bits in particular. Looking at the source for both mysql and msql, it appears that they create in-memory indexes behind the scenes to help you out, or possibly gather statistical information on field contents to get an approximation when doing the select.

I also understand that the more of the table you suck into memory, the better your performance should be. I accounted for that, at least according to the bdb specs by upping the cache size to approximatly the size of my database.

This is somewhat of a badly formed question, so I understand that I may not get any responses!

Shane

Got questions? Get answers over the phone at Keen.com. Up to 100 minutes free!
http://www.keen.com Received on Wed Jun 28 2000 - 00:00:00 CEST

Original text of this message