Path: dp-news.maxwell.syr.edu!spool.maxwell.syr.edu!news-spur1.maxwell.syr.edu!news.maxwell.syr.edu!postnews.google.com!j55g2000cwa.googlegroups.com!not-for-mail
From: eselk@surfbest.net
Newsgroups: comp.databases.theory
Subject: Date range searches
Date: 5 Jun 2006 15:06:07 -0700
Organization: http://groups.google.com
Lines: 34
Message-ID: <1149545167.581096.246690@j55g2000cwa.googlegroups.com>
NNTP-Posting-Host: 68.107.208.170
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Trace: posting.google.com 1149545172 6585 127.0.0.1 (5 Jun 2006 22:06:12 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Mon, 5 Jun 2006 22:06:12 +0000 (UTC)
User-Agent: G2/0.2
X-HTTP-UserAgent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.1.4322),gzip(gfe),gzip(gfe)
Complaints-To: groups-abuse@google.com
Injection-Info: j55g2000cwa.googlegroups.com; posting-host=68.107.208.170;
   posting-account=oPPVBwwAAADWmJl1aPaJJDJcns_YwZRL
Xref: dp-news.maxwell.syr.edu comp.databases.theory:41040

I have a record with a start date and an end date field.  If I want to
find all records which are in the year 2006, what is the best way to
index this data?

I'm not using a client/server database engine, it is a custom flat-file
database.

The SQL for the query would look something like this (if this was an
SQL database):

START_DATE <= #12/31/2006# AND END_DATE >= #1/1/2006#

Lets say the index is by start date.  If every record in the database
starts before 12/31/2006, then my search had to scan the entire index,
even if all of the records end dates were before 1/1/2006.  So my
search takes a long time, even though it doesn't find anything.

If I reverse the index and make it by end date.  If every record ends
after 1/1/2006, then my search had to scan the entire index, even if
all of the records start dates were after 12/31/2006.  So I have the
same problem, just reversed.

I guess if I had an index that would contain multiple keys per record,
one key for each day the record lands on, then the search would be
fast.  However, it would take a long time to build and maintain this
index, because it could have 1000s of keys for one record (update 1000
keys just to save a record!).

My indexes are B-tree format, kind of (with some slight variations).
Is there another index format that works better.  If so, could you help
me understand why?  I found one poster saying an R-tree might work
better, and I read some info about an R-tree, but I can't see how that
would work in my situation.

