Re: Optomized Range Searching....

From: Philip Lijnzaad <lijnzaad_at_ebi.ac.uk>
Date: 16 Aug 2000 08:40:42 +0100
Message-ID: <u7r97plklh.fsf_at_o2-3.ebi.ac.uk>


Art> That is true.  However, it is possible that the span between the low bound
Art> and the high bound may be quite large.  Thus for example, if someone
Art> queried a DB for (as a bad example) pots in Mesopotamia between the
Art> years of 100A.D. - 500 A.D.  

(it wasn't called Mesopotamia, then, but under I think Roman rule ;-)

Art> This would catch all instances of all dates Art> for 500 years

400 years.

Art> which would have to be OR'ed together.

??? If anything, the thing lower and upper bounds are AND'ed together
(WHERE year >= 100 AND year <= 400).

Art> This, I would
Art> presume, could be sped up by having buckets which contained the
Art> record IDs for the various dates by century already colated together.
Art> Thus you would have 5 buckets to look at and process instead of
Art> potentially hundreds.

Indexes are a means to spead up retrieval, and many indexes (including B-trees, but not hash tables or bitmaps) have the useful property of being ordered. That is, knowing the index node for the first entry with YEAR >= A.D. 100 and the index node for the last entry with YEAR <= A.D. 400, you implicitly have all entries in between, because in the index's data structure, they are also in between.

Art> I am sure there are other optomizations out there for this sort of thing
Art> but when it comes to numeric range processing you could run into
Art> some real efficiency problems if the range between your low and high
Art> contain an extremely large number of enteries.

This is of course true for any query, and has nothing whatsoever to do with ranges. Queries along the lines of WHERE sex ='M' will have exactly the same problems, and incidentally, it may be good to point out that as a rule of thumb, you should only disable an index in a query if you expect to get hits in less than around 10-20% of your table in that particular query. If your query is likely to hit more than 10-20% of your table (e.g., WHERE sex ='M'), indexes are pointless, since they slowdown table access.

                                                                      Philip
-- 
When C++ is your hammer, everything looks like a thumb. (Steven Haflich)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm A2-24
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD,  GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53
Received on Wed Aug 16 2000 - 09:40:42 CEST

Original text of this message