Re: Optomized Range Searching....
Date: 16 Aug 2000 08:40:42 +0100
Message-ID: <u7r97plklh.fsf_at_o2-3.ebi.ac.uk>
Art> That is true. However, it is possible that the span between the low bound Art> and the high bound may be quite large. Thus for example, if someone Art> queried a DB for (as a bad example) pots in Mesopotamia between the Art> years of 100A.D. - 500 A.D.
(it wasn't called Mesopotamia, then, but under I think Roman rule ;-)
Art> This would catch all instances of all dates Art> for 500 years
400 years.
Art> which would have to be OR'ed together.
??? If anything, the thing lower and upper bounds are AND'ed together
(WHERE year >= 100 AND year <= 400).
Art> This, I would Art> presume, could be sped up by having buckets which contained the Art> record IDs for the various dates by century already colated together. Art> Thus you would have 5 buckets to look at and process instead of Art> potentially hundreds.
Indexes are a means to spead up retrieval, and many indexes (including B-trees, but not hash tables or bitmaps) have the useful property of being ordered. That is, knowing the index node for the first entry with YEAR >= A.D. 100 and the index node for the last entry with YEAR <= A.D. 400, you implicitly have all entries in between, because in the index's data structure, they are also in between.
Art> I am sure there are other optomizations out there for this sort of thing Art> but when it comes to numeric range processing you could run into Art> some real efficiency problems if the range between your low and high Art> contain an extremely large number of enteries.
This is of course true for any query, and has nothing whatsoever to do with ranges. Queries along the lines of WHERE sex ='M' will have exactly the same problems, and incidentally, it may be good to point out that as a rule of thumb, you should only disable an index in a query if you expect to get hits in less than around 10-20% of your table in that particular query. If your query is likely to hit more than 10-20% of your table (e.g., WHERE sex ='M'), indexes are pointless, since they slowdown table access.
Philip
-- When C++ is your hammer, everything looks like a thumb. (Steven Haflich) ----------------------------------------------------------------------------- Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm A2-24 +44 (0)1223 49 4639 / Wellcome Trust Genome Campus, Hinxton +44 (0)1223 49 4468 (fax) \ Cambridgeshire CB10 1SD, GREAT BRITAIN PGP fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53Received on Wed Aug 16 2000 - 09:40:42 CEST