Multidimensional query involving ranges

From: Ian Barrodale <ian_at_barrodale.com>
Date: Sat, 21 Jul 2001 23:27:01 GMT
Message-ID: <9f3eul$2kte$1_at_casper.UVic.CA>


After doing some reading on querying multidimensional databases, it seems to me that there are two basic approaches to performing a multidimensional query involving ranges:
1. use an index on the column with highest selectivity and suffer a sequential scan through the result set to find the records that meet the other criteria, or
2. turn the range query into an equality query by computing "bin" columns and then querying these columns instead of the original ones

As an example of the second alternative, instead of "select ... where X between 100000 and 200000 and Y between 400 and 500" we might say "select ... where Xbin = 100000 and Ybin = 400" . Here we've precomputed Xbin and Ybin as Xbin = int(X/100000)*100000 and Ybin = int(Y/100)*100, and we've built a composite index on (Xbin,Ybin)

Obviously in theory both approaches have problems: option 1 breaks down if there are lots of records in the record set still remaining after the index is used, and option 2 requires that you understand not only the data distributions but also the granularity expected by the user. And if the data is reasonably dynamic you're not going to want to suffer the update time entailed in recomputing the bin columns (even if this is done automatically through triggers). But are these approaches "good enough" in the real world? Are there examples of real-world applications where these approaches do not work? What other approaches are used for large, reasonably dynamic databases? Are any of these approaches offered by commercially available RDBMS products? Received on Sun Jul 22 2001 - 01:27:01 CEST

Original text of this message