Re: Multidimensional query involving ranges

From: Paul Keister <remove-this-keister_at_dnai.com>
Date: Sat, 21 Jul 2001 23:27:31 GMT
Message-ID: <9f4hvl$dbt$1_at_bob.news.rcn.net>

The solution I've heard of for performing a multi-dimension search is to use binary space partition (BSP) trees. Before I go any further, I should disclaim that this is not really what you asked for. It is not directly supported by a commercially available DBMS that I am aware of. Also, you'll have to build and maintain the BSP index with your own code, which will probably be less efficient that using built-in indexes. There's a lot I'm not aware of, so perhaps there is an OLAP product out there that will help you without having to bother coding the following cool solution:

Basically, what BSP trees allow you to do is collapse an N dimensional index into a 1 dimensional index. A BSP index works like a binary tree, in that each level of the tree partions the range of values to a finer degree. The twist is that in a BSP tree, the index is rotated to a new dimension with each partition.

For example, in a 2-D BSP tree, you first partition across the X dimension evenly, using the mean value for X out of all the coordinates you are trying to map. The two X partitions become 2 child nodes of the root of a binary tree. Then, you rotate to the Y axis and partition each of your X coordinate partitions according to the mean value of Y coordinates of the points that are contained within them, creating 4 partitions with approximately equal populations. Then, return to the X to make 8 partions, to the Y for 16, etc cetera. building the binary tree as you go.

When you've reached the desired level of granularity, you can assign the points you are mapping to the leaf nodes of the tree. Now here's the cool part: you've just collapsed a 2 dimensional map into a 1 dimensional ordering. You can traverse the tree in order and create a 1 dimensional index that will effectively represent 2 dimensional proximity. This 1 dimensional index can be stored as an indexed column of an RDBMS.

The process works for N dimensions. If you're up for coding this solution and don't mind the overhead of rebuilding your index yourself, you might want to check out BSP trees.

"Ian Barrodale" <ian_at_barrodale.com> wrote in message news:9f3eul$2kte$1_at_casper.UVic.CA...
> After doing some reading on querying multidimensional databases, it seems
to
> me that there are two basic approaches to performing a multidimensional
> query involving ranges:
> 1. use an index on the column with highest selectivity and suffer a
> sequential scan through the result set to find the records that meet the
> other criteria, or
> 2. turn the range query into an equality query by computing "bin" columns
> and then querying these columns instead of the original ones
>
> As an example of the second alternative, instead of "select ... where X
> between 100000 and 200000 and Y between 400 and 500" we might say "select
> ... where Xbin = 100000 and Ybin = 400" . Here we've precomputed Xbin and
> Ybin as Xbin = int(X/100000)*100000 and Ybin = int(Y/100)*100, and we've
> built a composite index on (Xbin,Ybin)
>
> Obviously in theory both approaches have problems: option 1 breaks down if
> there are lots of records in the record set still remaining after the
index
> is used, and option 2 requires that you understand not only the data
> distributions but also the granularity expected by the user. And if the
> data is reasonably dynamic you're not going to want to suffer the update
> time entailed in recomputing the bin columns (even if this is done
> automatically through triggers). But are these approaches "good enough"
in
> the real world? Are there examples of real-world applications where
these
> approaches do not work? What other approaches are used for large,
> reasonably dynamic databases? Are any of these approaches offered by
> commercially available RDBMS products?
>
>
>
Received on Sun Jul 22 2001 - 01:27:31 CEST

This message: [ Message body ]
Next message: Paul Keister: "Re: Uniqueness, NULLs and design options"
Previous message: Steve Long: "Re: db design for measurement data?"
In reply to Ian Barrodale: "Multidimensional query involving ranges"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message