Re: Improving performance of geographic queries

From: Gary Cramblitt <NOSPAM_garyc_at_clark.net>
Date: Sun, 11 Jul 1999 23:16:51 GMT
Message-ID: <3788f2d7.158666405@news.clark.net>

On Sat, 10 Jul 1999 21:15:17 +0100, "Jonathan Lewis" <jonathan_at_jlcomp.demon.co.uk> wrote:

>
>You could either buy in Oracle's Spatial Data Cartridge
>which exists to cope with exactly this type of query,
>or you could simply clone the technology.
>
>Your bounding geo-rectangle is a good starting point,
>but the main trick is to cover the entire space with
>uniform rectangles, then create an 'inverse index'
>table which associates covering rectangles with
>your geo-rectangles (if you want to be really clever
>you could associate the covering rectangles with
>the actual circle for the radio).
>
>You get a two column table:
> radio-id, covering rectanlge_id
> where each radio-id may need a number of covering rectangles.
>
>Index this on covering rectangle_id
>Write a trigger on insert into the radio table to
>generate rows for this table.
>
>Your query is then:
>
> select
> radio detail
> from radio table
> where radio_id in
> (
> select
> distinct radio_id
> from
> two_column_table
> where
> covering_rectangle_id in
> (select (generate(rectangle_ids that cover the radius and
>center of interest))
> )
> and (something to check distances for radios)
>
>
>--
>
>Jonathan Lewis
>Yet another Oracle-related web site: www.jlcomp.demon.co.uk

Thanks for the reply Jonathan. Something like this had occured to me. I wonder what kind of performance improvement this would offer? I'm especially worried about that innermost subselect.

Have you ever done this sort of thing yourself?. I'm wondering what resolution the rectangles should be. I assume that would be a function of the average radio's circle of mobility and the average query. The smaller the rectangles, the better the filter, but the more work Oracle has to do and the more storage space is required. Have you done any or seen any studies? I found a paper at the National Center for Geographic Information and Analysis -- a paper by Yannis, Papadias, Stefanakis, and Sellis titled "Direction Relations and Two-Dimensional Range Queries: Optimization Techniques" (http://www.ncgia.ucsb.edu/pubs/pubslist.html). The paper doesn't exactly address what I'm doing, but it does mention some performance figures for different "minimum bounding rectangle (MBR)" sizes. I'm hoping someone can point me to a more relevant reference, especially in relation to Oracle databases. (They were studying KDB and R trees.)

Another interesting paper divided the Earth into almost uniform triangles. Each triangle was subdivided into four triangles. This was repeated to N levels. The triangles had the interesting property that 1) all the N-level triangles were almost uniform in area, 2) knowing the Id of one triangle, one can easily determine the Ids of neighboring triangles, and 3) the "parent" or containing triangle was simply the Id of a tringle with the last digit dropped. Another useful property of these Ids is that they index simultaneously in two dimensions, north-south as well as east-west.

I'm thinking I could do something similar. I'm thinking that I don't need to use triangles -- they would require too much math, making the trigger too slow.

Say we start by dividing the Earth into 9 rectangles. Number each rectangle from 1 to 9. Divide each rectangle into 9 more rectangles, again numbering the subrectangles from 1 to 9. The Id of each subrectangle is the number of the parent rectangle concatenated with the number of the subrectangle. Keep dividing in this manner for N levels. The rectangles at level N then have an Id consisting of N digits. To get the Id of the X-1th level, drop the Xth digit. For example rectangle 345 is contained in rectangles 34 and 3. Also, given the Id of a rectangle at any level X, we already know the Ids of all the subrectangles at all levels X+1 through N. For example, rectangle 345 contains all the rectangles between 3451 and 3460.

I'm thinking I can somehow make use of these properties to reduce the number of Ids to test against.

Suppose when we subdivide a rectangle we number the subrectangles like this

4 7 9
2 5 8
1 3 6

Consider a two-level numbered area

24 27 29 54 57 59
22 25 28 52 55 58
21 23 26 51 53 56
14 17 19 34 37 39
12 15 18 32 35 38
11 13 16 31 33 36

Suppose there is a record whose rectangular area goes from 14 in the southwest corner to 54 in the northeast corner. Instead of generating a row for every cell the record covers (in this case 16 rows), suppose we took advantage of the fact that the record completely contains rectangle 2. The two_column_table would then contain the following entries

2, 14, 17, 19, 34, 57, 51, 52, 54

Suppose the user's query is a rectangular area that goes from 16 in the southwest to 56 in the northeast. Again, since this area completely contains rectangle 3, we can represent this area as

3, 16, 18, 19, 26, 51, 53, 56

We are looking for any overlap of the query's rectangle with the record's rectangle. They overlap in several ways

The query's rectangle 3 contains the record's rectangle 34. Both the query and the record have rectangles 19 and 51. The query's rectangle 26 is contained in the record's rectangle 2.

These three matches would occur with the following SQL

where Id between 31 and 40

or        Id in [16,18,19,26,51,54,56]
or        Id in [1,2,3,5]

The first part of the query tests for records completely contained by the query's rectangles that are not 2nd-level Ids, namely Id 3. The second part tests for records having matching 2nd-level Ids. The third part tests for records completely containing each of the query's rectangles.

The third part must be repeated all the way up to level 1 for each Id in the query area. For example, if N were 5, an Id of 34587 would generate

or Id in [3458, 345, 34, 3]

Actually, the 2nd and 3rd parts of the query can be combined into a single "in" clause.

We can generalize any query as follows:

Generate a list of all the rectangle Ids for the query, taking advantage of higher-order rectangles completely contained in the query area.
For each rectangle in the query area not level N, generate a "between" clause.
For each rectangle in the query area at level X, generate an "in" list for the X to 1 levels. Eliminate duplicates. Put the resulting list in an "in" clause.

Now I'm wondering whether this extra complexity would be worth the trouble. Is it better to take the more straight forward approach and give Oracle a single constraint with a long (potentially very long) list of N-level Ids, and storing a lot of rows in the two_column_table, or is it better to generate multiple constraints with mixed levels as I have done above, taking advantage of the "contains" and "contained in" properties and storing fewer rows in the two_column_table?

Maybe it would be better to fork over the money to Oracle for the Spatial Data Cartridge you mentioned??

What do you think?

Thanks again for your help. Received on Sun Jul 11 1999 - 18:16:51 CDT