Re: Polygon data modelling question

From: Paul G. Brown <paul_geoffrey_brown_at_yahoo.com>
Date: 17 Jan 2002 22:15:21 -0800
Message-ID: <57da7b56.0201172215.32623052_at_posting.google.com>


zhouqingqing_at_excite.com (Qingqing) wrote in message news:<cee59be7.0201171934.4ff22767_at_posting.google.com>...
> paul_geoffrey_brown_at_yahoo.com (Paul G. Brown) wrote in message news:<57da7b56.0201170857.4ed017db_at_posting.google.com>...
> > "John Palmer" <jopalmer_at_mail.vt.edu> wrote in message news:<a24amu$qd$1_at_solaris.cc.vt.edu>...

> If you want to know more about this problem, just use google search
> "Roger Weber" then go to his website and find out a lot of stuffs on
> this problem. However, the practical suggestion is: don't sink into
> this problem if you just doing programming, coz it is really hard. If
> you want to implement an indexing method for your application, try
> VA-file method.

  I've read the VA file papers. Two observations.

  First, there is a significant disjunction between the kinds of workload   queries that the VA-File (as well as numerous other proposed indexing   methods) is designed for and D>>2 space. Which is fine, but much spatial   data -- including the polygon motivating the original example -- have extent   (or size or volume or area). R-Trees simply don't work at D~>6 for anything,   but whenever you have an object with mathematical extent (thing also temporal   periods with begin/end, or a mean and variance of some variant measurement)   at lower dimensions, then an R-Tree is about the only game in town.

  Its a bit like the relationship between hashing and a B-Tree. For queries   with 'equal' as the only predicate, Hashing (at O(1)) outperforms B-Trees   (O(log(n)). But most workloads mix in range queries (>, <, BETWEEN) with   equals. Consequently, as useful as Hashing is, it just hasn't been widely   adopted. (Ingres had it, and look where that got them.)

  Second, if you read the paper carefully you will see that what the authors   have done is to come up with an innovative way of handling approximations   in highly-dimensional feature spaces with continuous axis. But their   strategy for query evaluation is to check each and every approximation.   Consequently, their idea is really more about representation of highly   dimensional data, rather than an access method like an R-Tree (or a KDB-Tree,   or a Grid File).

   Hope this adds some fuel to the fire.

        KR

               Pb Received on Fri Jan 18 2002 - 07:15:21 CET

Original text of this message