Re: SQL, Space, and Time

From: <brown_p_g_at_my-deja.com>
Date: Fri, 26 Jan 2001 19:48:00 GMT
Message-ID: <94sk9c$9mq$1_at_nnrp1.deja.com>


In article <9476uh015vu_at_news2.newsguy.com>,   bgsmith01_at_fs.fed.us wrote:
> We in the natural resource management community are ever more
 frequently
> building data systems that need to answer questions that explicitly
> incorporate time and/or space. Questions such as "how has this place
> changed over the last XX years", or "where on the landscape do we find
> these combination of things".

  While this was true at the time when all vendor products adopted SQL-92 as their standard, it's not true today. What the SQL-3 standard defines is less a complete language (with a type system), and more of a framework within which it is up to application developers to embed their class definitions/components/data type implementations. Several DBMS systems research projects have explored efficient techniques for building systems to support such a model.[1][2]

  Two parallel developments are also taking place: standards are emerging that define how to embed new types/classes (SQL-J parts 1 & 2[3], OLE-DB) and DBMS vendors are partnering up with domain specialists (like the spatial vendors) to get the more sophisticated spatial algorithms--among others, like temporal, linear regression analysis, statistical analysis, etc -- embedded into their SQL.[4][5]    In specific terms; I'll use your second question ("where on the landscape do we find these combination of things") as an example. What I'll do is to show you both the SQL and dive in to a little detail on the underlying query processing technology.

   Suppose the "things" of interest to the application are a mixture of fairly fixed geo features (rivers, roads) and more dynamic information (trucks and customers, say) and that the application is all about "how to get parcel from 'A' to 'B'" where the parcel might be something that can't cross a river (like a Vampire in a coffin). Because the attitude pursued by SQL-3 is to employ a relational model, facts about "things" are stored in tables. Thus:

  CREATE TABLE Rivers (

           Seg_Id    SYSTEM_KEY    PRIMARY KEY,
           Label     String        NOT NULL,
           Path      ST_PATH       NOT NULL
  );

 The ST_PATH is a SQL-3 type that looks like the following in string form. Internally, the DBMS formats the individual path data objects in exactly the same way that a GIS server would (as a 'C' structure with a point count and an array of double precision numbers, for example).

  "(-123.43,32,-123.421,43.319,.........-123.44,43.351)"

   Accommodating extremely large spatial objects -- like all the fiddly bits on the coastline of Norway -- requires some complex engineering, but it can be done quite efficiently.

  Other spatial object classes would include ST_POINT, ST_POLYGON, ST_MULTI_POLYGON, and so on. These types can be co-mingled with any of the SQL-92 types with which RDBMS programmers are familiar.

  Being a database, the interface is a little more low-level than a today's GIS systems. Spatial data in a SQL-3 DBMS is retrieved and manipulated using declarative SQL expressions: SELECT, INSERT, UPDATE and DELETE through an interface like JDBC, ODBC, or ESQL/C. Graphical manipulation of the data is, of course, still possible, but in such systems the GUI events are translated into SQL, and submitted to the back end server.

   For example, suppose the position of a truck was updated automatically from a transponder. Such an update would look like this:

   UPDATE Trucks

      SET Where = :New_Location
    WHERE Id = :Truck_Identifier;

  All of the transactional guarantees that RDBMS products make over SQL-92 data apply to the geographic/temporal data too.

   Now, suppose the query is "show me trucks within 30 miles of 'X' where there is a road from the truck to 'X' that does not cross a river?"

   SELECT T.Id, T.Where, D.RoadWay,

          Distance ( T.Where, Location_X_Y_of_Customer )
     FROM Trucks T, Roads D
    WHERE Contains ( ST_Circle ( T.Where, "30 MILES" ),
                     :Location_X_Y_of_Customer )
      AND Overlaps ( D.RoadWay, ST_Circle ( T.Where, "0.1 MILE" ) )
      AND Overlaps ( D.RoadWay,,
                     ST_Circle ( :Location_X_Y_of_Customer, "0.1 MILE")
                   )
      AND NOT EXISTS ( SELECT 1
                         FROM Rivers R
                        WHERE Overlap ( D.RoadWay, R.RiverBed ) )
    ORDER BY 4 ASC;   This example is totally from memory of the standard, and the expression names might be a bit different. Also, the SQL-3 standard advocates a syntax that is more heavily influenced by OO languages than functional programming. So it's

  D.RoadWay.Overlaps(

          ST_Point(:Location_X_Y_of_Customer).ST_Circle("1 MILE")   )

  Implementations can cater to questions like projections and ellipsoid specifications.

  Making this kind of thing efficient requires spatial indexing, and that turns out to be tricky. Over the last 10 years, a number of spatial indexing (and other spatial data management techniques) have been proposed: R-Trees, KD-B Trees, being among the most popular[6]. These techniques are preferred by DBMS vendors to traditional Quad-Tree and Space-Filling Curve techniques for a number of technical reasons relating to transaction support, and query processing (R-Trees and KD-B trees are height balanced, and have reasonably obvious recovery algorithms, as well as good worst-case performance).

  To be at all practical, the query above would need to make extensive use of spatial indexing techniques.

  Managing time and geography in single systems is a subject of considerable research interest[7]. In a nutshell, spatial and temporal properties of real-world phenomenon can be easily accommodated within a modern DBMS in one of two ways. Either by introducing a set of types that grok spatio-temporal locations as encapsulated objects (where and when), or by combining a set of spatial extensions with a set of temporal extensions using the query language (much as numbers and letters are combined in RDBMS employee records). Both approaches have strengths and weaknesses, but that is the subject of a much longer post.

  In the case of some vendors (and I won't name names: please do your own research) the code that computes things like Overlap() and so on are, in fact, exactly the same code that is used in a mainstream GIS vendor's product. By making their products "extensible", the DBMS vendors can introduce this third party code into their own run-times in much the same way that a device manufacturer can introduce device drivers into an operating system kernel.

  Anyway, the point is that there is considerable interest in these applications on the part of mainstream DBMS vendors. I'd suggest that you do a little research: Google knows everything.

  • REFS ====
[1] Stonebraker, M., Brown, P. and Moore, D. "Object-Relational DBMS: Tracking the Next Great Wave: Second Edition". Morgan Kaufman. CA 1998. ISBN: 1-55860-452-9. http://www1.fatbrain.com/asp/bookinfo/bookinfo.asp?theisbn=1558604529&vm =

[2] Schwarz, Peter et al. "Extensibility in the Starburst Database System". 1986 International Workshop on Object-Oriented Database Systems. IEEE.

http://www.informatik.uni-trier.de/~ley/db/conf/oodbs/SchwarzCFLMMP86.ht ml

 Also, for more information on Starburst:

 http://www.almaden.ibm.com/cs/starwinds/sbbibl.html

[3] http://www.sqlj.org/

[4] http://spatial.informix.com/

[5] http://www.iiug.org/software/index_ORDBMS.html

  Specifically, have a look at the Period README.

[6] Gaede, Volker, and Gunther, Oliver. "Multidimensional Access Methods" ACM Computing Surveys, 1997.

  http://citeseer.nj.nec.com/56577.html

[7] Guting et al. "A Foundation for Representing and Quering Moving Objects". ACM Transactions on Database Systems. Vol 24, #1.

http://www.acm.org/pubs/contents/journals/tods/2000-25/#1

Sent via Deja.com
http://www.deja.com/ Received on Fri Jan 26 2001 - 20:48:00 CET

Original text of this message