Multi-Dimensional Rasters - Large Datasets

From: Seth Northrop <seth_at_northrops.com>
Date: 20 May 2002 12:31:23 -0700
Message-ID: <541e251e.0205201131.4c23c8d9_at_posting.google.com>



We have various testing and characterization applications which ultimately produce what amount to coordinate-based raster datasets.

Each test contains multiple iterations (~1300) of data collection on some 700k data points, which are mapped to a coordinate system (i.e., the coordinate 500x500 has a float value for each iteration of the test). The result is a fairly large dataset for each test.
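To put rough numbers on "fairly large", here is a back-of-the-envelope sketch. It assumes 4-byte (single-precision) floats, which is only a guess since the storage width is not stated above; the iteration and point counts are the approximate figures just given.

```python
# Rough raw size of one test. BYTES_PER_VALUE is an assumption.
ITERATIONS = 1300        # iterations per test (approximate)
POINTS = 700_000         # data points per iteration (approximate)
BYTES_PER_VALUE = 4      # assumed single-precision float


def raw_test_size_bytes(iterations=ITERATIONS, points=POINTS,
                        bytes_per_value=BYTES_PER_VALUE):
    """Uncompressed size of one test: one value per point per iteration."""
    return iterations * points * bytes_per_value


values = ITERATIONS * POINTS
size_gb = raw_test_size_bytes() / 1e9
print(f"{values:,} values per test, about {size_gb:.2f} GB raw")
```

With 8-byte doubles the figure would double, and it multiplies again by the number of tests per object and the number of objects.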

Multiple tests can then be applied to the same object over time, or to other objects in a similar fashion (i.e., we have several objects, each of which we might test 10+ times).

We have basically five views of this data we are interested in:

  1. We are interested in a holistic view of each test iteration (i.e., what happened throughout the object during a specific test iteration)
  2. We are interested in a holistic view of the entire test's data (i.e., what happened to this object throughout this test; summary statistics)
  3. We are interested in individual coordinates throughout a given test (i.e., we can place the values of the various iterations throughout a test into a curve)
  4. We are interested in holistic data across multiple tests (i.e., how has this object OR this coordinate changed from the first test to the nth test)
  5. We are interested in comparing the performance of multiple objects as indicated by their test data (i.e., compare objects based on test data)

To date, we've stored this data in flat files, with some summary statistics kept in an RDBMS (MySQL). This has worked okay, but as you can imagine, querying this data is a pain and in some scenarios realistically impossible.

We'd like to devise a database-backed strategy for storing this data, either in full or by applying some industry- or scientifically standardized, queryable, lossless compression technique.

This seems to be on par with the problems faced by GIS developers, but I'm having a difficult time finding literature on their data structure and table definitions, or on whether they have been able to map their data effectively into relational database structures (ideally, we'd love to stick with MySQL).
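For concreteness, the most naive relational mapping would be one row per value. A minimal sketch follows; it uses SQLite only because it ships with Python (the target here is MySQL), and the table name, column names, and sample rows are all hypothetical. At the approximate sizes given earlier this layout means on the order of 910 million rows per test, which is a large part of why we are asking how others do it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (object, test, iteration, coordinate).
conn.execute("""
    CREATE TABLE reading (
        object_id INTEGER NOT NULL,
        test_id   INTEGER NOT NULL,
        iteration INTEGER NOT NULL,
        x         INTEGER NOT NULL,
        y         INTEGER NOT NULL,
        value     REAL    NOT NULL,
        PRIMARY KEY (object_id, test_id, iteration, x, y)
    )
""")

# Made-up sample rows: two iterations of one test, two coordinates each.
rows = [
    (1, 1, 0, 500, 500, 1.0),
    (1, 1, 0, 500, 501, 3.0),
    (1, 1, 1, 500, 500, 2.0),
    (1, 1, 1, 500, 501, 4.0),
]
conn.executemany("INSERT INTO reading VALUES (?, ?, ?, ?, ?, ?)", rows)


def coordinate_curve(conn, object_id, test_id, x, y):
    """Values of one coordinate across the iterations of one test."""
    cur = conn.execute(
        "SELECT iteration, value FROM reading "
        "WHERE object_id = ? AND test_id = ? AND x = ? AND y = ? "
        "ORDER BY iteration",
        (object_id, test_id, x, y),
    )
    return cur.fetchall()


def iteration_means(conn, object_id, test_id):
    """Mean value over all coordinates, per iteration of one test."""
    cur = conn.execute(
        "SELECT iteration, AVG(value) FROM reading "
        "WHERE object_id = ? AND test_id = ? "
        "GROUP BY iteration ORDER BY iteration",
        (object_id, test_id),
    )
    return cur.fetchall()


print(coordinate_curve(conn, 1, 1, 500, 500))
print(iteration_means(conn, 1, 1))
```

The queries themselves are trivial in this form; the open question is whether any index or storage layout keeps them usable at full scale.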

If you have any insight into how to handle large scientific datasets within this kind of multidimensional coordinate structure, please advise. I'd love to see how others are approaching this problem.

Thanks in advance!
