Re: Design question

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Mon, 16 Feb 2009 19:56:37 +0000 (UTC)
Message-ID: <gncgdk$g7o$1_at_solani.org>



On Mon, 16 Feb 2009 20:34:59 +0100, Matthias Hoys wrote:

> Hello group,
>
> Oracle version is 10g. 11g might be possible too. I'm currently involved
> with the pre-analysis of a new project I'll be working on.
> We will get historical environmental data (temperature, rainfall, ...)
> for about 70.000 different locations, and this for a period of 30 years,
> with 1 measurement for each day of the year. So the total amount of
> records will be around 750 million (70.000 x 365 x 30). This sounds
> pretty much like a datawarehouse, no ? ;)
> The data model is very simple: 1 table for the locations, 1 table for
> the measurements.
> Now the difficulty is that the application should be able to query the
> data in a very dynamic way. For example: "give me all the locations with
> an average temperature of 25 degrees for the month of July during the
> last 10 years". Or: "give me all the locations where the minimum
> temperature didn't drop below 15 degrees during the summer months of the
> last 5 years". The data is very static and doesn't change, only once a
> year new data will be added.
> I already tried some queries on about 120 million records in a
> non-partitioned heap table, but performance wasn't that great ... One
> query using the AVG function ran about 15 minutes before I terminated
> it. Parallel query didn't help much. The server is rather small-ish:
> RHEL with 2 Intel dual-core processors, 8 GB of RAM and the Oracle
> datafiles on 2 scsi disks in RAID-1.
> I have a bit experience with partitioning and pre-aggregation of data
> through materialized views, but I find it hard to find a good design. I
> could range partition on the measurement date, but queries for multiple
> years will hit several (if not all) partitions. Materialized views would
> be possible but it would be impossible to cover all the different query
> combinations and functions.
> Any ideas ? OLAP cubes ? (but: expensive licenses ... ) Put everything
> in a index-organized table ?
>
> Matthias

Matthias, this looks pretty much like a classic star schema to me. Locations will, of course, be the fact table and measurements will be dimensions. You will need one table for each type of measure.

-- 
http://mgogala.freehostia.com
Received on Mon Feb 16 2009 - 13:56:37 CST

Original text of this message