Re: Design question
Date: Mon, 16 Feb 2009 19:56:37 +0000 (UTC)
On Mon, 16 Feb 2009 20:34:59 +0100, Matthias Hoys wrote:
> Hello group,
> Oracle version is 10g. 11g might be possible too. I'm currently involved
> with the pre-analysis of a new project I'll be working on.
> We will get historical environmental data (temperature, rainfall, ...)
> for about 70.000 different locations, and this for a period of 30 years,
> with 1 measurement for each day of the year. So the total amount of
> records will be around 750 million (70.000 x 365 x 30). This sounds
> pretty much like a datawarehouse, no ? ;)
> The data model is very simple: 1 table for the locations, 1 table for
> the measurements.
> Now the difficulty is that the application should be able to query the
> data in a very dynamic way. For example: "give me all the locations with
> an average temperature of 25 degrees for the month of July during the
> last 10 years". Or: "give me all the locations where the minimum
> temperature didn't drop below 15 degrees during the summer months of the
> last 5 years". The data is very static and doesn't change, only once a
> year new data will be added.
> I already tried some queries on about 120 million records in a
> non-partitioned heap table, but performance wasn't that great ... One
> query using the AVG function ran about 15 minutes before I terminated
> it. Parallel query didn't help much. The server is rather small-ish:
> RHEL with 2 Intel dual-core processors, 8 GB of RAM and the Oracle
> datafiles on 2 scsi disks in RAID-1.
> I have a bit experience with partitioning and pre-aggregation of data
> through materialized views, but I find it hard to find a good design. I
> could range partition on the measurement date, but queries for multiple
> years will hit several (if not all) partitions. Materialized views would
> be possible but it would be impossible to cover all the different query
> combinations and functions.
> Any ideas ? OLAP cubes ? (but: expensive licenses ... ) Put everything
> in a index-organized table ?
Matthias, this looks pretty much like a classic star schema to me. Locations will, of course, be the fact table and measurements will be dimensions. You will need one table for each type of measure.Received on Mon Feb 16 2009 - 13:56:37 CST