Design question
Date: Mon, 16 Feb 2009 20:34:59 +0100
Message-ID: <4999bf6f$0$2847$ba620e4c_at_news.skynet.be>
Hello group,
The Oracle version is 10g; 11g might be possible too.
I'm currently doing the pre-analysis for a new project I'll be working on.
We will get historical environmental data (temperature, rainfall, ...) for
about 70,000 different locations, covering a period of 30 years, with one
measurement for each day of the year. So the total number of records will be
roughly 766 million (70,000 x 365 x 30). This sounds pretty much like a
data warehouse, no ? ;)
The data model is very simple: 1 table for the locations, 1 table for the
measurements.
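To make that concrete, the schema I have in mind looks roughly like this
(table and column names are purely illustrative, nothing is final yet):

create table locations (
  location_id  number         primary key,
  name         varchar2(100),
  latitude     number,
  longitude    number
);

create table measurements (
  location_id       number not null references locations,
  measurement_date  date   not null,
  temperature       number,
  rainfall          number,
  constraint measurements_pk primary key (location_id, measurement_date)
);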
Now the difficulty is that the application should be able to query the data
in a very dynamic way. For example: "give me all the locations with an
average temperature of 25 degrees for the month of July during the last 10
years". Or: "give me all the locations where the minimum temperature didn't
drop below 15 degrees during the summer months of the last 5 years".
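To give an idea, the first example would translate into something like this
(only a sketch against the illustrative column names above):

select m.location_id
from   measurements m
where  extract(month from m.measurement_date) = 7
and    m.measurement_date >= add_months(trunc(sysdate, 'YYYY'), -120)  -- last 10 years
group by m.location_id
having avg(m.temperature) >= 25;  -- or a band around 25, depending on how exact it has to be

The date range, the month(s) and the aggregate function can all vary from
query to query.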
The data is very static and doesn't change; new data will only be added once
a year.
I have already tried some queries on about 120 million records in a
non-partitioned heap table, but performance wasn't that great ... One query
using the AVG function ran for about 15 minutes before I terminated it.
Parallel query didn't help much. The server is rather small: RHEL with two
Intel dual-core processors, 8 GB of RAM and the Oracle datafiles on two SCSI
disks in RAID-1.
I have a bit of experience with partitioning and pre-aggregation of data
through materialized views, but I find it hard to come up with a good design.
I could range-partition on the measurement date, but queries spanning multiple
years will hit several (if not all) partitions. Materialized views would be
possible, but it would be impossible to cover all the different query
combinations and functions with them.
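To illustrate the kind of pre-aggregation I mean, a monthly summary could look
like this (again just a sketch, same illustrative columns as above):

create materialized view measurements_monthly
  build immediate
  refresh complete on demand
  enable query rewrite
as
select location_id,
       trunc(measurement_date, 'MM') as month_start,
       avg(temperature)              as avg_temperature,
       min(temperature)              as min_temperature,
       max(temperature)              as max_temperature,
       sum(rainfall)                 as total_rainfall,
       count(*)                      as nr_of_measurements
from   measurements
group by location_id, trunc(measurement_date, 'MM');

Something like that helps for month-level questions, but as soon as somebody
asks for a different period, granularity or function it no longer covers the
query.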
Any ideas ? OLAP cubes ? (but: expensive licenses ...) Put everything in an
index-organized table ?
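The index-organized variant I'm thinking of would be something like this
(illustrative again, range-partitioned per year):

create table measurements_iot (
  location_id       number not null,
  measurement_date  date   not null,
  temperature       number,
  rainfall          number,
  constraint measurements_iot_pk primary key (location_id, measurement_date)
)
organization index
partition by range (measurement_date) (
  partition p1980 values less than (to_date('01-01-1981', 'DD-MM-YYYY')),
  partition p1981 values less than (to_date('01-01-1982', 'DD-MM-YYYY')),
  -- ... one partition per year ...
  partition pmax  values less than (maxvalue)
);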
Matthias