Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Logical and physical structure of time series (in ORACLE)
Peter Wieland wrote:
>
> I try to find a good solution for a logical and physical database
> design for the following problem in ORACLE:
>
> I get every minute a huge amount of pairs like (s;v) into my database,
> with
>
> s: Signal-ID (integer)
> v: value for this signal (float)
>
> that means a time series arrives at my database:
>
> t1: (s1;v1) (s2;v2) (s3;v3) ... (s4000;v4000)
> t2: (s1;v1) (s2;v2) (s3;v3) ... (s4000;v4000)
> etc..
>
Peter, I have exactly the same problem at the moment.
The only difference may be in the pattern of the data values. In my case, most of the data is 0 with occasional periods of non-zero values. The solution which we're currently using, which I don't particularly like is similar to yours - we use an algorithm called 'run length encoding' which compresses many rows of identical data into one row. Our table looks like this: Time signal id value rows_encoded. The rows_encoded field says how many rows with the same value have occurred next to each other. We get high compression with this. I've been speculating on better solutions. One possibility is to use an event model. In my case, 95% or more of the data is an uninteresting value, so why store it. Instead we could store only those values which are significant. That would drop our data requirements immensely. All we need then is for report-generation programs/graphing programs to be smart enough to realise that "the absence of data implies normal data". One thing we have done is to use some OLAP-style tables. It turns out that after a few months, the need for the individual point measurements reduces, so we keep summary tables for daily and monthly totals; we just have a batch job updating every night. A third possibility which I haven't investigated in detail relies on the assumption that there is no interactive SQL access to the tables. In this case we have further options: 1) As above, if I use an event model, I can put some logic into all programs which read it out, so that the original data is reconstructed 'on the fly'. 2) I can keep an empty temporary table and if I want to examine a timespan in detail, I can expand the original stored data for that period into rows in the temporary table and work on that in a more convenient format. A fourth possibility is that you are an engineer/comms programmer and aren't aware of all the performance tricks available when setting up a database. 
You won't be able to reduce the disk space requirements much, but you should be able to get quite good performance, provided you stick to the kind of SELECT statements you gave examples of. If you hadn't thought of this, I advise you to look at all the tricks in the Oracle performance and tuning guide, and especially at CREATE TABLE.

Finally, you could use some maths. If you have a table like this:

    day | signal | t0 | t1 | t2 | t3 | t4 | ...

you could collect a day's (or an hour's) data in your original table and then find a polynomial expression matching the point values:

    a + b*t + c*t^2 + d*t^3 + e*t^4 + ...

where a, b, c, d, e are numbers and t is time. Then you store the polynomial's coefficients in the table above. This reduces your data requirements a lot AND, provided you have enough coefficients, allows you to reconstruct your data EXACTLY without loss. It's also useful if your main interest is in graphing the data.

Anyway, I hope you have further thoughts on the matter.

- Nigel.
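P.S. The polynomial idea can be sketched like this (pure Python, illustrative only; with n points an exact fit needs a degree n-1 polynomial, i.e. n coefficients, and for large n the system becomes numerically ill-conditioned, so in practice you'd fit short windows):

```python
def fit_poly(points):
    """Solve the Vandermonde system for coefficients [a, b, c, ...] such
    that a + b*t + c*t^2 + ... passes exactly through every (t, v) pair."""
    n = len(points)
    # Augmented matrix: rows of [t^0, t^1, ..., t^(n-1) | v]
    m = [[t ** k for k in range(n)] + [v] for t, v in points]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    # Back-substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = s / m[r][r]
    return coeffs

def eval_poly(coeffs, t):
    """Horner evaluation of a + b*t + c*t^2 + ..."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result
```

You would store the returned coefficients in the t0, t1, t2, ... columns and call eval_poly to regenerate any point (or to graph the curve at whatever resolution you like).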
--
------------------------------------------------------------------------------
Nigel McFarlane.                        TUSC Computer Systems
nrm_at_tusc.com.au                      666 Doncaster Rd, Doncaster 3108
BH: +61.3.9840.4476                     Melbourne, Victoria, Australia.
GMT +10 or +11 hours
When was the last time you were that frustrated _away_ from a computer?

Received on Tue Feb 11 1997 - 00:00:00 CST