Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.misc -> Re: Logical and physical structure of time series (in ORACLE)
Peter Wieland wrote:
>
> I try to find a good solution for a logical and physical database
> design for the following problem in ORACLE:
>
> I get every minute a huge amount of pairs like (s;v) into my database,
> with
>
> s: Signal-ID (integer)
> v: value for this signal (float)
>
> that means a time series arrives at my database:
>
> t1: (s1;v1) (s2;v2) (s3;v3) ... (s4000;v4000)
> t2: (s1;v1) (s2;v2) (s3;v3) ... (s4000;v4000)
> etc..
>
Peter, I have exactly the same problem at the moment.
The only difference may be in the pattern of the data values. In my case, most of the data is 0 with occasional periods of non-zero values. The solution which we're currently using, which I don't particularly like is similar to yours - we use an algorithm called 'run length encoding' which compresses many rows of identical data into one row. Our table looks like this: Time signal id value rows_encoded. The rows_encoded field says how many rows with the same value have occurred next to each other. We get high compression with this. I've been speculating on better solutions. One possibility is to use an event model. In my case, 95% or more of the data is an uninteresting value, so why store it. Instead we could store only those values which are significant. That would drop our data requirements immensely. All we need then is for report-generation programs/graphing programs to be smart enough to realise that "the absence of data implies normal data". One thing we have done is to use some OLAP-style tables. It turns out that after a few months, the need for the individual point measurements reduces, so we keep summary tables for daily and monthly totals; we just have a batch job updating every night. A third possibility which I haven't investigated in detail relies on the assumption that there is no interactive SQL access to the tables. In this case we have further options: 1) As above, if I use an event model, I can put some logic into all programs which read it out, so that the original data is reconstructed 'on the fly'. 2) I can keep an empty temporary table and if I want to examine a timespan in detail, I can expand the original stored data for that period into rows in the temporary table and work on that in a more convenient format. A fourth possibility is that you are an engineer/comms programmer and aren't aware of all the performance tricks available when setting up a database. 
You won't be able to reduce the disk space requirements much, but you should be able to get quite good performance, provided you stick to the kind of SELECT statements you gave examples of. If you hadn't thought of this, I advise you to look at all the tricks in the Oracle performance and tuning guide, and especially at CREATE TABLE.

Finally, you could use some maths. If you have a table like this:

    day | signal | t0 | t1 | t2 | t3 | t4 | ...

you could collect a day's (or an hour's) data in your original table and then find a polynomial expression matching the point values:

    a + b*t + c*t^2 + d*t^3 + e*t^4 + ...

where a, b, c, d, e are numbers and t is time. Then you store the polynomial's coefficients in the table above. This reduces your data requirements a lot AND, provided you have enough coefficients, allows you to reconstruct your data EXACTLY without loss. It's also useful if your main interest is in graphing the data.

Anyway, I hope you have further thoughts on the matter.

- Nigel.
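P.S. The polynomial idea can be sketched like this (pure Python, illustrative only; with n points an exact fit needs a degree n-1 polynomial, i.e. n coefficients, and for large n the system becomes numerically ill-conditioned, so in practice you'd fit short windows):

```python
def fit_poly(points):
    """Solve the Vandermonde system for coefficients [a, b, c, ...] such
    that a + b*t + c*t^2 + ... passes exactly through every (t, v) pair."""
    n = len(points)
    # Augmented matrix: rows of [t^0, t^1, ..., t^(n-1) | v]
    m = [[t ** k for k in range(n)] + [v] for t, v in points]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    # Back-substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = s / m[r][r]
    return coeffs

def eval_poly(coeffs, t):
    """Horner evaluation of a + b*t + c*t^2 + ..."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result
```

You would store the returned coefficients in the t0, t1, t2, ... columns and call eval_poly to regenerate any point (or to graph the curve at whatever resolution you like).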
--
------------------------------------------------------------------------------
Nigel McFarlane.                        TUSC Computer Systems
nrm_at_tusc.com.au                      666 Doncaster Rd, Doncaster 3108
BH: +61.3.9840.4476                     Melbourne, Victoria, Australia.
GMT +10 or +11 hours
When was the last time you were that frustrated _away_ from a computer?

Received on Tue Feb 11 1997 - 00:00:00 CST