Re: Theory of Timeseries extensions to SQL and database

From: David <david.wynter_at_btclick.com>
Date: Sat, 26 Oct 2002 10:27:24 +0000 (UTC)
Message-ID: <apdqmc$j22$1_at_knossos.btinternet.com>


Hi,

What I am attempting to do is understand how databases such as kdb, FAME and IBM's IDS with the Timeseries DataBlade work. These all achieve performance at least an order of magnitude better than traditional relational databases for specific analytical queries on time series, such as market ticks or historical end-of-day prices for financial instruments.

Most of the research I have come across deals with the theory of the query language needed to handle intervals or points in time. I am curious whether any research has been done on the physical implementation needed to handle this volume of time series data. I know that kdb uses the basic concept of storing columns on disk rather than relations (rows), and moves this column data into memory to operate on it.
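
A toy illustration of the column idea, in Python (used here only for illustration; none of these names come from kdb or any other product): each attribute lives in its own typed, contiguous array, so an aggregate over one attribute reads only that array, while rebuilding a whole record needs one access per column. The records are made-up sample values.

```python
from array import array

# Made-up sample records: (instrument_id, day, close).
rows = [
    (1, 0, 10.0),
    (1, 1, 10.5),
    (2, 0, 20.0),
    (2, 1, 19.5),
]


class ColumnStore:
    """Toy column-oriented table: one typed, contiguous array per attribute."""

    def __init__(self, rows):
        self.instrument = array("q", (r[0] for r in rows))  # 64-bit signed ints
        self.day = array("q", (r[1] for r in rows))
        self.close = array("d", (r[2] for r in rows))       # C doubles

    def sum_close(self):
        # An aggregate over one attribute scans only that attribute's array;
        # the instrument and day columns are never read.
        return sum(self.close)

    def row(self, i):
        # Rebuilding a full record costs one access per column, which is why
        # column storage can hurt record-at-a-time workloads.
        return (self.instrument[i], self.day[i], self.close[i])


store = ColumnStore(rows)
print(store.sum_close())  # 60.0
print(store.row(2))       # (2, 0, 20.0)
```

The trade-off is the one hinted at in the quoted reply below: scans of a few attributes over many rows get cheaper, while fetching whole records gets more expensive.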

Once I have researched the techniques that people have explored, I might have a go at extending one of the open source databases (such as Hypersonic) to add the functions used for the analysis (moving averages, autocorrelation, double exponential smoothing, autoregressive equations, etc.).
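
For reference, minimal Python sketches of three of the functions listed above, assuming a plain in-memory list of closing prices. The function names are hypothetical, double exponential smoothing is taken to mean Holt's linear method, and autoregressive fitting is left out.

```python
from typing import List, Sequence


def moving_average(xs: Sequence[float], window: int) -> List[float]:
    """Simple moving average; returns len(xs) - window + 1 values."""
    if not 1 <= window <= len(xs):
        raise ValueError("window must be between 1 and len(xs)")
    total = sum(xs[:window])
    out = [total / window]
    for i in range(window, len(xs)):
        # Slide the window in O(1): add the newest point, drop the oldest.
        total += xs[i] - xs[i - window]
        out.append(total / window)
    return out


def autocorrelation(xs: Sequence[float], lag: int) -> float:
    """Sample autocorrelation at the given lag (standard biased estimator)."""
    n = len(xs)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(xs) - 1")
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return num / denom


def double_exponential_smoothing(xs: Sequence[float], alpha: float,
                                 beta: float) -> List[float]:
    """Holt's linear method; returns the smoothed level at each point."""
    if len(xs) < 2:
        raise ValueError("need at least two points")
    level, trend = xs[0], xs[1] - xs[0]
    out = [level]
    for x in xs[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        out.append(level)
    return out


print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 3))  # [2.0, 3.0, 4.0]
print(autocorrelation([1.0, -1.0, 1.0, -1.0], 1))    # -0.75
```

Each function is a single pass (or two) over one series, which is exactly the access pattern a column layout favours.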

I designed and built a relational database from first principles 10 years ago (for the PenPoint operating system). Just getting curious again ;)

I could potentially get access to close to 20 years of closing price history for about 500,000 instruments, hence the 500,000 x 5,000 matrix I described (20 years is roughly 5,000 trading days).
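
The arithmetic behind that matrix size, assuming roughly 250 trading days per year and a dense layout of one 8-byte identifier plus one 8-byte double per cell (both figures are assumptions; indexes, compression and sparsity are ignored):

```python
YEARS = 20
TRADING_DAYS_PER_YEAR = 250  # assumption: roughly 250 exchange trading days a year
INSTRUMENTS = 500_000
BYTES_PER_CELL = 8 + 8       # assumption: 8-byte identifier + 8-byte double

days = YEARS * TRADING_DAYS_PER_YEAR
cells = INSTRUMENTS * days
raw_bytes = cells * BYTES_PER_CELL

print(days)       # 5000
print(cells)      # 2500000000
print(raw_bytes)  # 40000000000 (about 37 GiB)
```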

David

"Jan Hidders" <hidders_at_REMOVE.THIS.uia.ua.ac.be> wrote in message news:3db991b7$1_at_news.uia.ac.be...
> David wrote:
> >
> >I did a Google search on timeseries and database and found some research
> >called Sorted Relational Query Language, and also SQL-TS. I have a few
> >questions beyond what I found.
> >
> >
> >Does anyone know where there is research on the method of storing columns
> >instead of relations, called inversion I believe?
>
> Inverted files perhaps? That can be a very bad idea depending on what you
> are planning to do with your data.
>
> >What I am after is an understanding of the techniques available to
> >manipulate (i.e. query against) result sets as big as a 500,000*5,000
> >matrix of 2-element arrays, each containing a unique identifier and a
> >numeric value, with very fast query performance.
> >
> >Are there any implementations of SRQL or SQL-TS?
>
> For starters:
> - How sparse is your matrix?
> - What are the typical queries that you want to ask?
> - What happened when you tried to do it in a conventional RDBMS (with a 4
> column relation)? Did you have the proper indexes?
>
> -- Jan Hidders
>
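
As a baseline for the conventional approach Jan suggests, a sketch of a 4-column relation with a composite key, using Python's bundled sqlite3 module. The column names (instrument_id, day, item_id, value) are hypothetical, reading the four columns as the two matrix coordinates plus the two array elements. Only populated cells are stored, so a sparse matrix costs only its non-empty entries. The sample data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID stores rows in primary-key order, so one instrument's
# history sits contiguously in the table's B-tree.
conn.execute("""
    CREATE TABLE price (
        instrument_id INTEGER NOT NULL,
        day           INTEGER NOT NULL,
        item_id       INTEGER NOT NULL,
        value         REAL    NOT NULL,
        PRIMARY KEY (instrument_id, day)
    ) WITHOUT ROWID
""")

# Made-up sample: instrument 1 is dense, instrument 2 only has every other day.
rows = [(1, d, 1000 + d, 10.0 + d) for d in range(5)]
rows += [(2, d, 2000 + d, 20.0) for d in range(0, 5, 2)]
conn.executemany("INSERT INTO price VALUES (?, ?, ?, ?)", rows)


def history(conn, instrument_id, first_day, last_day):
    """Range scan over one instrument, served by the primary-key index."""
    return conn.execute(
        "SELECT day, value FROM price "
        "WHERE instrument_id = ? AND day BETWEEN ? AND ? ORDER BY day",
        (instrument_id, first_day, last_day),
    ).fetchall()


print(history(conn, 1, 1, 3))  # [(1, 11.0), (2, 12.0), (3, 13.0)]
```

With the key ordered (instrument_id, day), "one instrument over a date range" is a single index range scan; a query across all instruments on one day would want a second index on (day, instrument_id).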
Received on Sat Oct 26 2002 - 12:27:24 CEST

Original text of this message