Re: filling in missing dates in a time series

From: paul c <toledobythesea_at_oohay.ac>
Date: Wed, 15 Feb 2006 20:17:57 GMT
Message-ID: <V%LIf.13285$sa3.9379_at_pd7tw1no>


Bob Badour wrote:

> paul c wrote:
>
>> carol_marra_at_msn.com wrote:
>>
>>> The bulk of our Oracle database data is time series data, at various
>>> intervals (hourly,
>>> daily, monthly, etc). When a value for a particular site is unavailable
>>> at a given
>>> timestep (for instance, if a sensor is down on March 1, 2005) we store
>>> *nothing*, rather than creating a record with a null value. Also note
>>> that we're not using any time-series management extensions to oracle.
>>> Timestamp on a value is stored as 2 oracle DATE fields,
>>> a start date and an end date, to indicate the entire interval to which
>>> the value applies.
>>>
>>> But, there are some instances where we want to display a complete,
>>> uninterrupted time series.... in other words, display those dates when
>>> there is no value actually stored in the database.
>>>
>>> To date, we have done this by outer joining with a table that holds all
>>> dates for the time step in question (hourly, daily, monthly, etc). It
>>> works, but it's not very elegant or practical.
>>>
>>> Other thoughts on how to achieve this? (Apologies if there are
>>> objections to this being posted in the theories group; it seems a
>>> reasonable place to me.)
>>>
>>> Thanks,
>>> carol
>>>
>>
>>
>> Can't comment on products. If you're talking about the recording of
>> negatives in RM theory, the wiki page at
>> http://en.wikipedia.org/wiki/Relational_algebra has lately undergone
>> some enhancement by several of the RM heavyweights.
>>
>> One paragraph that I found enlightening is:
>>
>> (quote):
>>
>> It is important to realise that Codd's algebra is not in fact complete
>> with respect to first-order logic. Had it been so, certain
>> insurmountable computational difficulties would have arisen for any
>> implementation of it. To overcome these difficulties, he restricted
>> the operands to finite relations only and also proposed restricted
>> support for negation (NOT) and disjunction (OR). Analogous
>> restrictions are found in many other logic-based computer languages.
>> Codd defined the term relational completeness to refer to a language
>> that is complete with respect to first-order predicate calculus apart
>> from the restrictions he proposed. In practice the restrictions have
>> no adverse effect on the applicability of his relational algebra for
>> database purposes.
>>
>> (end of quote).
>>
>> Personally, I'm not sure about "no adverse effect". As I understand
>> it, Codd was talking about recording facts through what are called
>> their extensions, basically the enumerations of sets, possibly very
>> lengthy enumerations as you seem to be referring to. The
>> logicians/mathematicians have a counterpart (I believe) that they call
>> an intension which is a description of a set that is often much more
>> concise in its expression than is an extension. It seems that the
>> general understanding of the place of intensions is in programs as
>> opposed to databases. If I'm right that that's the orthodox way to
>> view things, then the theory would agree with your opinion that the
>> 'all-dates' table is a clumsy solution.
>>
>>
>> Wistfully, I find this a bit unfortunate, but maybe I just don't know
>> enough to see that an RM with intensions is a bigger problem, eg., a
>> 'table' defined via intension. The RM people seem to accept or even
>> popularize the idea of a constraint on extensions (apart from
>> 'integrity' reasons, this can 'limit' tables to some practical size).
>> If extensions can have constraints, why not allow intensions since
>> they are effectively constraints on a domain?
>>
>>
>> Some of the 'OO database' activity seems to have been perhaps
>> indirectly motivated by how to approach negatives with high-level
>> support for defining domains, although there doesn't seem to be much
>> solid theory behind it, just ease-of-programming arguments which are
>> often disputed. The big mistake of their proponents seems to be that
>> in trying this they also discarded the RM!
>>
>>
>> Likely this is an unsatisfactory answer to anybody who wants a more
>> 'elegant' solution right now.
>>
>> pc
> 
> 
> Paul,
> 
> I suspect you misunderstand the restrictions Codd proposed. 

Could well be; that's why I'm here (and there). He didn't write a lot about the restrictions, and much of what I know is hearsay. After many years, I'm still trying to understand a talk he gave about Christie Brinkley's telephone number!

> He did not 
> restrict a dbms from having a virtual relation with every discrete time 
> within some time representation or a virtual relation with all 4 billion 
> or so values representable by a 32-bit integer etc.

I know the quote mentioned infinity, but I try to stay away from the term, preferring words like 'lengthy'. I would like to see a real product that supported such virtual relations as I find definitions easier to understand than procedural code - I certainly wasn't advocating such code, just saying that I think it is current orthodoxy.
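
For illustration only, here is a rough Python sketch of a virtual 'all-dates' relation: the dates are defined by a rule and generated on demand rather than stored, then outer joined with the sparse readings. The names all_days and fill_missing and the sample readings are made up for the example, a plain dictionary stands in for the Oracle table, and only a daily time step is handled.

```python
from datetime import date, timedelta
from typing import Dict, Iterator, Optional, Tuple


def all_days(start: date, end: date) -> Iterator[date]:
    """Virtual 'all-dates' relation: defined by a rule, never stored."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def fill_missing(readings: Dict[date, float], start: date, end: date
                 ) -> Iterator[Tuple[date, Optional[float]]]:
    """Outer join of the virtual relation with the stored (sparse) readings."""
    for day in all_days(start, end):
        # None marks a day for which nothing was stored
        yield day, readings.get(day)


# Hypothetical data: the sensor was down on March 1, 2005, so no row exists.
stored = {date(2005, 2, 28): 4.2, date(2005, 3, 2): 4.5}
series = list(fill_missing(stored, date(2005, 2, 28), date(2005, 3, 2)))
for day, value in series:
    print(day.isoformat(), value)
```

The definition of the date set takes a few lines, where the stored equivalent needs a row per day. Whether a given product lets you declare such a relation directly is a separate question.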

> 
> He only restricted infinite sets. Due to the nature of infinite sets, 
> one cannot represent all of the members of the set in a finite space. 
> Essentially, once the number reaches a certain size, the number of bits 
> required to represent it becomes too big for whatever store you have.
> 
> First order logic allows one to describe infinite sets. For instance, 
> the set of integers is infinite. The set of integers not equal to 1 is 
> likewise infinite.
> 
> To avoid those things, Codd restricted against infinite sets and limited 
> negation to difference.
> 
> Thus you cannot specify the infinite set of integers or the infinite set 
> of integers not equal to 1.
> 
> You can specify the finite set of integers representable in 32-bits or 
> 64-bits or even 128-bits, and you can specify the finite set of such 
> integers that are not equal to 1.

That's one of the things I want in a product. Admittedly my orientation is towards what a machine can do without disrespecting theory. To change the example a bit, I've personally never felt that it was the machine's responsibility to catch situations like overflow or division by zero, feeling that this is where humans have their natural place. I've come to realize that this is controversial because I've often been compared to a 'theoretical philistine' for saying it.
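
To make the quoted point concrete, here is a small Python sketch of a finite integer domain with negation expressed as difference, held as a description (an intension) instead of an enumeration. The class name IntDomainExcept and its methods are invented for this example and do not come from any product; it assumes signed two's-complement ranges.

```python
class IntDomainExcept:
    """Describes 'all signed integers of a given width, minus some values'."""

    def __init__(self, bits: int, excluded: frozenset):
        self.low = -(2 ** (bits - 1))
        self.high = 2 ** (bits - 1) - 1
        # keep only exclusions that actually lie inside the domain
        self.excluded = frozenset(
            x for x in excluded if self.low <= x <= self.high
        )

    def __contains__(self, value: int) -> bool:
        # membership is decided from the description, with no enumeration
        return self.low <= value <= self.high and value not in self.excluded

    def cardinality(self) -> int:
        return (self.high - self.low + 1) - len(self.excluded)

    def __iter__(self):
        # the extension, produced lazily; only sensible for small widths
        return (v for v in range(self.low, self.high + 1)
                if v not in self.excluded)


not_one_32 = IntDomainExcept(32, frozenset({1}))
print(1 in not_one_32, 2 in not_one_32, not_one_32.cardinality())

# For a small width the description can be checked against plain set difference.
not_one_8 = IntDomainExcept(8, frozenset({1}))
print(set(not_one_8) == set(range(-128, 128)) - {1})
```

Membership and cardinality come straight from the description, so the 32-bit case costs nothing to state. Iterating it is where the time goes.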

> 
> Of course, even though the model allows it, one will likely have to wait 
> too long for any query that evaluates every 128-bit integer not equal to 1.


Yes, I have been doing some timing tests lately and wish I had been born thirty years later. I saw the interview with Eckert on Computerworld today, and he quoted a speed for some arithmetic operation of .00002 seconds on the 1946 ENIAC, which in rough terms was about the speed of the S360/30, twenty years after the ENIAC. Now, on this little Pentium Mobile I have here, which the hardware people consider medium speed on the desktop, I am getting results that are roughly hundreds of times faster, sometimes thousands. It doesn't take very long now to do something a billion times - I have to run my tests for many, many seconds because there aren't enough timer pops to get accurate measurements of the speed of this thing. Plus, I could fit many of the multi-user apps I used to work on in memory. I believe this undercuts the traditional starting point for much of the concurrency theory of the last thirty years, not to mention many of the physical compromises that today surround Codd's basic relational operators.

I guess my point in this reply is that the practical limits have changed dramatically since Codd's first papers - the practical ceilings are much higher now.
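
A rough sketch of the kind of measurement I mean, in Python: repeat a trivial operation enough times that the total dwarfs the timer's resolution, then extrapolate to the cost of visiting every value of a 32-bit or 128-bit domain. The function names and the choice of "x + 1" as the operation are arbitrary, and interpreter overhead dominates, so treat the figures as order-of-magnitude only.

```python
import timeit

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def ops_per_second(stmt: str = "x + 1", setup: str = "x = 41",
                   number: int = 200_000, repeat: int = 5) -> float:
    """Time many repetitions so the total dwarfs the timer's resolution."""
    # take the best of several runs to reduce noise from other processes
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return number / best


def years_to_enumerate(bits: int, rate: float) -> float:
    """Rough time, in years, to visit every value of a bits-wide integer."""
    return (2 ** bits) / rate / SECONDS_PER_YEAR


rate = ops_per_second()
print(f"about {rate:.3g} simple operations per second")
print(f"32-bit domain: about {2 ** 32 / rate:.3g} seconds")
print(f"128-bit domain: about {years_to_enumerate(128, rate):.3g} years")
```

Even at a billion operations a second, the 32-bit domain takes a few seconds and the 128-bit one takes on the order of 10^22 years, so the ceiling has moved a long way but it is still there.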

pc

Received on Wed Feb 15 2006 - 21:17:57 CET
