Re: Temporal Data (long)

From: Russell Searle <rsearle_at_oak.trl.oz.au>
Date: 1996/08/12
Message-ID: <pzk9v5b57f.fsf_at_oak.melb.cpr.itg.telecom.com.au>#1/1


                                Temporal Data

   My apologies in advance for the lack of brevity. I really tried hard for    clarity instead...

   We have an Oracle-based application that warehouses data extracted from our    main non-relational OLTP system for both reporting and long-term archiving    purposes. Unfortunately all tables need to contain data that changes with    time. In order to achieve this we included an extra key-field for each    table. That is, a logical table structure that looked like:

       K1, K2, K3, D1, D2, D3, D4, ...

   where Kn are key fields and Dn are data fields, will be stored instead    in the following structure:

       K1, K2, K3, KT, D1, D2, D3, D4, ...

   where the new KT field is a timestamp that indicates the date/time that    the information in the row became effective.

   The difficulty with this approach is in the extraction of the 'correct'    information for reporting purposes. Our users want current information 99%    of the time.

   Every query must include in its Where-clause a sub-query selecting the    maximum value of the KT column that is less than or equal to some target    date. For current information the subquery will simply select the maximum    value of the KT column for whatever other conditions are true.

   The SQL required is awkward. Here's the best I've been able to come up    with: (I'm still a relative novice with SQL, so I'd really like to hear    about better ideas on these...) FWIW, I've been implementing these test    queries on Oracle 7.1 and 7.2 systems.

   Take a simple two-key TEST table with three non-key fields. Adding the    timestamp key gives it a structure like: TEST(K1, K2, KT, D1, D2, D3)

   How do you get the current value, that is the value with the most recent KT    field, for a specific K1, K2 combination? ...for just a specific K1?    ...for all K1, K2 combinations?

<lots of stuff snipped>

   Am I missing something obvious? Is there a better way to write the    function, or either flavor of view? Is there a better way to approach the    whole problem?

   Any ideas, thoughts, pointers, etc would be greatly appreciated.

   TIA,                david

----------------------------------------

David,

One solution might be to introduce controlled redundancy. (yes data duplication). This involves a separate table that contains only the latest data for each key. The existing table def'n stands as it is the correct view of the world. But the new table would look like:

   K1, K2, K3, DT, D1, D2, D3, D4, ...

The only difference is that 'KT' is replaced with 'DT'. ie. the primary key reduces from a four field to a three field key. This new table is the summary table that contains current information.

I rely on the fact that this data is extracted from another production system. Therefore all of the updates are under program control and users do not form part of the equation, hence the term: 'controlled' redundancy.

The extraction process can either:
1) update the summary table data fields for each key combination during the update process as an extension to each transaction, or 2) rebuild the summary table once the extraction is complete.

Either way the updates are controlled. If the majority of the processing of this data is reporting as opposed to updating, this approach stands to offer significant performance improvementsthat

Now the caveat,

Yes I know that this is a denormalised model but I am trying to take a more pragmatic approach based on the assumption that the updating of the data is tightly controlled ('controlled' redundancy). It is not 'UN'normalised. Meaning that no business value is lost as the summary data can be rebuilt at any time.

Russell Received on Mon Aug 12 1996 - 00:00:00 CEST

Original text of this message