Temporal Data (long)

From: A little left of plumb <dwyche_at_giant.intranet.com>
Date: 1996/08/07
Message-ID: <1996Aug7.232510.15823_at_giant>


                             Temporal Data

My apologies in advance for the lack of brevity. I really tried hard for clarity instead...

We have an Oracle-based application that warehouses data extracted from our main non-relational OLTP system for both reporting and long-term archiving purposes. Unfortunately all tables need to contain data that changes with time. In order to achieve this we included an extra key-field for each table. That is, a logical table structure that looked like:

    K1, K2, K3, D1, D2, D3, D4, ...

where Kn are key fields and Dn are data fields, will be stored instead in the following structure:

    K1, K2, K3, KT, D1, D2, D3, D4, ...

where the new KT field is a timestamp that indicates the date/time that the information in the row became effective.

The difficulty with this approach is in the extraction of the 'correct' information for reporting purposes. Our users want current information 99% of the time.

Every query must include in its Where-clause a sub-query selecting the maximum value of the KT column that is less than or equal to some target date. For current information the subquery will simply select the maximum value of the KT column for whatever other conditions are true.

The SQL required is awkward. Here's the best I've been able to come up with: (I'm still a relative novice with SQL, so I'd really like to hear about better ideas on these...) FWIW, I've been implementing these test queries on Oracle 7.1 and 7.2 systems.

Take a simple two-key TEST table with three non-key fields. Adding the timestamp key gives it a structure like: TEST(K1, K2, KT, D1, D2, D3)

How do you get the current value, that is the value with the most recent KT field, for a specific K1, K2 combination? ...for just a specific K1? ...for all K1, K2 combinations?

the easy one first... specific K1 and K2, say 'A' and 'B':

    Select K1, K2, D1, D2, D3

      From TEST
     Where K1 = 'A' and
           K2 = 'B' and
           KT = (Select Max(KT)
                   From TEST
                  Where K1 = 'A' and
                        K2 = 'B');

all combinations of K1 and K2:

    Select T1.K1, T1.K2, T1.D1, T1.D2, T1.D3

      From TEST T1, 

(Select K1, K2, Max(KT) MKT
From TEST Group By K1, K2) T2 Where T1.K1 = T2.K1 and T1.K2 = T2.K2 and T1.KT = T2.MKT;

specific K1:

    Select T1.K1, T1.K2, T1.D1, T1.D2, T1.D3

      From TEST T1, 

(Select K2, Max(KT) MKT
From TEST Where K1 = 'A' Group By K2) T2 Where T1.K1 = 'A' and T1.K2 = T2.K2 and T1.KT = T2.MKT;

The trouble with all of the above is that it requires access to a fairly low-level report-writing tool to embed the complex SQL. At best it will require programmers to write virtually all reports. Ad-hoc queries simply cannot be done by any ordinary non-technical user.

The obvious approach to making life easier for the end-user is to encapsulate the SQL in a view. The only reasonable approach is to create a view that can return values for any K1, K2 combination:

  Create or Replace View TEST_VIEW as
    Select T1.K1, T1.K2, T1.D1, T1.D2, T1.D3

      From TEST T1, 

(Select K1, K2, Max(KT) MKT
From TEST Group By K1, K2) T2 Where T1.K1 = T2.K1 and T1.K2 = T2.K2 and T1.KT = T2.MKT;

and then the users can get current values simply and naturally by selecting from the view:

    Select *

      From TEST_VIEW
     Where K1 = 'A';

The problem with this is that the view's inner parenthesized select statement doesn't see the K1 = 'A' part of the where clause and therefore does a full table scan, at least according to Oracle's Explain Plan. I have been unable to come up with a SQL-based view of the TEST table that could be selected against as in the above statement that does not involve a full table scan.

So then I thought about writing a PL/SQL function to do the work, and calling the function from a view: (This example uses artificial 8-character YYYYMMDD timestamps for simplicity...)

Create or Replace Function
  GET_MAX_KT (K1_IN In VarChar2,

              K2_IN In VarChar2)
  Return VarChar2
  is
    KT_OUT VarChar2(8);
  Begin
    Select Max(KT)

      Into KT_OUT
      From TEST
     Where K1 = K1_IN and
           K2 = K2_IN;

    Return KT_OUT;
  End;
/
Create or Replace View TEST_VIEW as
  Select K1, K2, D1, D2, D3
    From TEST
   Where KT = GET_MAX_KT (K1, K2);

Now the 'routine' selection from the view where K1 is known no longer does the full table scan. Unfortunately though, performance so far has been very poor. Seems like the function call adds massive overhead to the data retrieval, and I'm not sure if anything can be done about it. (I'm a PL/SQL novice as well...)

Am I missing something obvious? Is there a better way to write the function, or either flavor of view? Is there a better way to approach the whole problem?

Any ideas, thoughts, pointers, etc would be greatly appreciated.

TIA,                david
Received on Wed Aug 07 1996 - 00:00:00 CEST

Original text of this message