Temporal Data (long)
Date: 1996/08/07
Message-ID: <1996Aug7.232510.15823_at_giant>
Temporal Data
My apologies in advance for the lack of brevity. I really tried hard for clarity instead...
We have an Oracle-based application that warehouses data extracted from our main non-relational OLTP system for both reporting and long-term archiving purposes. Unfortunately all tables need to contain data that changes with time. In order to achieve this we included an extra key-field for each table. That is, a logical table structure that looked like:
K1, K2, K3, D1, D2, D3, D4, ...
where Kn are key fields and Dn are data fields, will be stored instead in the following structure:
Every query must include in its Where-clause a sub-query selecting the maximum value of the KT column that is less than or equal to some target date. For current information the subquery will simply select the maximum value of the KT column for whatever other conditions are true.
The SQL required is awkward. Here's the best I've been able to come up with: (I'm still a relative novice with SQL, so I'd really like to hear about better ideas on these...) FWIW, I've been implementing these test queries on Oracle 7.1 and 7.2 systems.
Take a simple two-key TEST table with three non-key fields. Adding the timestamp key gives it a structure like: TEST(K1, K2, KT, D1, D2, D3)
How do you get the current value, that is the value with the most recent KT field, for a specific K1, K2 combination? ...for just a specific K1? ...for all K1, K2 combinations?
the easy one first... specific K1 and K2, say 'A' and 'B':
Select K1, K2, D1, D2, D3
From TEST Where K1 = 'A' and K2 = 'B' and KT = (Select Max(KT) From TEST Where K1 = 'A' and K2 = 'B');
all combinations of K1 and K2:
Select T1.K1, T1.K2, T1.D1, T1.D2, T1.D3
From TEST T1,
(Select K1, K2, Max(KT) MKT
From TEST Group By K1, K2) T2 Where T1.K1 = T2.K1 and T1.K2 = T2.K2 and T1.KT = T2.MKT;
specific K1:
Select T1.K1, T1.K2, T1.D1, T1.D2, T1.D3
From TEST T1,
(Select K2, Max(KT) MKT
From TEST Where K1 = 'A' Group By K2) T2 Where T1.K1 = 'A' and T1.K2 = T2.K2 and T1.KT = T2.MKT;
The trouble with all of the above is that it requires access to a fairly low-level report-writing tool to embed the complex SQL. At best it will require programmers to write virtually all reports. Ad-hoc queries simply cannot be done by any ordinary non-technical user.
The problem with this is that the view's inner parenthesized select statement doesn't see the K1 = 'A' part of the where clause and therefore does a full table scan, at least according to Oracle's Explain Plan. I have been unable to come up with a SQL-based view of the TEST table that could be selected against as in the above statement that does not involve a full table scan.
So then I thought about writing a PL/SQL function to do the work, and calling the function from a view: (This example uses artificial 8-character YYYYMMDD timestamps for simplicity...)
Create or Replace Function
GET_MAX_KT (K1_IN In VarChar2,
K2_IN In VarChar2)
Return VarChar2
is
KT_OUT VarChar2(8);
Begin
Select Max(KT)
Into KT_OUT From TEST Where K1 = K1_IN and K2 = K2_IN;
Return KT_OUT;
End;
/
Create or Replace View TEST_VIEW as
Select K1, K2, D1, D2, D3
From TEST
Where KT = GET_MAX_KT (K1, K2);
Now the 'routine' selection from the view where K1 is known no longer does the full table scan. Unfortunately though, performance so far has been very poor. Seems like the function call adds massive overhead to the data retrieval, and I'm not sure if anything can be done about it. (I'm a PL/SQL novice as well...)
Am I missing something obvious? Is there a better way to write the function, or either flavor of view? Is there a better way to approach the whole problem?
Any ideas, thoughts, pointers, etc would be greatly appreciated.
TIA, davidReceived on Wed Aug 07 1996 - 00:00:00 CEST