Re: How to select the <= date row?

From: Bob Berman <rberman_at_boomer.ny.jpmorgan.com>
Date: 1996/06/06
Message-ID: <4p7fem$fff_at_hardcopy.ny.jpmorgan.com>


I believe the problem he is having is not in doing a join across several tables, but in selecting data which may or may not be in a number of detail tables and in select a single date <= an input date.

To solve the problem of joining tables in which a join may or may not exist, I would suggest performing an outer join. Usually you do something like:

	select	main.id, subtbl_a.data1, subtbl_b.data2
	from	main, subtbl_a, subtbl_b
	where	main.id *= subtbl_a.id
	and	main.id *= subtbl_b.id

The larger problem is selecting the ids from the main table which are closest to, but not past, the given date. A simple SQL solution to this is to self join the main table to itself and select the max date where the date is less than the cut-off date, grouped by id, as follows:

	select b.id, max(b2.dt_event)
	from main b, main b2
	where
	b.id = b2.id
	and b2.dt_event <= "desired date"
	group by b.id


It is hard to believe that this would run for "a couple of days", but I don't know what your setup is, so maybe it is possible. But if this is OK, then just expand the query to outer join the detail tables as above.

If this is unacceptable, then I would suggest running a nightly cron job to append to a small two column table which would contain those ids which are "active" today. This would basically be the preceding query. You might at first have to run this over all the dates in your main table, but that is only done once. Then you could perform a simple equijoin from this small table, indexed on date, to the main table, followed by the outer joins to get all your data back.

Finally, can you run stored procedures and just "select" the result set back to your application? This would enable you to have multiple SQL statements, but on the server, not the client.

In article answers_at_ix.netcom.com(Steve Long) writes:
|>There are a couple solutions you could try, depending on your specific
|>environment. First, if you have bench marked your query and
|>empirically know that it will take a couple days to run the SQL you
|>suggested, have you t
|>ried using parallel query to improve the performance? Parallel query
|>can help in any situation.
|>
|>Second, you may have to consider adding another reference table (this
|>is only one of many possibilities) as follows. Since the fastest
|>access path in ORACLE is by rowid, create a single table that stores
|>the policy id, t
|>he date-time stamp of the transaction, and the rowid of each row for
|>each table. In the event you add another detail table, you can add
|>another column to this reference table. Then, when you need to search
|>for a policy
|>for a given date, just hit the reference table with the policy id and
|>the desired date, get the rowid for each table, and hit the other
|>tables with rowid. This should be doable in one SQL statement.
|>Illustration below.
|> As you are aware, it may take some time to build this table in the
|>first place, but that is a fixed cost. You will also need some AFTER
|>INSERT OR UPDATE OR DELETE triggers on each table to maintain the
|>reference table.
|>
|>create table
|> policy_reference_table (
|> policy_id number not null
|> ,dts date not null
|> ,dt1_rid rowid not null -- based on rules for each table
|> ,dt2_rid rowid -- Suggest better names be used
|> ,...
|> )
|>/
|>
|>In the above, dt1_rid is a column containing the rowid of detail table
|>one, dt2_rid is the rowid for detail table 2, etc... I expect you
|>would substitute actual table names or some reasonable abbreviation. I
|>also recomm
|>end foreign key constraints on policy_id back to the master table.
|>Below is the SQL statement. It should be quite fast. You might even
|>consider creating a view for this query.
|>
|>select
|> p.policy_col_1 -- column of interest in your master table
|> ,p.policy_col_2 -- column of interest in your master table
|> ,... -- etc...
|> ,rt1.detail_col_1 -- column of interest in a detail table
|> ,rt2.detail_col_2 -- column of interest in another detail table
|> ,... -- etc...
|>from
|> policy_reference_table prt -- this is the new reference table
|> policy_table p -- this is your existing master table
|> policy_ref_table_1 rt1 -- this is a detail table
|> policy_ref_table_2 rt2 -- this is a detail table
|> ...
|>where
|> prt.policy_id = &policy_id
|>and prt.dts = &given_date
|>and p.policy_id = prt.policy_id
|>and rt1.rowid = prt.dt1_rid
|>and rt2.rowid = prt.dt2_rid
|>and ...
|>
|>
|>Hope this helps.
|>
|>Steve
|>804-262-6332
|>
|>
|>
|>------------------------
|>In <N.060696.151009.56_at_mickey.iafrica.com> stefanbm_at_iafrica.com (Stefan
|>Mahs) writes:
|>>
|>>Hi there, I would appreciate it if you could give me some ideas on
 the
|>>following problem:
|>>
|>>I have a couple of tables related to each other and they store time
|>>variant data. The "main table" contains event info. All the general
|>>details of any event that effects the value of a policy are logged
 here.
|>>Specific info about each event is logged in a set of event-specific
|>>tables. The result of this is that not all the tables get a new row
 for
|>>every event that is logged in the "main table".
|>>
|>>I need to get the complete status of a policy, or set of policies, as
 it
|>>was at any particular point in time. This means supplying at least a
 date
|>>at which the policy's value should be retrieved, say 1 Jan 1995.
 Normaly
|>>one would just join all the tables policy number, but in this case
 there
|>>may not be row for the specified date in one or more of the related
 event
|>>specific tables. In this case I need to get the row for the closesed
|>>previous date. The row with the largest date less than or equal to
 the
|>>specified date, is the one that was still in force at the specified
 date.
|>>
|>>Two things make this a particularly challenging problem:
|>>
|>> 1) This query has to be done from a query tool, so we are restricted
 to a
|>> single SQL statement (unless Business Objects can build a report
 from
|>> more than one SQL statement).
|>>
|>>2) Selecting the max(specified_date) where event_date <=
 specified_date
|>> works fine, but it will take a couple of days to wade through 40
|>> million rows.
|>>
|>>e.g.:
|>>Main_Event_Tbl Event_A_Tbl Event_B_Tbl
|>>-------------- ----------- -----------
|>>95/01/01 95/01/01
|>>95/02/01 95/02/01
|>>95/03/01 95/03/01
|>>
|>
|>
Received on Thu Jun 06 1996 - 00:00:00 CEST

Original text of this message