problem with historical data and referential integrity

From: plm <plmuon_at_gmail.com>
Date: 2 Dec 2004 02:14:55 -0800
Message-ID: <1101982495.573333.229540_at_z14g2000cwz.googlegroups.com>



Hello, I have a question on how to solve a problem with historical data and RI.

Here is the situation when only actual data are involved:

CREATE TABLE itemgroup
(group_id INTEGER, name STRING,

CONSTRAINT group_pk PRIMARY KEY (group_id) );

CREATE TABLE item
(item_id INTEGER, name STRING,

CONSTRAINT item_pk PRIMARY KEY (item_id) );

CREATE TABLE group_part
(group_id INTEGER, item_id INTEGER,

CONSTRAINT group_part_pk PRIMARY KEY (group_id,item_id), CONSTRAINT group_part_fk1 FOREIGN KEY (group_id) REFERENCES itemgroup(group_id),
CONSTRAINT group_part_fk2 FOREIGN KEY (item_id) REFERENCES item(item_id)
);

Now we have to add historical data: the changing contents of an itemgroup has to be historized and kept in the database. The name may change, but more important the contents of the group may change over time. Each state must be preserved.

If we add "valid_from" coumns to item and itemgroup and include these in their primary keys, I see a problem with group_part: an update in itemgroup alone (e.g. the name attribute) would lead to a new valid_from, thus forcing a new record in group_part as well (which has a foreign key on itemgroup).

For changing the contents of an item_group, we also need a "valid_from" column in group_part as well: it would look like this:

CREATE TABLE group_part
(group_id INTEGER, item_id INTEGER, valid_from DATE,
valid_from_group DATE,
valid_from_item DATE,
CONSTRAINT group_part_pk PRIMARY KEY (group_id,item_id,valid_from), CONSTRAINT group_part_fk1 FOREIGN KEY (group_id,valid_from_group) REFERENCES itemgroup(group_id,valid_from), CONSTRAINT group_part_fk2 FOREIGN KEY (item_id,valid_from_item) REFERENCES item(item_id,valid_from)
);

This gets quite complicated, especially if more relationships exist I doubt if it is possible to keep RI in all cases.

As another possible solution I see a separation between current data
(with RI) and historical data without (similar to a datawarehouse).
Eeach situation which has been historized has been "current", so its consistency is guaranteed indirectly. All historized data is put in separate tables without relation to the current data.

Someone has suggested to separate all changing records out into separate tables, e.g. the itemgroup table would only have the group_id attribute, and the name attribute would go in a "group_id_hist) table containing group_id,valid_from,name. However this does not really solve the many-to-many relationship problem between item and itemgroup. Also it gives a fake idea of integrity, since any item that ever existed would remain in the item table (also consisting only of item_id), so when putting an item in a group one cannot be sure if the item still exists logically. In the situation with only current data the item records would be deleted and afterwards can no longer be referenced in group_part.

Is there a standard/recommended way for handling this? TIA, Peter Mutsaers Received on Thu Dec 02 2004 - 11:14:55 CET

Original text of this message