Re: How to model disjoint date ranges?

From: Larry Coon <larry_at_assist.org>
Date: Mon, 22 Jul 2002 09:19:56 -0700
Message-ID: <3D3C30AC.3268_at_assist.org>


--CELKO-- wrote:

> I prefer to do this with (start_date, end_date) pairs. The logic is
> that the duration is part of the fact that you are modeling. The pair
> is really like an (x,y) co-ordinate -- it has two parts in its
> representation, but it is a single atomic attribute.

I've been discussing this with him over e-mail, and one of the points is that this depends on whether the intervals allow gaps or overlaps. If gaps or overlaps are allowed, then storing start_date & end_date is right. But if there are no gaps or overlaps, then storing start_date AND end_date means the same fact is being stored multiple times in your data. "This" interval ends when the next interval begins, and the begin date of the next interval is already stored.

> CREATE TABLE Foobar
> (foo_key INTEGER NOT NULL PRIMAREY KEY,
> ...
> start_date TIMESTAMP NOT NULL,
> end_date TIMESTAMP, --null means still on-going
> ...);

In our data (university curriculum), you have courses with titles, unit values, repeatability, transferability, etc., independently changing over time. So the table to track title changes might look like:

CREATE TABLE course_title (
  course_id integer not null,
  title varchar(255),
  start_date datetime,
  end_date datetime,
  foreign key (course_id) references course(course_id),   primary key (course_id, start_date)
)

But with this schema, it's easy to generate data anomalies:

INSERT INTO course_title
VALUES (12345, "Beginning Algebra", "10/1/95", "10/1/01")

INSERT INTO course_title
VALUES (12345, "Algebra I", "10/1/99", "10/1/03")

But we did find that queries are easier to write when you have both the start date and the end date....

> When you do queries, the NULL in the end_date gives you what you need
> to always have the right facts:
>
> To only the things still open:
>
> CREATE VIEW CurrentFoobar (..., start_date, end_date, ..)
> AS SELECT ... start_date, CURRENT_TIMESTAMP, ...
> FROM Foobar
> WHERE end_date IS NULL;
>
> To see the most recent examples, including those that are closed:
>
> CREATE VIEW MostRecentFoobar (..., start_date, end_date, ..)
> AS SELECT ... start_date, COALESCE(end_date,CURRENT_TIMESTAMP), ...
> FROM Foobar
> WHERE end_date IS NULL
> OR end_date
> = (SELECT MAX(endate)
> FROM Foobar AS F1
> WHERE F1.foo_key = Foobar.foo_key);
>
> You get the idea.

We ended up with sentinel dates rather than nulls, so comparison operators will continue to work with them, which is not true of nulls.

To see the courses that were active on 10/1/01:

select ...
from course
where start_date <= '10/1/01'
and end_date > '10/1/01'  

Larry Coon
University of California
larry_at_assist.org
and lmcoon_at_home.com Received on Mon Jul 22 2002 - 18:19:56 CEST

Original text of this message