Re: How to model disjoint date ranges?

From: Steve Kass <skass_at_drew.edu>
Date: Mon, 22 Jul 2002 19:12:13 -0400
Message-ID: <3D3C914D.48FAA46C_at_drew.edu>


Larry,

  If you want to see more than the tip of the iceberg here, read "Developing Time-Oriented Database Applications in SQL," by Richard T. Snodgrass. It's a tremendous book.

Steve Kass
Drew University

Larry Coon wrote:

> --CELKO-- wrote:
>
> > I prefer to do this with (start_date, end_date) pairs. The logic is
> > that the duration is part of the fact that you are modeling. The pair
> > is really like an (x,y) co-ordinate -- it has two parts in its
> > representation, but it is a single atomic attribute.
>
> I've been discussing this with him over e-mail, and one of
> the points is that this depends on whether the intervals
> allow gaps or overlaps. If gaps or overlaps are allowed,
> then storing start_date & end_date is right. But if there
> are no gaps or overlaps, then storing start_date AND
> end_date means the same fact is being stored multiple times
> in your data. "This" interval ends when the next interval
> begins, and the begin date of the next interval is already
> stored.
>
> > CREATE TABLE Foobar
> > (foo_key INTEGER NOT NULL PRIMAREY KEY,
> > ...
> > start_date TIMESTAMP NOT NULL,
> > end_date TIMESTAMP, --null means still on-going
> > ...);
>
> In our data (university curriculum), you have courses with
> titles, unit values, repeatability, transferability, etc.,
> independently changing over time. So the table to track
> title changes might look like:
>
> CREATE TABLE course_title (
> course_id integer not null,
> title varchar(255),
> start_date datetime,
> end_date datetime,
> foreign key (course_id) references course(course_id),
> primary key (course_id, start_date)
> )
>
> But with this schema, it's easy to generate data anomalies:
>
> INSERT INTO course_title
> VALUES (12345, "Beginning Algebra", "10/1/95", "10/1/01")
>
> INSERT INTO course_title
> VALUES (12345, "Algebra I", "10/1/99", "10/1/03")
>
> But we did find that queries are easier to write when you
> have both the start date and the end date....
>
> > When you do queries, the NULL in the end_date gives you what you need
> > to always have the right facts:
> >
> > To only the things still open:
> >
> > CREATE VIEW CurrentFoobar (..., start_date, end_date, ..)
> > AS SELECT ... start_date, CURRENT_TIMESTAMP, ...
> > FROM Foobar
> > WHERE end_date IS NULL;
> >
> > To see the most recent examples, including those that are closed:
> >
> > CREATE VIEW MostRecentFoobar (..., start_date, end_date, ..)
> > AS SELECT ... start_date, COALESCE(end_date,CURRENT_TIMESTAMP), ...
> > FROM Foobar
> > WHERE end_date IS NULL
> > OR end_date
> > = (SELECT MAX(endate)
> > FROM Foobar AS F1
> > WHERE F1.foo_key = Foobar.foo_key);
> >
> > You get the idea.
>
> We ended up with sentinel dates rather than nulls, so
> comparison operators will continue to work with them, which
> is not true of nulls.
>
> To see the courses that were active on 10/1/01:
>
> select ...
> from course
> where start_date <= '10/1/01'
> and end_date > '10/1/01'
>
>
> Larry Coon
> University of California
> larry_at_assist.org
> and lmcoon_at_home.com
Received on Tue Jul 23 2002 - 01:12:13 CEST

Original text of this message