Re: O'Reilly interview with Date

From: Paul <paul_at_test.com>
Date: Fri, 12 Aug 2005 15:46:05 +0100
Message-ID: <42fcb62d$0$1288$ed2619ec_at_ptn-nntp-reader02.plus.net>


Kenneth Downs wrote:

>>OK but if you have to specify the foreign key, why not just specify the
>>columns in the firstplace?

>
> Ah, this question gets to the heart of things.
>
> First is the dogmatic answer. The dogmatic answer is that well-designed
> tables will always join by keys. The key (or keys) defines the only
> meaningful linking of the two tables in normal operations. Therefore you
> specify the key because that is how they join, it is actually unnatural and
> downright incorrect to specify columns.

Is that definitely true? I agree that there should be (or could be) a constraint for every join you might want to do, but can this be narrowed down further to a foreign key constraint?

>>What if you have a non-standard join? For example start_date and
>>end_date columns and you want to join where some date is between these
>>two?

>
> This is not actually non-standard. It only appears so because there is no
> real support for intervals and so we all code intervals manually. But if
> start_date and end_date columns were defined as an interval, then comparing
> the interval to a single date implies "between", just as a unique
> constraint on the interval implies "does not overlap", and the interval
> definition itself generates the constraint end_date >= start_date.

So you're saying there is a kind of lop-sided "equality" defined with one operand being a date and the other an interval? And date = interval iff the date is contained within the interval?

Is it OK to define equality in a non-symmetrical and non-transitive way like this, with different types?

I agree a lot of non-standard joins are due to badly-designed schemas, but I can think of examples where you might want to define a join with a "greater than" clause - for example when joining a table to itself to get some kind of ranking number. (How many other salespersons have total sales greater than each salesperson?) I can't see how things can be designed to get around this.

Here's another possible scenario for the date range example: I might want to make one join based on the start_date only, and a different join based on the start/end interval. Maybe things starting on certain days get a price reduction or something. Should this be modelled:

  1. as two columns: start_date and end_date? Then you have the problem of needing "between" joins as above.
  2. a single interval column? Then you have a problem joining to the start date of the interval. The interval is atomic, but you could have a start() function that picks out the start of the interval. Even so, this ceases to be a standard join.
  3. two columns: start_date and interval? With a constraint that start(interval) = start_date? I guess this solves the problem but the redundancy seems a little wrong to me.

Then if you wanted also a join on the end_date you'd need three columns: start_date, end_date and interval.

Paul. Received on Fri Aug 12 2005 - 16:46:05 CEST

Original text of this message