Re: does a table always need a PK?

From: Paul G. Brown <paul_geoffrey_brown_at_yahoo.com>
Date: 2 Sep 2003 12:40:47 -0700
Message-ID: <57da7b56.0309021140.e506e98_at_posting.google.com>

"Bob Badour" <bbadour_at_golden.net> wrote in message news:<yw15b.367$Py1.39244951_at_mantis.golden.net>...
> "Lauri Pietarinen" <lauri.pietarinen_at_atbusiness.com> wrote in message
> news:3F5442BC.8000500_at_atbusiness.com...
> > Paul G. Brown wrote:
> >
> > >Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com> wrote in message
> news:<bj0da9$ue8$1_at_nyytiset.pp.htv.fi>...
> > >
> > >
> > [snip]
> >
> > > Trouble is, you can't build systems for the most of the time. If
> you're
> > > going to implement something it has to cater to the nasty corner cases:
> the
> > > ones where it isn't clear where to add the duplicate discard operations
> but
> > > where it is clear you're going to need a lot of 'em. (Queries sans
> keys).
> > >
> > >
> > You could also argue that current SQL-implementations miss lot's of
> > corner cases as the now stand. So you would
> > perhaps be trading some corner cases for other ones, i.e. we would gain
> > some things and perhaps lose
> > others...

SQL-implementations get the corner cases right with respect to the SQL definition. If they don't, it's a bug. And even this is a far from simple software engineering problem.

The SQL standard (I'm being politic here) doesn't make a virtue of its fidelity to the Relational Model. And in doing so, it certainly makes some 'corner cases' (for both database and DBMS developers) harder to implement than they would otherwise be.

Consequently SQL implementations makes it hard to model lot of corner cases that the Relational Model elegantly deals with.

>
> Queries without keys? All relations have at least one key.

Anyone else wanna further Bob's education? No? OK, I'll take it up.

Bob, for someone who throws words like 'ignorami' around a lot, you sure have a bit to learn. You could start by reading something by practically any writer on the Relational Model to the effect that keys are not associations between tables: they are integrity rules, and are only incidentally involved in the workload. Yes, even in a strongly relational schema you can have queries, even join queries, where keys aren't involved. And if you could think independently you would realize that keys as classically defined in Relational Theory are awkward when used to cover a range of real-world integrity problems. (Please note the word here is 'awkward'.) D&D et al realize this and are of the opinion that declarative rules (CONSTRAINT: RM Prescription # 23)[1] are needed to cover these cases.

An example using Tutorial D:

VAR Station REAL RELATION { Name STRING } KEY { Name };

     +--------------+
     |    Vicksburg |
     |  Chattanooga |
     |      Atlanta |
     |  New Orleans |
     +--------------+

    VAR Tracks REAL RELATION { Src Station, Dst Station, Trk   CHAR(10) }
                              KEY { Src, Dst, Trk },
                              FOREIGN KEY { Src } REFERENCES Station { Name },
                              FOREIGN KEY { Dst } REFERENCES Station { Name };

                Src             Dst    Trk 
     +--------------+--------------+-------+
     |    Vicksburg |  Chattanooga |     A |  
     |    Vicksburg |  Chattanooga |     B |
     |  Chattanooga |      Atlanta |     E |
     |  New Orleans |  Chattanooga |    NS |
     +--------------+--------------+-------+

VAR Train REAL RELATION Train { Id CHAR(10), Name STRING } KEY { Id };

     +-------+-----------------------+
     | NS101 | Chattanooga Choo Choo |
     | EW201 |     Mississippi Flier |
     | NS201 |  Casey Jones' Freight |
     +-------------------------------+

    VAR Schedule REAL RELATION { Train CHAR(10),
                                 Src STRING, Dst STRING, Trk INTEGER,
                                 Departs  DATETIME, Arrives  DATETIME,
                                 Orientation CHAR(3) }
                                 KEY { Src, Dst, Trk, Leaves, Arrives },
                                 FOREIGN KEY { Src, Dst, Trk } 
                                   REFERENCES Tracks { Src, Dst, Trk },
                                 CHECK { Arrives > Leaves };

    -- NOTE: I think D&D would do that last check constraint as a 
    --       declared CONSTRAINT. 

     +-------+-------------+--------------+---+-----+-----+----+
     |  NS101| Chattanooga |      Atlanta | E |  DT1|  DT3|  IN|
     |  NS101|   Vicksburg |  Chattanooga | B |  DT4|  DT6| OUT|
     |  NS101| New Orleans |  Chattanooga |NS |  DT4|  DT6|  IN|
     v                                                         v

OK. Verbal. We're talking trains, the tracks the trains run on, and the stations the tracks are laid between. Trains are scheduled onto tracks. This example is slightly cluttered because I'm using logical keys as I don't want accusations that I'm cheating. Besides: synthetic keys won't do for data integrity.

Alert readers will already be onto a problem. Tracks don't have direction. But keys -- being relations themselves -- make us care about direction. The model above is free of redundancy in the sense that facts about tracks (their existance) are recorded exactly once. This makes questions such as "How many tracks are laid between Chattanooga and Atlanta/Vicksburg?" easy to answer. Were I to model direction in this table I'd need a pair of tuples per track.

In technical terms the relationship between the src/dst pairs is not an equivalence. Specifically, (Src = 'A', Dst = 'B') and (Src = 'B', Dst = 'A') are logically indistinguishable. This kind of relationship I'll call 'labelled isomorphism'. (To all the graph theory guys out there, I know, I know, but bear with me.)

Mathematical relations can grok this kind of value correspondence because there you're free to define what you mean by 'equal' (you bugger around with mappings between sets). But keys in the Relational Model are not so flexible. They need equals. I'd also note that in TTM D&D make a big honkin' deal (RM Prescription # 8: Equality) about equality and I further expect that they'd point to type generator tuples or ADTs as a means of escape. This might be OK, except that brings up questions like whether or not the language of Tutorial D is sufficiently general purpose to implement all of the algorithms you'd need efficiently . . .

But I digress.

Q1: How many trains are scheduled out of 'Vicksburg' between

DTX and DTY?

     RANGE Tr AS (
        RETRIEVE Train FROM Schedule 
        WHERE Src = 'Vicksburg' 
          AND Orientation = 'IN'
        UNION
        RETRIEVE Train FROM Schedule 
        WHERE Dst = 'Vicksburg'
          AND Orientation = 'OUT' ),
     RETRIEVE COUNT(Train) FROM Tr WHERE Departs BETWEEN :DTX AND :DTY;

See any keys in this query? And what would the answer be in a set verses a bag algebra? Different, huh. So you'd need to inject some duplicate elimination. But where? Push it below the UNION? Or pull it up after the UNION? And what about this query?

Q2:

RANGE Tr AS (

        RETRIEVE Train, Departs  FROM Schedule 
        WHERE Src = 'Vicksburg' 
          AND Orientation = 'IN'
        UNION
        RETRIEVE Train, Departs FROM Schedule 
        WHERE Dst = 'Vicksburg'
          AND Orientation = 'OUT' ),
     RETRIEVE COUNT(Train, Departs) FROM Tr
       WHERE Departs BETWEEN :DTX AND :DTY;

Answers here, assuming max one train per DATETIME granule, will be a set in either a set or a bag algebra. But in order for the DBMS to know this you have to declare the constraint and compile it into the plan, somehow. This is perfectly possible in theory but it's really hard in practice.

I'll leave the following query to the reader:

Q3: For what intervals are any trains scheduled on the same track as

another train?

And note that, in order to support temporal operations of the kind used in Q3 elegantly, DD&L[2] find themselves obliged to add an entirely new set of relational operators to the data model (T_UNION, T_JOIN etc). (BTW: from a theory perspective this is an admirable work and the book ought to be read.) But I would only observe that the widely quoted D&D diktat "The relational model needs no extension, no correction, no subsumption -- and above all, no perversion! -"[1], needs to be qualified with "except by us!". And this feels less like science to me and more like religon.

[1] Date C. J and Darwen Hugh. "Foundation for Object/Relational Databases:

The Third Manifesto" Addison Wesley, 1998.

[2] C. J. Date, Hugh Darwen, Nikos A. Lorentzos, Hugh Darwen. "Temporal Data

      and the Relational Model: A Detailed Investigation into the Application
      of Interval and Relation Theory to the Problem of Temporal Database
      Management" Morgan Kaufman, 2002.

Received on Tue Sep 02 2003 - 21:40:47 CEST

This message: [ Message body ]
Next message: Christopher Browne: "Re: does a table always need a PK?"
Previous message: Paul Vernon: "Re: Dropping ACID - was Re: does a table always need a PK?"
Maybe in reply to: Lauri Pietarinen: "Re: does a table always need a PK?"
In reply to Bob Badour: "Re: does a table always need a PK?"
Next in thread: Christopher Browne: "Re: does a table always need a PK?"
Reply: Paul Vernon: "Temporal operations"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message