Re: does a table always need a PK?

From: Paul G. Brown <paul_geoffrey_brown_at_yahoo.com>
Date: 2 Sep 2003 12:40:47 -0700
Message-ID: <57da7b56.0309021140.e506e98_at_posting.google.com>


"Bob Badour" <bbadour_at_golden.net> wrote in message news:<yw15b.367$Py1.39244951_at_mantis.golden.net>...
> "Lauri Pietarinen" <lauri.pietarinen_at_atbusiness.com> wrote in message
> news:3F5442BC.8000500_at_atbusiness.com...
> > Paul G. Brown wrote:
> >
> > >Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com> wrote in message
> news:<bj0da9$ue8$1_at_nyytiset.pp.htv.fi>...
> > >
> > >
> > [snip]
> >
> > > Trouble is, you can't build systems for the most of the time. If
> you're
> > > going to implement something it has to cater to the nasty corner cases:
> the
> > > ones where it isn't clear where to add the duplicate discard operations
> but
> > > where it is clear you're going to need a lot of 'em. (Queries sans
> keys).
> > >
> > >
> > You could also argue that current SQL-implementations miss lot's of
> > corner cases as the now stand. So you would
> > perhaps be trading some corner cases for other ones, i.e. we would gain
> > some things and perhaps lose
> > others...

    SQL-implementations get the corner cases right with respect to the SQL    definition. If they don't, it's a bug. And even this is a far from simple    software engineering problem.

    The SQL standard (I'm being politic here) doesn't make a virtue of its    fidelity to the Relational Model. And in doing so, it certainly makes    some 'corner cases' (for both database and DBMS developers) harder to    implement than they would otherwise be.

    Consequently SQL implementations makes it hard to model lot of corner    cases that the Relational Model elegantly deals with.

>
> Queries without keys? All relations have at least one key.

    Anyone else wanna further Bob's education? No? OK, I'll take it up.

    Bob, for someone who throws words like 'ignorami' around a lot,    you sure have a bit to learn. You could start by reading something    by practically any writer on the Relational Model to the effect that    keys are not associations between tables: they are integrity rules, and    are only incidentally involved in the workload. Yes, even in a strongly    relational schema you can have queries, even join queries, where keys    aren't involved. And if you could think independently you would realize that    keys as classically defined in Relational Theory are awkward when used to    cover a range of real-world integrity problems. (Please note the word here    is 'awkward'.) D&D et al realize this and are of the opinion that    declarative rules (CONSTRAINT: RM Prescription # 23)[1] are needed to cover    these cases.

    An example using Tutorial D:

    VAR Station REAL RELATION { Name STRING } KEY { Name };

     +--------------+
     |    Vicksburg |
     |  Chattanooga |
     |      Atlanta |
     |  New Orleans |
     +--------------+

    VAR Tracks REAL RELATION { Src Station, Dst Station, Trk   CHAR(10) }
                              KEY { Src, Dst, Trk },
                              FOREIGN KEY { Src } REFERENCES Station { Name },
                              FOREIGN KEY { Dst } REFERENCES Station { Name };

                Src             Dst    Trk 
     +--------------+--------------+-------+
     |    Vicksburg |  Chattanooga |     A |  
     |    Vicksburg |  Chattanooga |     B |
     |  Chattanooga |      Atlanta |     E |
     |  New Orleans |  Chattanooga |    NS |
     +--------------+--------------+-------+

    VAR Train REAL RELATION Train { Id CHAR(10), Name STRING } KEY { Id };

     +-------+-----------------------+
     | NS101 | Chattanooga Choo Choo |
     | EW201 |     Mississippi Flier |
     | NS201 |  Casey Jones' Freight |
     +-------------------------------+

    VAR Schedule REAL RELATION { Train CHAR(10),
                                 Src STRING, Dst STRING, Trk INTEGER,
                                 Departs  DATETIME, Arrives  DATETIME,
                                 Orientation CHAR(3) }
                                 KEY { Src, Dst, Trk, Leaves, Arrives },
                                 FOREIGN KEY { Src, Dst, Trk } 
                                   REFERENCES Tracks { Src, Dst, Trk },
                                 CHECK { Arrives > Leaves };

    -- NOTE: I think D&D would do that last check constraint as a 
    --       declared CONSTRAINT. 

     +-------+-------------+--------------+---+-----+-----+----+
     |  NS101| Chattanooga |      Atlanta | E |  DT1|  DT3|  IN|
     |  NS101|   Vicksburg |  Chattanooga | B |  DT4|  DT6| OUT|
     |  NS101| New Orleans |  Chattanooga |NS |  DT4|  DT6|  IN|
     v                                                         v
      

    OK. Verbal. We're talking trains, the tracks the trains run on, and the    stations the tracks are laid between. Trains are scheduled onto tracks.    This example is slightly cluttered because I'm using logical keys as I    don't want accusations that I'm cheating. Besides: synthetic keys won't    do for data integrity.

    Alert readers will already be onto a problem. Tracks don't have    direction. But keys -- being relations themselves -- make us care about    direction. The model above is free of redundancy in the sense that facts    about tracks (their existance) are recorded exactly once. This makes    questions such as "How many tracks are laid between Chattanooga    and Atlanta/Vicksburg?" easy to answer. Were I to model direction in this    table I'd need a pair of tuples per track.

    In technical terms the relationship between the src/dst pairs is not    an equivalence. Specifically, (Src = 'A', Dst = 'B') and    (Src = 'B', Dst = 'A') are logically indistinguishable. This kind of    relationship I'll call 'labelled isomorphism'. (To all the graph theory    guys out there, I know, I know, but bear with me.)

    Mathematical relations can grok this kind of value correspondence because    there you're free to define what you mean by 'equal' (you bugger    around with mappings between sets). But keys in the Relational Model are    not so flexible. They need equals. I'd also note that in TTM D&D make a    big honkin' deal (RM Prescription # 8: Equality) about equality and I    further expect that they'd point to type generator tuples or ADTs as a    means of escape. This might be OK, except that brings up questions like    whether or not the language of Tutorial D is sufficiently general purpose    to implement all of the algorithms you'd need efficiently . . .

    But I digress.

   Q1: How many trains are scheduled out of 'Vicksburg' between

       DTX and DTY?

     RANGE Tr AS (
        RETRIEVE Train FROM Schedule 
        WHERE Src = 'Vicksburg' 
          AND Orientation = 'IN'
        UNION
        RETRIEVE Train FROM Schedule 
        WHERE Dst = 'Vicksburg'
          AND Orientation = 'OUT' ),
     RETRIEVE COUNT(Train) FROM Tr WHERE Departs BETWEEN :DTX AND :DTY;
     

    See any keys in this query? And what would the answer be in a set verses    a bag algebra? Different, huh. So you'd need to inject some duplicate    elimination. But where? Push it below the UNION? Or pull it up after    the UNION? And what about this query?

   Q2:

    RANGE Tr AS (

        RETRIEVE Train, Departs  FROM Schedule 
        WHERE Src = 'Vicksburg' 
          AND Orientation = 'IN'
        UNION
        RETRIEVE Train, Departs FROM Schedule 
        WHERE Dst = 'Vicksburg'
          AND Orientation = 'OUT' ),
     RETRIEVE COUNT(Train, Departs) FROM Tr
       WHERE Departs BETWEEN :DTX AND :DTY;
    

   Answers here, assuming max one train per DATETIME granule, will be a   set in either a set or a bag algebra. But in order for the DBMS to know   this you have to declare the constraint and compile it into the plan,   somehow. This is perfectly possible in theory but it's really hard in   practice.

   I'll leave the following query to the reader:

  Q3: For what intervals are any trains scheduled on the same track as

      another train?

   And note that, in order to support temporal operations of the kind used   in Q3 elegantly, DD&L[2] find themselves obliged to add an entirely new set   of relational operators to the data model (T_UNION, T_JOIN etc).   (BTW: from a theory perspective this is an admirable work and the book   ought to be read.) But I would only observe that the widely quoted D&D   diktat "The relational model needs no extension, no correction, no   subsumption -- and above all, no perversion! -"[1], needs to be qualified   with "except by us!". And this feels less like science to me and more   like religon.

   KR

         Pb

  [1] Date C. J and Darwen Hugh. "Foundation for Object/Relational Databases:

      The Third Manifesto" Addison Wesley, 1998.

  [2] C. J. Date, Hugh Darwen, Nikos A. Lorentzos, Hugh Darwen. "Temporal Data

      and the Relational Model: A Detailed Investigation into the Application
      of Interval and Relation Theory to the Problem of Temporal Database
      Management" Morgan Kaufman, 2002.
Received on Tue Sep 02 2003 - 21:40:47 CEST

Original text of this message