Re: does a table always need a PK?

From: Paul G. Brown <paul_geoffrey_brown_at_yahoo.com>
Date: 31 Aug 2003 19:28:05 -0700
Message-ID: <57da7b56.0308311828.36d1a501_at_posting.google.com>


Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com> wrote in message news:<bitn92$pb1$1_at_nyytiset.pp.htv.fi>...
> Paul G. Brown wrote:
> > Set algebras were tried. They slowed the systems down. Whether or not this
> > makes them impractical is an open question but to be honest about it, it
> > isn't clear that the much touted advantages of set algebras outweigh their
> > disadvantages. Theory is inherently practical.
> >
>
> Thanks for interesting posting, Paul! Could you give me more details on
> that last paragraph? When were
> they (set algebras) tried? By the original System-R team? At some
> later time? Could it be that
> it was found hard in the mid 70's but could not be tried again because
> of SQL-dominance?

   IIRC, this information came to me from a comment in a Chris Date essay in "Relational Writings: 1985-1989", and I believe it was a lesson learnt by both the University Ingres team and the System R team. In [1] you will find the following quote:

   "Since elimination of duplicates from a query is an expensive process and not always necessary, the RDS does not eliminate duplicates unless explicitly requested to do so." [1]

   In other words, we (the System R team) tried building a system with a set algebra (we tried to eliminate duplicates) but it proved expensive. Actually, it was probably just very complex to figure out how to do it. (Knowing the way things work, there was a meeting once, and this was the last topic on the agenda, and it was hard, and they all wanted to go to lunch.)
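   To make the quoted behaviour concrete, here is a minimal sketch using SQLite from Python's standard library as a stand-in (it is not System R, and the table and column names are made up for illustration). Projecting away the key column yields a bag with duplicates unless DISTINCT is explicitly requested:

```python
import sqlite3

def project_city(distinct):
    """Project the (hypothetical) emp table onto city, with or without DISTINCT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT PRIMARY KEY, city TEXT)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?)",
        [("ann", "Oslo"), ("bob", "Oslo"), ("cy", "Turku")],
    )
    # The base table has a key, so it holds no duplicate rows.
    # Duplicates appear only once the key column is projected away.
    keyword = "DISTINCT " if distinct else ""
    rows = conn.execute(f"SELECT {keyword}city FROM emp ORDER BY city").fetchall()
    conn.close()
    return [city for (city,) in rows]

bag_result = project_city(distinct=False)  # duplicates kept (bag semantics)
set_result = project_city(distinct=True)   # duplicates removed on request
print(bag_result)
print(set_result)
```

   The point of the sketch is only that the duplicates come from the query, not from the stored table, so any duplicate elimination has to live somewhere in the query plan.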

   It is certainly the 'conventional wisdom' among system types today. Also, the design of the query decomposition algorithms in University Ingres would have made eliminating duplicate tuples extremely hard, as there is nowhere to plonk the duplicate elimination physical operator. It is also very hard to see how to do it efficiently over streams . . .

   In checking to see where I read this, I was struck by the following pair of comments:

   "Sometimes questions are asked about how the performance of a relational database system might compare to that of a "navigational" system in which a programmer carefully hand-codes an application to take advantage of explicit access paths. Our experiments with System R optimizer and compiler suggest that the relational system will probably approach but not quite equal the performance of the navigational system for a particular, highly tuned application. We believe that the benefits of the relational system in the areas of user productivity, data independence, and adaptability to changing circumstances will take on increasing importance in the years ahead." [2]

   "In summary, I would allege that a comparison of two systems using different data models would result primarily in a test of the underlying operating system and implementation skill (or man-years allowed) of the designers and only secondarily in a test of the data models." [3]

   Which gives hope that eliminating duplicates might not be cripplingly expensive. We just haven't figured out how to do it efficiently. Well, we know how to pull dupes from a given relation, but doing it efficiently in a query plan, particularly when time-to-first-row is a sensitive performance number, is much harder.
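   A rough sketch of why time-to-first-row is the sticking point, written as Python generators standing in for physical operators (the function names and sample rows are my own illustration, not anything from System R or Ingres). A sort-based operator must consume its whole input before emitting anything; a hash-based one can emit immediately but has to remember every distinct row it has seen:

```python
def dedup_streaming(rows):
    """Hash-based: emits each new row at once, but memory grows with distinct rows."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row

def dedup_sorting(rows):
    """Sort-based: sorted() drains the entire input before the first row comes out."""
    previous = object()  # sentinel that compares unequal to every row
    for row in sorted(rows):
        if row != previous:
            yield row
        previous = row

def counting_source(rows, counter):
    """Wrap an input so we can count how many rows the operator pulled."""
    for row in rows:
        counter["pulled"] += 1
        yield row

def rows_pulled_for_first_result(dedup, rows):
    """Return (first output row, input rows consumed to produce it)."""
    counter = {"pulled": 0}
    first = next(dedup(counting_source(rows, counter)))
    return first, counter["pulled"]

DATA = [(3, "c"), (1, "a"), (3, "c"), (2, "b"), (1, "a")]
streaming_first = rows_pulled_for_first_result(dedup_streaming, DATA)
sorting_first = rows_pulled_for_first_result(dedup_sorting, DATA)
print(streaming_first)  # first row after pulling a single input row
print(sorting_first)    # first row only after pulling all five input rows
```

   Neither choice is free: one blocks the pipeline, the other holds state proportional to the number of distinct rows, which is awkward over an unbounded stream.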

   And to repeat: the point here is not that duplicate values are OK. They're not. They complicate design, dash hopes for data consistency, and do make certain optimization problems harder. But set algebras have negative implications too, and it isn't reasonable to ignore their dark side.

   KR

          Pb    

 [1] Astrahan, M. M. et al. "System R: Relational Approach to Database
     Management" ACM Transactions on Database Systems. 1976.

 [2] Chamberlin, Donald D. "A History and Evaluation of System R"
     Communications of the ACM. 1981.

 [3] Stonebraker, M. "Retrospection on a Database System" ACM Transactions on
     Database Systems. 1980.

Received on Mon Sep 01 2003 - 04:28:05 CEST
