Re: does a table always need a PK?

From: Paul G. Brown <paul_geoffrey_brown_at_yahoo.com>
Date: 31 Aug 2003 19:28:05 -0700
Message-ID: <57da7b56.0308311828.36d1a501_at_posting.google.com>

Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com> wrote in message news:<bitn92$pb1$1_at_nyytiset.pp.htv.fi>...
> Paul G. Brown wrote:
> > Set algebras were tried. They slowed the systems down. Whether or not this
> > makes them impractical is an open question but to be honest about it, it
> > isn't clear that the much touted advantages of set algebras outweigh their
> > disadvantages. Theory is inherently practical.
> >
>
> Thanks for interesting posting, Paul! Could you give me more details on
> that last paragraph? When were they (set algebras) tried? By the
> original System-R team? At some later time? Could it be that it was
> found hard in the mid 70's but could not be tried again because of
> SQL-dominance?

IIRC, this information came to me from a comment in a Chris Date essay in "Relational Writings: 1985-1989", and I believe it was a lesson learnt by both the University Ingres team and the System R team. In [1] you will find the following quote:

"Since elimination of duplicates from a query is an expensive process and not always necessary, the RDS does not eliminate duplicates unless explicitly requested to do so." [1]

In other words, we (the System R team) tried building a system with a set algebra (we tried to eliminate duplicates) but it proved expensive. Actually, it was probably just very complex to figure out how to do it. (Knowing the way things work, there was a meeting once, and this was the last topic on the agenda, and it was hard, and they all wanted to go to lunch.)

It is certainly the 'conventional wisdom' among system types today. Also, the design of the query decomposition algorithms in University Ingres would have made eliminating duplicate tuples extremely hard, as there is nowhere to plonk the duplicate elimination physical operator. It is also very hard to see how to do it efficiently over streams...

In checking to see where I read this, I was struck by the following pair of comments:

"Sometimes questions are asked about how the performance of a relational database system might compare to that of a "navigational" system in which a programmer carefully hand-codes an application to take advantage of explicit access paths. Our experiments with System R optimizer and compiler suggest that the relational system will probably approach but not quite equal the performance of the navigational system for a particular, highly tuned application. We believe that the benefits of the relational system in the areas of user productivity, data independence, and adaptability to changing circumstances will take on increasing importance in the years ahead." [2]

"In summary, I would allege that a comparison of two systems using different data models would result primarily in a test of the underlying operating system and implementation skill (or man-years allowed) of the designers and only secondarily in a test of the data models." [3]

Which gives hope that eliminating duplicates might not be cripplingly expensive. We just haven't figured out how to do it efficiently. Well, we know how to pull dupes from a given relation, but doing it efficiently in a query plan, particularly when time-to-first-row is a sensitive performance number, is much harder.
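To make the time-to-first-row point concrete, here is a rough sketch in Python (nothing here comes from System R or Ingres; the function names and the toy rows are made up for illustration). It compares a sort-based duplicate eliminator, which has to read its whole input before it can emit anything, with a hash-based one, which can emit the first row straight away but has to remember every distinct row it has seen.

```python
from typing import Callable, Hashable, Iterable, Iterator, List


def distinct_sorted(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Sort-based elimination: blocking, since sorted() drains the input first."""
    first = True
    previous = None
    for row in sorted(rows):
        if first or row != previous:
            yield row
        first = False
        previous = row


def distinct_hashed(rows: Iterable[Hashable]) -> Iterator[Hashable]:
    """Hash-based elimination: streams, but memory grows with distinct rows."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


def counting_source(rows: Iterable[Hashable], counter: List[int]) -> Iterator[Hashable]:
    """Hypothetical child operator that counts how many rows were pulled."""
    for row in rows:
        counter[0] += 1
        yield row


def rows_read_before_first_output(
    operator: Callable[[Iterable[Hashable]], Iterator[Hashable]],
    rows: Iterable[Hashable],
) -> int:
    """Pull one row from the operator and report how much input it consumed."""
    counter = [0]
    next(operator(counting_source(rows, counter)))
    return counter[0]


rows = [(2, "b"), (1, "a"), (2, "b"), (3, "c")]
print(rows_read_before_first_output(distinct_sorted, rows))  # whole input
print(rows_read_before_first_output(distinct_hashed, rows))  # a single row
```

Neither variant is free: the sort blocks the pipeline, and the hash set has to fit somewhere, which is exactly what gets awkward on large or unbounded streams.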

And to repeat: the point here is not that duplicate values are OK. They're not. They complicate design, dash hopes for data consistency, and do make certain optimization problems harder. But set algebras have negative implications too, and it isn't reasonable to ignore their dark side.

[1] Astrahan, M. M. et al. "System R: Relational Approach to Database Management". ACM Transactions on Database Systems. 1976.

[2] Chamberlin, Donald D. "A History and Evaluation of System R". Communications of the ACM. 1981.

[3] Stonebraker, M. "Retrospection on a Database System". ACM Transactions on Database Systems. 1980.

Received on Mon Sep 01 2003 - 04:28:05 CEST
