Re: Newbie question about db normalization theory: redundant keys OK?

From: Brian Selzer <brian_at_selzer-software.com>
Date: Thu, 27 Dec 2007 08:55:27 -0500
Message-ID: <j9Ocj.1147$El5.530_at_newssvr22.news.prodigy.net>

"David Portas" <REMOVE_BEFORE_REPLYING_dportas_at_acm.org> wrote in message news:GqudnUfPv5UnEO7anZ2dnUVZ8u2dnZ2d_at_giganews.com...

> "Brian Selzer" <brian_at_selzer-software.com> wrote in message 
> news:pAIcj.788$pA7.550_at_newssvr25.news.prodigy.net...

>>
>> "David Cressey" <cressey73_at_verizon.net> wrote in message
>> news:iI7aj.7472$1X.1681_at_trndny07...
>>>
>>> "Brian Selzer" <brian_at_selzer-software.com> wrote in message
>>> news:542aj.25103$4V6.21112_at_newssvr14.news.prodigy.net...
>>>
>>>>
>>>> You're right about DELETE and INSERT, but not about UPDATE. If you
>>>> look
>>> at
>>>> UPDATE in TTM (pps. 112-113), you can see what happens:
>>>>
>>>> UPDATE r ( Ai := X, Aj := Y)
>>>>
>>>> where i <> j is supposedly equivalent to
>>>>
>>>> ( ( EXTEND r ADD ( X AS Bi, Y AS Bj ) ) { ALL BUT Ai, Aj } )
>>>> RENAME ( Bi AS Bk, Bj AS Aj, Bk AS Ai )
>>>>
>>>> where Bi, Bj, and Bk are arbitrary distinct attribute names that do not
>>>> appear in r.
>>>>
>>>> Now if you look at the result of
>>>>
>>>> EXTEND r ADD ( X AS Bi, Y AS Bj ),
>>>>
>>>> you'll notice that each tuple has both the old values and the new
>>>> values
>>> for
>>>> each affected attribute. Clearly if Ai or Aj is prime, then that
>>>> information that ties each tuple in the result to its corresponding
>>>> tuple
>>> in
>>>> r is projected away by "{ ALL BUT Ai, Aj }". Information is lost. The
>>>> result may be the same, but how that result was arrived at is lost in
>>>> translation, and therefore cannot be verified.
>>>>
>>>
>>> It seems to me that, in many discussions over the past few months, you
>>> have
>>> been emphasizing the difference between DELETE followed by INSERT on the
>>> one
>>> hand, and UPDATE on the other, even when the final result of both is a
>>> database in the same state.
>>>
>>> The question arises whether the database communicates any data other
>>> than
>>> the data contained in its state. I'm going to pull a quote from Codd's
>>> 1970
>>> paper. Even though this quote is about consistency, rather than
>>> identity,
>>> I want to draw your attention to the wording.
>>>
>>> "It is important to note that consistency as defined above
>>> is a property of the instantaneous state of a data bank, and
>>> is independent of how that state came about. Thus, in
>>> particular, there is no distinction made on the basis of
>>> whether a user generated an inconsistency due to an act of
>>> omission or an act of commission. "
>>>
>>> It seems to me that all of us who have been building or working with
>>> relational databases, regardless of whether we use SQL as an interface,
>>> or
>>> some tool that might be superior to SQL, have been treating database
>>> state
>>> as independent of history in exactly the above sense. Not only when we
>>> deal
>>> with consistency, but also with regard to entity identity. I don't
>>> know of
>>> any revision made by D&D or others to that perspective.
>>>
>>> The languages described in this newsgroup differ from SQL in at least
>>> one
>>> important respect. SQL distinguishes between actions and transactions.
>>> For
>>> reasons that they themselves have outlined, D&D have come up with a
>>> language in which every transaction can be expressed as a single action.
>>> I
>>> can see the advantages of such a pattern, even without any prectical
>>> experience in such a language. But even with this difference, the
>>> difference between history and state is just about identical between SQL
>>> and
>>> the relational language.
>>>
>>> It seems to me that, if you are going to assert that the history of a
>>> database is part of the interpretation of its content, it's going to be
>>> up
>>> to you to explain how a database can serve up any relevant portions of
>>> its
>>> own history as data.
>>>
>>> If it isn't data, we don't know it.
>>>
>>> I think that until this matter gets resolved somehow, there is going to
>>> be
>>> a continuing disconnect between you and a lot of the regulars in c.d.t.
>>>
>>
>> It's not about history, it's about continuity. Like it or not, from a
>> logical standpoint, during a successful modification there are two
>> separate database values which represent what has been the case and what
>> is the case. They cannot both /be/ the case, due to the closed world
>> assumption. Also, key values were always intended to map to individuals
>> in the universe of discourse, so since each tuple contains at least one
>> key value, each tuple maps to at least one individual. The domain
>> closure, unique name and closed world assumptions make it possible to
>> establish both existence and identity independent of any specific
>> interpretation, so it should be possible to track an individual
>> throughout a modification, even if its identifying features differ before
>> and after. The problem is that a key value that maps to a particular
>> individual before the modification may map to a different individual
>> after the modification, and consequently, any comparison that involves
>> values from before and after requires greater scrutiny.
>>
>> Here is the problem with treating assignment as the only primitive
>> operation. With assignment, all you have available is the before and
>> after images of the data, but there may be many different transformations
>> that could have produced the after image from the before image, and
>> there's no way to tell which transformation actually occurred.

>
>
> Yes there is.

I suspect you would have a difficult time proving that.

Consider a relation R {A, B, C} with 3 keys, {A, B}, {B, C}, and {A, C}. Suppose that you have two values for R, r and r':

             r                                       r'
{{A:1, B:2, C:3},             {{A:1, B:3, C:3},
{A:1, B:3, C:4}}              {A:1, B:2, C:4}}

As you can probably see, we can't tell whether

(1) the values for B were swapped, or
(2) the values for C were swapped, or
(3) none of the above.

As a consequence, we can't tell whether property B for the individual represented by {A:1, C:3} is different, or whether property C for the individual represented by {A:1, B:2} is different, or whether the individual represented by {A:1, C:3} in r is a completely different individual from that represented by {A:1, C:3} in r'. We can determine only that there is an individual that is represented by {A:1, C:3} in r and that there is an individual that is represented by {A:1, C:3} in r'; we cannot determine whether or not they are the same individual.

Now if a delete followed by an insert had been issued, there would be no doubt that the individuals represented in r' are completely different from those represented in r; in other words, there would have had to have been four distinct individuals.
And if an update had been issued, it would be clear that there must have been only two individuals, not four, and it would also be clear whether property B or property C is different. There is also the possibility that a delete, update and insert could have been issued, and in that case it would be clear that there must have been three individuals.

> You say you would solve this using a "for each row trigger" mechanism.

FOR EACH ROW triggers are the only mechanism that is currently commercially available, but I'm sure that a declarative mechanism could be devised.

> But a trigger is just a procedural construct for carrying out other > relational assignments (if not then I'd like to know what else it does).

Not necessarily. They can also be used to enforce constraints--in fact, there was a time when they were the only means available to some database engines for enforcing referential integrity constraints.

Presumably the purpose of those other assigments is to record the

> information that was lost from the transformation represented by some 
> hypothetical UPDATE operator. So why not record that information 
> EXPLICITLY as part of the same operation that performed the original 
> update - ie. an assignment?
>
> Instead, you seem to prefer an UPDATE operation of a kind that you think 
> is flawed - and then propose using a trigger as a crutch to protect the 
> user from leaving out essential information!!
>
> RM is fundamentally a type system. If you cannot agree that assignment is 
> a primitive operation then I think you must sacrifice that very 
> foundation. You are not describing an RDBMS as I know it.
>
> -- 
> David Portas
>
>

Received on Thu Dec 27 2007 - 14:55:27 CET

This message: [ Message body ]
Next message: David Portas: "Re: Newbie question about db normalization theory: redundant keys OK?"
Previous message: David Portas: "Re: Newbie question about db normalization theory: redundant keys OK?"
Maybe in reply to: raylopez99: "Newbie question about db normalization theory: redundant keys OK?"
In reply to Brian Selzer: "Re: Newbie question about db normalization theory: redundant keys OK?"
Next in thread: David Portas: "Re: Newbie question about db normalization theory: redundant keys OK?"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message