Re: Newbie question about db normalization theory: redundant keys OK?

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sat, 15 Dec 2007 19:38:28 GMT
Message-ID: <U2W8j.80311$Um6.5491_at_newssvr12.news.prodigy.net>


"David Portas" <REMOVE_BEFORE_REPLYING_dportas_at_acm.org> wrote in message news:HZednQ8PPZsCBP7anZ2dnUVZ8qClnZ2d_at_giganews.com...
> "Brian Selzer" <brian_at_selzer-software.com> wrote in message
> news:prJ8j.53582$eY.44749_at_newssvr13.news.prodigy.net...
>>
>>> You are saying that this is a distributed database but that it lacks a
>>> mechanism for accurately propagating changes out to all its nodes?
>>> Well in my view such a DBMS would be broken. It surely violates Codd's
>>> principle of "Distribution Independence". Let's follow your example to
>>> its conclusion though. The solution is to replace whatever copy of the
>>> Blah relation exists in the application with the new Blah relation
>>> that superceded it. Now all emails reach the correct address and there
>>> is no problem that requires a different key.
>>>
>>
>> Forgive me for butting in, David, but where did you come up with the idea
>> that it is a distributed database? What have disconnected
>> applications--that is, applications that use something akin to
>> disconnected ADO recordsets or ADO.NET datasets--to do with distributed
>> databases?
>>
>> The question is: for how long is the data that was just read out of the
>> database considered to be valid? Until the next update? Or is it stale
>> as soon as its read? Does it have something to do with transaction
>> control or locking? If several updates occur between the reading of one
>> piece of information and the reading of another, how can you be sure that
>> any answer that involves both pieces of information is correct? How can
>> you be sure that you haven't read the same information twice?
>
> I agree that these are important issues of application design. I don't
> think they need to affect the database logical design in this case.
>

I think they do. For a single state, first order logic suffices; once more than one state is involved, at a minimum some form of modal logic is necessary.

>> If you use an artificial key and a timestamp (or rowversion), then there
>> can be no doubt as to whether or not the information in question changed
>> between the first reading and the second.
>>
>
> A ROWVERSION (as defined by Microsoft SQL Server) does not tell you
> whether any data has changed. It tells you whether some rows were possibly
> affected by update operations. In other words it can give false
> positives - indicating a change where there is none.
>

You're right, of course: it only tells whether some rows were the target of an update, not whether the information is actually different.

> BTW I seriously doubt whether it would be possible or desirable to
> implement anything like a ROWVERSION in a true RDBMS. The consequences of
> SQL Server's implementation are serious because it attempts to identify
> row data based on something other than keys. I have never been a fan of
> the ROWVERSION feature.
>

In what way does it attempt to identify row data based on something other than keys?

> The only way to tell whether the current state of the database equals some
> previous state is to query again and compare that to a result previously
> retrieved. That comparison is usually based on a key value. It makes no
> difference what type of value is used for the key. The comparison is
> exactly the same whether it is an "artificial" key or otherwise. (I'm not
> too concerned about defining what an "artificial" key is because I don't
> think it matters).
>

I disagree. Unless the identifier is a rigid designator or a rigid definite description, then any comparison based upon that identifier is suspect. It may be the case that the key values are identical, but the individuals in each state that are identified by that key value are different individuals. For example, "the first person in line" can be different people at different times.

> Consider a set of transformations:
>
> T1:a -> T2:b -> T3:c -> T4:a
>
> Where "T1:a" means "The value of the database at time T1 is a". When we
> requery the database at time T4 all we will ever know is that the value is
> the same as at time T1. If it is a requirement to know about the previous
> updates at T2 and T3 then obviously we ought to preserve that information
> in the database - but in many cases it is quite reasonable not to do that.
>
> --
> David Portas
>
>
Received on Sat Dec 15 2007 - 20:38:28 CET

Original text of this message