Re: Newbie question about db normalization theory: redundant keys OK?

From: David Portas <REMOVE_BEFORE_REPLYING_dportas_at_acm.org>
Date: Sat, 15 Dec 2007 09:08:46 -0000
Message-ID: <HZednQ8PPZsCBP7anZ2dnUVZ8qClnZ2d_at_giganews.com>


"Brian Selzer" <brian_at_selzer-software.com> wrote in message news:prJ8j.53582$eY.44749_at_newssvr13.news.prodigy.net...
>
>> You are saying that this is a distributed database but that it lacks a
>> mechanism for accurately propagating changes out to all its nodes?
>> Well in my view such a DBMS would be broken. It surely violates Codd's
>> principle of "Distribution Independence". Let's follow your example to
>> its conclusion though. The solution is to replace whatever copy of the
>> Blah relation exists in the application with the new Blah relation
>> that superceded it. Now all emails reach the correct address and there
>> is no problem that requires a different key.
>>
>
> Forgive me for butting in, David, but where did you come up with the idea
> that it is a distributed database? What have disconnected
> applications--that is, applications that use something akin to
> disconnected ADO recordsets or ADO.NET datasets--to do with distributed
> databases?
>
> The question is: for how long is the data that was just read out of the
> database considered to be valid? Until the next update? Or is it stale
> as soon as its read? Does it have something to do with transaction
> control or locking? If several updates occur between the reading of one
> piece of information and the reading of another, how can you be sure that
> any answer that involves both pieces of information is correct? How can
> you be sure that you haven't read the same information twice?

I agree that these are important issues of application design. I don't think they need to affect the database logical design in this case.

> If you use an artificial key and a timestamp (or rowversion), then there
> can be no doubt as to whether or not the information in question changed
> between the first reading and the second.
>

A ROWVERSION (as defined by Microsoft SQL Server) does not tell you whether any data has changed. It tells you whether some rows were possibly affected by update operations. In other words it can give false positives - indicating a change where there is none.

BTW I seriously doubt whether it would be possible or desirable to implement anything like a ROWVERSION in a true RDBMS. The consequences of SQL Server's implementation are serious because it attempts to identify row data based on something other than keys. I have never been a fan of the ROWVERSION feature.

The only way to tell whether the current state of the database equals some previous state is to query again and compare that to a result previously retrieved. That comparison is usually based on a key value. It makes no difference what type of value is used for the key. The comparison is exactly the same whether it is an "artificial" key or otherwise. (I'm not too concerned about defining what an "artificial" key is because I don't think it matters).

Consider a set of transformations:

T1:a -> T2:b -> T3:c -> T4:a

Where "T1:a" means "The value of the database at time T1 is a". When we requery the database at time T4 all we will ever know is that the value is the same as at time T1. If it is a requirement to know about the previous updates at T2 and T3 then obviously we ought to preserve that information in the database - but in many cases it is quite reasonable not to do that.

-- 
David Portas
Received on Sat Dec 15 2007 - 10:08:46 CET

Original text of this message