Re: Key attributes with list values was Re: What are the differences ...KEY

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sun, 26 Feb 2006 21:51:50 GMT
Message-ID: <WppMf.37790$F_3.18338_at_newssvr29.news.prodigy.net>


You asked for an example, so here goes.

Consider the following set of propositions:

Jane Harper is married.
Jane Smith is single.

And a constraint that states that single people cannot become divorced.

Now, imagine that you're the database, and that you've been presented with the following set of values to replace the above set:

Jane Harper is married.
Jane Smith is divorced.

Do you reject the update?

The problem is that there is no way for you to know that Jane Harper in the first set is the same person in the second set. For example, in the original set, Jane Harper's maiden name may be Smith, and Jane Smith may have married Paul Harper. It is impossible for you to know because that information was not provided. Because the update is set-based (and must be, since either update taken individually would violate the primary key constraint), and because the only key can change, it is not possible to enforce the transition constraint because there isn't any mechanism to correlate the propositions in the first set with the propositions in the second.

Also see comments inline.

"Marshall Spight" <marshall.spight_at_gmail.com> wrote in message news:1140978737.494702.103570_at_v46g2000cwv.googlegroups.com...
> Brian Selzer wrote:

>> "Marshall Spight" <marshall.spight_at_gmail.com> wrote in message
>> >> What I'm trying to convey is that from one point of view, a list
>> >> has identity, regardless of its contents.
>> >
>> > I reject this point of view. The system I am building has values
>> > and variables only. There are no pointers, there are no addresses,
>> > and there is no concept of identity. There is only value.
>>
>> What about the list of operations and materials that are required to
>> produce a part?  That certainly has identity.
>

> Nope. Just a value. Same as an int.
>
>
>> > This is not to say that the concept of identity is not consistent.
>> > It certainly is, and useful programming languages have been
>> > built on top of it. It is foundational to OOP. However, useful
>> > systems have been built without it as well; it is not a necessary
>> > concept.
>>
>> I disagree.  It is a necessary concept, not only externally but also
>> internally.
>

> There are plenty of useful general purpose programming languages
> that don't have any concept of identity in them. Mercury, for
> example. (And in the non-general-purpose category, SQL.)
> So I don't see how you could describe the concept as "necessary."
> Useful, arguable, but not necessary. A turing machine doesn't
> have any notion of identity. The lamba calculus has no notion
> of identity. Come to that, the lambda calculus has no concept
> of equality, either. Huh. So I guess neither one is necessary.
>

> I'm not sure we're using the terms in the same way, though.
> Do you speak Java?
>

> Integer i = new Integer(1);
> Integer j = new Integer(1);
> System.out.println(i==j); // tests for identity
> System.out.println(i.equals(j)); // tests for equality
>

> Is that how you're using the terms?
>

> In Java, == on a reference type tests for reference equality,
> which is to say identity. If there were no reference types,
> as in Prolog or SQL or whatever, then there is no identity.
>

That's the point: a non-1NF attribute is not scalar, hence the ambiguity.

>
>> An entity must have identity, otherwise there's no way for the
>> database, or users, for that matter, to distinguish between them.
>> That's the whole point of keys.
>

> One distinguishes between values by equality. If two values
> are equal, they are the same value. If they are not equal,
> they are not the same value. This is true for key values as
> well as nonkey values.
>
>

> Keys work because one can compare values, not because
> one can compare identies. That is the fundamental difference
> between keys and pointers.

I disagree, and the example above bears me out. The knowledge represented in a database is subject to the universe of discourse, and as such only those attributes that belong to that universe are available to the database. The example above shows that an entity represented in a proposition in one database state may not necessarily be the same entity represented by a proposition in a later database state even if the attribute values are identical.

>

> Members of a set don't have identity.
>

Yes they do. Is a roll of quarters worth 25 cents? No, and to elucidate upon my previous point, the attributes that would uniquely identify each individual quarter may not be relevant within the universe of discourse, but that doesn't change the fact that there are still 40 quarters.

>
>> Because a key value determines all other attribute
>> values, it identifies an entity.
>

> Sure. This requires only values and equality.
>
>
>> But a there's a problem: a key value may
>> change over time, so any given key value's ability to determine what was 
>> or
>> is known about an entity is limited to a specific interval, bounded by 
>> the
>> time that its value became known by the database and the time that a new
>> value became known.  This imposes limitations on the types of updates 
>> that
>> can be performed or the types of constraints that can be enforced.
>

> Um, I don't see how it does. If you want to impose specific semantics
> on the data, then that might constain how you allow the data to be
> updated. But that is true of any constraint. Constraints are semantic
> things; values are logical things. 240 is a perfectly legal int, but it
> might not be an allowed age for a person.
>

Have you tried writing a trigger to audit changes to rows in a table that has only natural keys in SQL Server? Then you've seen the code that's required to prevent multi-row updates that also affect a column that is part of the key.

>
>> If all
>> keys can change, then either updates must be singular, that is, must 
>> affect
>> only one entity of any given type at a time, or no temporal constraint (a
>> constraint that involves the state of the database at more than one point 
>> in
>> time) can be enforced.
>

> I don't see why this should be so. Perhaps I'm just not following your
> terminology. And anyway, if your domain wants keys that don't change,
> just apply a constraint that enforces that.
>

That is not always possible for natural keys, or even for composite keys.

>
>> This is a significant limitation of the Relational
>> Model with which I am most familiar, but I suspect that the concept 
>> applies
>> to all other data models, which may have means to overcome it.  In the
>> Relational Model, all updates are set-based, and if all keys are subject 
>> to
>> change during an update and if the cardinality of the update is greater 
>> than
>> one, then there's no way to determine which tuple in a new relation value
>> corresponds to any given tuple in the original relation value.
>

> I don't see how you can use the word "limitation" to describe
> what you appear to consider the ability to update too much.
> Assuming you add code to reject these updates you don't
> like, would you say that you had "removed a limitation"?
>

The limitation is that you can't write code to reject these updates without also rejecting perfectly valid updates, unless you do it in the application.

> Also, I think you somewhat overstate the case. If I have
> a relation of customers with customerid in the range 1 - 1000,
> and I decide I want customer ids to start at 1,000,000, I can
> "UPDATE Customers set CustomerId = CustomerId + 1000000;"
> and there's an update that changes keys and has
> cardinality greater than one, and I can still determine
> which tuple in the new relation value corresponds to any
> given tuple in the original relation value.
>

But that's because in this case you can detect a pattern between the values. That is not always possible, and besides, it requires that the system probe the right-hand side of the assignment in order to find some means to correlate tuples. Not only is that not practical, as the example above shows, it is not always possible.

>

> If you want a system that supports identity, you don't want to
> be using set theory. There are plenty to choose from, and
> they are well-supported and popular!
>

Yes, I do want to be using set theory. The recognition of identity does not alter the sound theoretical foundation that the Relational Model provides. It strengthens it, or better yet, completes it.

>
>> It should be obvious that correlation is necessary to enforce
>> a constraint that involves more than one database state.
>

> It's not obvious to me.
>

Obviously.

>
>> Every proposition must
>> necessarily be different from every other proposition, because either
>> something is known, or it isn't: the knowledge contained in a database is 
>> a
>> set of propositions, not a collection.  Thus every proposition has 
>> identity
>> with respect to the state of the database at any specific point in time, 
>> and
>> that identity can be revealed as an attribute.
>

> If what you're saying here is "every relation must have at least one
> key"
> then I agree. If that's not what you're saying, then I don't
> understand.
>
>
>> In order to avoid losing
>> information over time, every new proposition must have a new identity 
>> value.
>

> In this and in the previous part, it appears you are using the term
> "identity value" as a synonym for key. Is that correct?
>

No. It is not. Every proposition in the entire database must be distinct from every other proposition and thus has identity with respect to the database, regardless of whether those propositions are organized into relations, some multi-valued construct or whatever else. An identity value is a representation of a proposition's identity with respect to the entire database. If the database is organized into relations, then an identity attribute is clearly a candidate key on any relation that reveals it, but conceptually, its value must be distinct throughout the database, not limited to the scope of a single relation.

>
>> By that I mean that new values exist only for propositions that are
>> completely new to the database rather than to propositions that have been
>> changed.  In other words, something can become known by the database, and
>> something that is already known can change.  The distinction is subtle, I
>> know, but necessary--especially in a temporal database, but also in one 
>> that
>> only requires that transitions be constrained.
>

> Perhaps an example is in order.
>
>

> Marshall
>
Received on Sun Feb 26 2006 - 22:51:50 CET

Original text of this message