Re: Key attributes with list values was Re: What are the differences ...KEY

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 26 Feb 2006 10:32:17 -0800
Message-ID: <1140978737.494702.103570_at_v46g2000cwv.googlegroups.com>


Brian Selzer wrote:
> "Marshall Spight" <marshall.spight_at_gmail.com> wrote in message
> >> What I'm trying to convey is that from one point of view, a list
> >> has identity, regardless of its contents.
> >
> > I reject this point of view. The system I am building has values
> > and variables only. There are no pointers, there are no addresses,
> > and there is no concept of identity. There is only value.
>
> What about the list of operations and materials that are required to
> produce a part? That certainly has identity.

Nope. Just a value. Same as an int.

> > This is not to say that the concept of identity is not consistent.
> > It certainly is, and useful programming languages have been
> > built on top of it. It is foundational to OOP. However, useful
> > systems have been built without it as well; it is not a necessary
> > concept.
>
> I disagree. It is a necessary concept, not only externally but also
> internally.

There are plenty of useful general purpose programming languages that don't have any concept of identity in them. Mercury, for example. (And in the non-general-purpose category, SQL.) So I don't see how you could describe the concept as "necessary." Useful, arguably, but not necessary. A Turing machine doesn't have any notion of identity. The lambda calculus has no notion of identity. Come to that, the lambda calculus has no concept of equality, either. Huh. So I guess neither one is necessary.

I'm not sure we're using the terms in the same way, though. Do you speak Java?

Integer i = new Integer(1);
Integer j = new Integer(1);
System.out.println(i==j); // tests for identity
System.out.println(i.equals(j)); // tests for equality

Is that how you're using the terms?

In Java, == on a reference type tests for reference equality, which is to say identity. If there are no reference types, as in Prolog or SQL or whatever, then there is no identity.

> An entity must have identity, otherwise there's no way for the
> database, or users, for that matter, to distinguish between them.
> That's the whole point of keys.

One distinguishes between values by equality. If two values are equal, they are the same value. If they are not equal, they are not the same value. This is true for key values as well as nonkey values.

Keys work because one can compare values, not because one can compare identities. That is the fundamental difference between keys and pointers.
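To make the key-versus-pointer distinction concrete, here is a minimal Java sketch. The class name, the map contents, and the helper method are all made up for illustration. It looks up a row using a key object that is a different object from the one stored, but equal in value.

```java
import java.util.HashMap;
import java.util.Map;

class KeyByValue {
    // A tiny stand-in for a relation keyed by name (hypothetical data).
    static Map<String, Integer> customers() {
        Map<String, Integer> m = new HashMap<>();
        m.put("alice", 42);
        return m;
    }

    // Looks up a row with a freshly constructed key object.
    // The lookup succeeds because HashMap compares keys with
    // equals()/hashCode(), not with reference identity.
    static Integer lookupWithFreshKey(String name) {
        String fresh = new String(name); // distinct object, equal value
        return customers().get(fresh);
    }

    public static void main(String[] args) {
        String a = "alice";
        String b = new String("alice");
        System.out.println(a == b);      // false: two different objects
        System.out.println(a.equals(b)); // true: the same value
        System.out.println(lookupWithFreshKey("alice")); // 42
    }
}
```

The lookup never asks which object the key is, only what value it has. A pointer-based lookup would need the original object.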

Members of a set don't have identity.

> Because a key value determines all other attribute
> values, it identifies an entity.

Sure. This requires only values and equality.

> But there's a problem: a key value may
> change over time, so any given key value's ability to determine what was or
> is known about an entity is limited to a specific interval, bounded by the
> time that its value became known by the database and the time that a new
> value became known. This imposes limitations on the types of updates that
> can be performed or the types of constraints that can be enforced.

Um, I don't see how it does. If you want to impose specific semantics on the data, then that might constrain how you allow the data to be updated. But that is true of any constraint. Constraints are semantic things; values are logical things. 240 is a perfectly legal int, but it might not be an allowed age for a person.
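A small Java sketch of the same point, assuming a hypothetical upper bound of 150 for a person's age (the bound and the names are mine, picked only for illustration):

```java
class AgeConstraint {
    // Hypothetical domain rule; the type int knows nothing about it.
    static final int MAX_AGE = 150;

    // The constraint is a predicate layered on top of the value.
    static boolean isAllowedAge(int age) {
        return age >= 0 && age <= MAX_AGE;
    }

    public static void main(String[] args) {
        int candidate = 240; // a perfectly legal int
        System.out.println(isAllowedAge(candidate)); // false
        System.out.println(isAllowedAge(37));        // true
    }
}
```

The value 240 is well-formed either way. Only the added predicate rejects it.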

> If all
> keys can change, then either updates must be singular, that is, must affect
> only one entity of any given type at a time, or no temporal constraint (a
> constraint that involves the state of the database at more than one point in
> time) can be enforced.

I don't see why this should be so. Perhaps I'm just not following your terminology. And anyway, if your domain wants keys that don't change, just apply a constraint that enforces that.

> This is a significant limitation of the Relational
> Model with which I am most familiar, but I suspect that the concept applies
> to all other data models, which may have means to overcome it. In the
> Relational Model, all updates are set-based, and if all keys are subject to
> change during an update and if the cardinality of the update is greater than
> one, then there's no way to determine which tuple in a new relation value
> corresponds to any given tuple in the original relation value.

I don't see how you can use the word "limitation" to describe what you appear to consider the ability to update too much. Assuming you add code to reject these updates you don't like, would you say that you had "removed a limitation"?

Also, I think you somewhat overstate the case. If I have a relation of customers with customerid in the range 1 - 1000, and I decide I want customer ids to start at 1,000,000, I can "UPDATE Customers set CustomerId = CustomerId + 1000000;" and there's an update that changes keys and has cardinality greater than one, and I can still determine which tuple in the new relation value corresponds to any given tuple in the original relation value.
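As a rough Java sketch of that UPDATE (the class, the method names, and the sample rows are hypothetical): every key changes, more than one row is affected, and the old and new tuples can still be correlated because the update is a known, invertible function of the old key value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RenumberKeys {
    static final int OFFSET = 1_000_000;

    // Set-based update: every CustomerId k becomes k + OFFSET.
    static Map<Integer, String> renumber(Map<Integer, String> customers) {
        Map<Integer, String> updated = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : customers.entrySet()) {
            updated.put(e.getKey() + OFFSET, e.getValue());
        }
        return updated;
    }

    // Correlates a tuple in the new relation value with its original key.
    static int originalKey(int newKey) {
        return newKey - OFFSET;
    }

    public static void main(String[] args) {
        Map<Integer, String> before = new LinkedHashMap<>();
        before.put(1, "Ann");
        before.put(1000, "Zed");
        Map<Integer, String> after = renumber(before);
        for (int newKey : after.keySet()) {
            System.out.println(originalKey(newKey) + " -> " + newKey);
        }
    }
}
```

No notion of identity is involved here. The correlation comes entirely from values and the update expression.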

If you want a system that supports identity, you don't want to be using set theory. There are plenty to choose from, and they are well-supported and popular!

> It should be obvious that correlation is necessary to enforce
> a constraint that involves more than one database state.

It's not obvious to me.

> Every proposition must
> necessarily be different from every other proposition, because either
> something is known, or it isn't: the knowledge contained in a database is a
> set of propositions, not a collection. Thus every proposition has identity
> with respect to the state of the database at any specific point in time, and
> that identity can be revealed as an attribute.

If what you're saying here is "every relation must have at least one key"
then I agree. If that's not what you're saying, then I don't
understand.

> In order to avoid losing
> information over time, every new proposition must have a new identity value.

In this and in the previous part, it appears you are using the term "identity value" as a synonym for key. Is that correct?

> By that I mean that new values exist only for propositions that are
> completely new to the database rather than to propositions that have been
> changed. In other words, something can become known by the database, and
> something that is already known can change. The distinction is subtle, I
> know, but necessary--especially in a temporal database, but also in one that
> only requires that transitions be constrained.

Perhaps an example is in order.

Marshall

Received on Sun Feb 26 2006 - 19:32:17 CET
