Re: relationship in the database

From: Costin Cozianu <c_cozianu_at_hotmail.com>
Date: Thu, 26 Sep 2002 09:37:23 -0700
Message-ID: <3D9337C3.70702_at_hotmail.com>

Jan.Hidders wrote:

> In article <3D92457D.1030608_at_hotmail.com>,
> Costin Cozianu  <c_cozianu_at_hotmail.com> wrote:
>

>>Jan.Hidders wrote:
>>
>>>In article <61c84197.0209250311.5660da8c_at_posting.google.com>,
>>>Peter Koch Larsen <pkl_at_mailme.dk> wrote:
>>>
>>>No, the consequence is that some assignments are recognized as a single
>>>tuple update.
>>
>>This is an unwanted and unwarranted limitations.

> 
> 
> What limitations? There are no limitations.

You force the the user (application) to use only one tuple at a time updates (ahem relational assignments). This is unacceptable in light of practical problem that databases need to solve and in light of the extra engineering effort that needs to be spent on building such systems.

Mathematical models are not an end in themselves they are only a mean.

>>Then a need is recognized to have 11 digits in order to make room for
>>more numbers. The existing numbers will be recognized by prefixing them
>>with, let's say, the digit "5". Quite easy, right ?
>>UPDATE ........ SET number= '5' || number. [...]
>>
>>Relational assignment doesn't support this.

> 
> 
> Neither does the UPDATE statement. It's the triggers that do the job, and
> relational assignment does not exclude thepossibility of triggers.

Relational assignment has the essential problem that looses semantic information.

I want to say: for each tuple in the relation, apply this function that will return a new tuople, replace the old relation with the set of new tuples. This is what UPDATE has to say to the DBMS.

It is not (or at least IMO) semantically the same as give me the relation, I'll figure out the new relation value, assign it to the relvar.

The later is what relvar assignment "says" to the database, and I as an app developer want to be able to say the former, and I want the database to take into account the difference between the two. If a model doesn't account for the difference, that's fine with me, but I'll look for the better model.

>>What's so fishy about UPDATE after all.
> It assumes a hidden notion of object identity.

No, it doesn't. It only assumes there's a well defined relation (functional relation is desirable) between the new tuples and the old tuples.

Accounting for historical properties and dynamic integrity constraints is justified both practically and theoretically (again, I'll refer to the works of Schewe and Thalheim), although there's sure a lot more work to be done in the area.

>>I'd say that we can define an operator over relvars
>> UPDATE (relvar, transformation)
>>where relvar has to be the name of a relvar (later we can think of
>>views, maybe), and /transformation/ should be a function
>>TUPLETYPE_OF(relvar) -> TUPLETYPE_OF(relvar).
>>I ommitted the WHERE clause because it can eb easily expressed by
>>letting the transformation be the identity when restricted to tuples
>>that don't match.
>>
>>Now the semantics is clear (kind of), but I try, we have to cases:
>>
>> - transformation is injective with regards to the keys (all the
>>candidate keys) -- formally that means the projection of tuple over the
>>candidate key composed with the transformation is injective.

> 
> 
> A transformation is injective if two different relations are always mapped
> to different relations. I don't think that's what you mean here. I'm
> guessing you mean that every result of the transformation satsfies the
> key constraints. That's an undecidable property, by the way, so I don't
> think you would want to depend on that.

I meant the transformation function takes a tuple and returns a tuple. If it is an injective function from tuples to tuples, then the effects of the UPDATE operator in the presence of candidate keys and foreign keys are well defined, and semantically useful and desirable to be supported by the DBMS as shown in the example I posted.

You contend that injectivity property is undecidable, and of course you are right. But the fact of the matter is that the DBMS doesn't need to _decide_ (meaning formal proof in the general case) the injectivity, it has only to _verify_ that its restriction to the set of the existing tuples in that relation being updated, which is quite an easy thing to do and it _has_ to be done anyway.

To go back to the practical example, if I was a DBMS engine I wouldn't have to _prove_ that f(number) = concatenate('5', number) is an injective function on strings, I'd only have to look for all the numbers that I have in the relation, calculate f(number) and make sure that I don't get a violation of the injectivity property. I would then hold a set of (number, f(number)) pairs which would allow me to execute the ON UPDATE CASCADE semantics quite easily, without making use of any notion of object identity.

>>In this
>>case the new tuple values clearly correspond uniquely toold tuple values
>>and we can propagate the foreign keys all over the place.

> 
> 
> No, that is not enough. If I start with the the keys '1' and '2' and they
> are mapped to '3' and '4' then I don't know who became '3' and who became
> '4'.

That's exactly what I said it would happen with relational assignment: you lost semantic information so you can't say which tuple corresponds to which previous tuple.

In case you don't have that information to begin with (let's say the user/application didn't want to apply a transformation), then the resolution is simple: throw an error and abort the operation because it violates integrity constraints. But in case you do have that information (like in the example I posted with the phone numbers ), there's no valid reason not to make use of it.

>>>I used the term "strong key" (a super-key is already something else). A
>>>fancier name is "diachronic key" as it tells you accross time when two
>>>tuples represent the same entity. Note that every diachronic key is also
>>>a candidate key but not every candidate key is necessarily a diachronic
>>>key. However, this notion may not always solve the problem because often
>>>it is exactly this key that is updated, and then you still have the
>>>problem that one entity has disappeard and another one (with a different
>>>diachronic key) has appeared after the assignment.
>>
>>I think I've made a strong argument elsewhere that such a strong key is
>>unnecessary. You definitely don't need it in the relational model, and
>>you don't need it in the ER models. AFAIK it is a pre-requisite in most
>>OODB models.

> 
> 
> You *do* need it in the ER model if you are going to consider updates.

No, I think not. Or at least my reference on ER models (that is the Thalheim's book, and I'll excuse myself for not being able to read other alternative takes on ER) doesn't make use of such notion neither it constructs an equivalent one, and he's being able to construct quite a powerful model.

To quote from the book, an entity type E is a pair { attr(E), id(E) } where attr(E) is the set of attributes and the id(E) is a subset of attr(E) called the key which can be empty (for the puzzled relational reader it is so because an entity can be identified by its relations(ships) to other entities). Later on in the book tyhe Update database operation is defined , although ti si true that not all the logical implications we discussed in this thread are discusssed in the book.

If you have a paper published somewhere on the web, or have a better reference on these diachronic keys, I'd be happy to read about them. I have to admit right now, I'm a bit confused about them (in my mind they're still equivalent to OIDs after all your explanations). Or maybe you can show me a small practical example.

>>>That's funny, because that is exactly why I suggested that the ER model
>>>is often a better data model then the Relational model. There the notion
>>>of update of an entity is much more natural.
>>
>>It "smells" natural, but it ain't. It all boils down to the "object
>>identity", the much cried for notion of OO developers. This thing may exist
>>in reality, but since our software model are merely a description of
>>reality, I don't think we can drag the identity of things into our
>>descriptions.

> 
> 
> Your terminology is very sloppy here, and taken literally you couldn't be
> more wrong. Determining the identity of things, i.e., determining the keys,
> is one of the most important things you have to do when you design a data
> model. I think you are confusing "the identity of things" with OIDs which
> you seem to think that I'm advocating. I don't. Perhaps you are confused
> by the fact that I argued that OIDs and surrogate identifiers are not
> meaningless. That's not the same as saying that they are a good idea. The
> world is not that simple.

Yes. I must admit I am confused by your position. Nevermind that, possibly this subject has to be developped on a thread of its own.

Best regards,
Costin Cozianu Received on Thu Sep 26 2002 - 18:37:23 CEST

This message: [ Message body ]
Next message: Paul Vernon: "Re: relationship in the database"
Previous message: Nathan Allan: "Re: The Practical Benefits of the Relational Model"
Maybe in reply to: Paul Vernon: "Re: relationship in the database"
In reply to Costin Cozianu: "Re: relationship in the database"
Next in thread: Paul Vernon: "Re: relationship in the database"
Reply: Jan.Hidders: "Re: relationship in the database"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message