Re: Another view on analysis and ER

From: David BL <davidbl_at_iinet.net.au>
Date: Fri, 7 Dec 2007 07:37:30 -0800 (PST)
Message-ID: <4d1db512-84d8-4edc-955e-6d91b7a80f9b_at_a39g2000pre.googlegroups.com>


On Dec 7, 8:17 pm, Jon Heggland <jon.heggl..._at_ntnu.no> wrote:
> Quoth David BL:
>
> > I wasn't actually intending that Location be necessary for
> > identification of a marriage. I'll make the intensional definition
> > clearer:-
>
> > married(Husband, Wife, Location) :-
> > Husband is *currently* married to Wife
> > and they (last) got married at Location
>
> > Candidate keys are { Husband } or { Wife }, enforcing monogamy
> > integrity constraints.
>
> So Marriage is a relationship between a Husband and a Wife, yet it is
> identified by either, not the combination? I thought I finally had the
> common definition of "relationship" pegged, and then this comes along.

I see your point, and I can see a similarity to the example in my last post to Jim, where a team identifier also identifies the team captain. Evidently my "definitions" are too simplistic. In this case it was only a desire for strong integrity constraints that caused the problem, so presumably the distinction between entities and relationships could be adjusted to accommodate these functional dependency integrity constraints.
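To make the two candidate keys concrete, here is a minimal sketch using Python's sqlite3 module. The table name, column names and sample values are assumptions for illustration only; the point is just that declaring each of Husband and Wife unique on its own is what rejects a second current marriage for either party.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for married(Husband, Wife, Location).
# {husband} and {wife} are each a candidate key on their own.
conn.execute("""
    CREATE TABLE married (
        husband  TEXT NOT NULL UNIQUE,
        wife     TEXT NOT NULL UNIQUE,
        location TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO married VALUES ('Jack', 'Jill', 'Perth')")


def try_insert(husband, wife, location):
    """Return True if the tuple is accepted, False if a key is violated."""
    try:
        conn.execute(
            "INSERT INTO married VALUES (?, ?, ?)",
            (husband, wife, location),
        )
        return True
    except sqlite3.IntegrityError:
        return False


bigamy_by_husband = try_insert("Jack", "Mary", "Sydney")  # Jack is already married
bigamy_by_wife = try_insert("John", "Jill", "Sydney")     # Jill is already married
new_couple = try_insert("John", "Mary", "Sydney")         # neither is married yet
```

Note that the tuple still mentions both people, yet either one alone identifies it, which is exactly the oddity Jon is pointing at.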

> I suppose I am looking for rigor where there is none, though. The
> definition of entity---something that is identified independently of
> other entities---is also rather half-baked. Take weak entities, for
> instance.

Yes, rather half-baked.

> >> What if
> >> Location does not correspond to an entity type? I assume that might be
> >> the case if it were simply a spatial coordinate. Is marriage still a
> >> relationship? If not, what is it?
>
> > I agree that it is strange to think of a spatial coordinate as an
> > entity type.
>
> My attempted point was to posit that a Location might be a non-entity
> attribute, /or/ an entity (say, a church or temple or official building
> of some kind), and to ask whether this made any difference as to whether
> the marriage is a relationship or an entity.

I don't think so.

> >>> By contrast the following predicates are consistent with thinking of a
> >>> marriage itself as an entity
> >>> husband(MarriageId, Husband).
> >>> wife(MarriageId, Wife).
> >>> location(MarriageId, Location).
> >>> or maybe just
> >>> married(MarriageId, Husband, Wife, Location)
> >>> Now whether one "thinks in ER" or "thinks in propositional encodings",
> >>> there has to be good reason to introduce a MarriageId.
> >> Perhaps marriage certificates has a unique number stamped on them, and
> >> you want to record this in your database.
>
> > Yes, that could be a reason. Consider the following intensional
> > definition
>
> > married(MarriageId, Husband, Wife, Location) :-
> > The marriage identified by MarriageId was
> > between Husband and Wife at Location.
>
> > In this case the relation can record marriages that are no longer
> > current, and the only candidate key is { MarriageId }.
>
> And you say this now is an entity, right?

That depends on what you mean by "this". The tuple is always a tuple and represents a fact.

The intensional definition is now explicitly identifying a marriage as an identifiable entity.

> But what if the MarriageId
> represents an entity, just like Husband and Wife presumably do?

Why do you ask that question? That is how it's interpreted.

> Do we
> not then have a situation analogous to the first case, except that the
> relationship is ternary? It is, after all, identified by one of the
> entities it relates, just like the first Marriage.

Yes, exactly. The propositions are still associated with stating relationships between entities, but this time the marriage is taking an explicit part as a named entity.
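A minimal sketch of this second reading, again in Python's sqlite3 with assumed table, column and sample names: once { MarriageId } is the only candidate key, the same husband can appear in several tuples (past and present marriages), and only a repeated MarriageId is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for married(MarriageId, Husband, Wife, Location).
# The only candidate key is {marriage_id}.
conn.execute("""
    CREATE TABLE married (
        marriage_id INTEGER PRIMARY KEY,
        husband     TEXT NOT NULL,
        wife        TEXT NOT NULL,
        location    TEXT NOT NULL
    )
""")

# Two marriages involving the same husband are both recordable,
# because neither husband nor wife is a key any more.
conn.executemany(
    "INSERT INTO married VALUES (?, ?, ?, ?)",
    [(1, "Jack", "Jill", "Perth"), (2, "Jack", "Mary", "Sydney")],
)
jacks_marriages = conn.execute(
    "SELECT COUNT(*) FROM married WHERE husband = 'Jack'"
).fetchone()[0]

# Reusing an existing MarriageId violates the key.
try:
    conn.execute("INSERT INTO married VALUES (1, 'John', 'Anne', 'Hobart')")
    duplicate_id_accepted = True
except sqlite3.IntegrityError:
    duplicate_id_accepted = False
```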

> > Well I can see you are technically picking holes in my
> > "characterisation" of entity, but I wasn't really putting forward a
> > definition. I think the "characterisation" was very informal and more
> > along the lines of being necessary but not sufficient.
>
> Why necessary?

You're trying to draw me into some formal definitions. I'm only considering how examples tend to pan out in practice, towards some understanding of this neutrality question.

> > Actually, I would prefer to steer clear of trying to pin down such
> > informal words as "entity" and "relationship".
>
> But that is one of the main points of my involvement in this discussion!

Not mine!

> > However, I still believe particular entity types are implicit in the
> > intensional definitions of the predicates. How can they not be?
>
> I think the burden of proof is on your side here. I have tried to show
> that predicates can be interpreted as both entities and relationships
> (and probably even as parts of entities). You seem to admit as much,
> given that you use the word "informal" about the whole E/R shebang.

I don't interpret predicates as anything other than predicates. An instantiation of a predicate can be used to record a relationship that exists between entities. A set of attribute values can sometimes identify an entity.

You are looking at this from the perspective of the RM mathematical formalism, where entities don't exist. I'm considering the intensional definitions which tend to be stated in natural language and are directly interpreted as factual statements about things in the real world.

> I note that Jan Hidders claimed that the E/R distinction carries through
> to the logical/relational level, but I cannot see how---except by
> arbitrary claims along the lines of "if a relvar has but one key, and
> this key's components are all foreign keys, then the relvar represents a
> relationship".

> I have also tried to show that the predicates, with keys, represent more
> information---important information, at that---than a classification of
> things into entities and relationships, with entities having a single
> key and relationships being identified by their entities (or some subset
> thereof?). And that the predicate representation is simpler, in a way,
> than E/R, since it does not have the fuzzy distinction between entities
> and relationships---it only has the predicate construct. I therefore
> consider an E/R model insufficient, or too simplified to be of much use.
> This, of course, is a matter of taste.

I agree with that. I'm not a fan of E/R models. I prefer to directly write down the predicates. In my opinion the predicates simultaneously encode logical and conceptual design decisions. The predicates have this entity-less RM formalism on the one hand, and the intensional definitions involving entity types on the other. The latter is messy and informal, but cannot be denied.

An important part of the design is to work out what things need to be identified. If we feel that it's necessary to directly identify a thing in order to state facts about it, then we consider it to be an entity. Treat that characterisation as definitional, intended only to give some rough shape to the rather vague distinction between entity and relationship.

> > Although the intensional definitions are strictly outside the
> > mathematical formalism of the RM, they are nevertheless fundamental to
> > the meaning and purpose of the database.
>
> Fundamental? Why?

Consider the following predicates

    married(H,W) :- H is currently married to W and H is an Australian citizen.

    married(H,W) :- W has at some time been married to H and W is a current employee of company X.

Aren't the intensional definitions fundamentally important?
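As a rough illustration in Python (the sample "world" below, including every name and the set contents, is invented for the purpose), the two definitions share the heading married(H,W) but select quite different tuples from the same state of affairs:

```python
from typing import NamedTuple


class Marriage(NamedTuple):
    husband: str
    wife: str
    current: bool  # False means the marriage has ended


# Hypothetical state of the real world.
marriages = [
    Marriage("Jack", "Jill", True),
    Marriage("John", "Mary", False),
    Marriage("Bob", "Anne", True),
]
australian_citizens = {"Jack", "John"}
employees_of_x = {"Mary", "Anne"}


def married_v1():
    """H is currently married to W and H is an Australian citizen."""
    return {
        (m.husband, m.wife)
        for m in marriages
        if m.current and m.husband in australian_citizens
    }


def married_v2():
    """W has at some time been married to H and W is a current employee of X."""
    return {
        (m.husband, m.wife)
        for m in marriages
        if m.wife in employees_of_x
    }
```

Both functions return a set of (H, W) pairs, so nothing in the heading tells a user which facts a given tuple asserts; only the intensional definition does.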

> I have a hypothesis. It is that you are so used to thinking about a
> database in terms of entities and relationships that it is impossible
> for you to view it any other way.
> (The idea of viewing a database as a
> collection of facts was a revelation for me in that regard.) In fact, I
> have the opposite problem; I am unable to look at an E/R diagram without
> thinking about relations.

Your hypothesis is way off the mark.

> Consider this proposition: "Jon was born in 1974", encoded in a relvar
> of the form Born(Person, Year). I think we'll agree that represents a
> fact about Jon. You would probably assume that Jon is an entity (though
> I'm unsure about what you'd call the relvar/predicate in itself---is it
> an entity (type)?). But I would also say that the proposition is as much
> a fact about the year 1974! Is 1974 an entity? I really don't care.
> Facts are all.

My use of the word "entity" is no more presumptuous than your use of the word "fact". Can you define what a fact is? In the formalism of the propositional calculus, a proposition is just a formula.

Can you show me a factual statement stated in English that doesn't reference something that could informally and reasonably be called an identifiable entity? I'm assuming the fact must have actual information content (ie no tautologies allowed) and could reasonably be recorded in a real database.

> >>> Aren't you implying that a propositional encoding doesn't commit you
> >>> to a decision about whether a marriage is implicitly or explicitly
> >>> identified? I fail to see how that is possible.
>
> > This is the question I would like you to comment on! Do you agree
> > that a predicate can treat a marriage like a relationship, or
> > otherwise like an entity, which seems at odds with this idea of a
> > neutral logical layer?
>
> A propositional encoding does specify how this marriage is identified,
> yes. What I dispute is the distinction between implicit and explicit
> identification; between entities and relationship. A tuple/fact is
> identified by some combination of its attributes, that is all.

When you say you dispute it, what do you mean? I regard the distinction as definitional and, to the extent that we can make sense of it in particular examples, it gives us some consistency in the difference between entities and relationships.

I agree that a tuple/fact is identified by some combination of its attributes. I also agree that a fact must not be thought of as directly representing an entity or a relationship.

> In order to be able to say that a predicate treats something like an
> entity or a relationship, you would need to define precisely what this
> treatment entails---i.e., you would have to 'pin down such informal
> words as "entity" and "relationship"'.

I don't agree. I think the identification distinction applies appropriately in lots of common examples. Just because something breaks down in the fully general case doesn't make it worthless.

I posted because Jim claimed that the marriage example illustrated neutrality in propositional encoding whereas ERM forces bias in the design because of the entity/relationship distinction.

> Even if you were able and willing to do that, it would be backwards to
> claim that the logical layer makes the distinction. You could with as
> much justification define some mapping from predicates to either ducks
> or cats, and then claim that the logical model is not animal-independent.

That's only because the definition of that mapping is at odds with people's conception of ducks and cats.

> >> You define relationships as anything that can be identified /only/ by
> >> its related entities---if there is an alternate key, the thing is an
> >> entity. Correct? What is the rationale behind this rule?
>
> > It is only a rather vague characterisation, and about as precise as
> > the entity/relationship distinction deserves!
>
> > My rationale is as follows: [...]
>
> > Now in natural language we don't normally name instances of verbs or
> > actions, but instead name subject and object. Eg "Jack kicks John".
> > We don't think of the kick action as an entity that needs to be
> > identified independently of Jack and John. Furthermore a kick is less
> > tangible - it has a fleeting existence.
>
> It seems to me that the more significant point is whether or not the
> kick exists independently of Jack and John. Presumably, it doesn't, and
> introducing/discovering an alternate identifier should not change this.

Exactly, so if we assume the propositional encoding by the DBA is well conceived, it won't directly identify an instance of a kick.

> Is "X kicked Y at time T" (i.e. Jack can kick John multiple times) still
> a relationship?

I think it should be.
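A minimal sketch of that predicate in Python's sqlite3, with assumed table and column names and integer timestamps chosen only for illustration: the key is the whole of { X, Y, T }, so repeated kicks are recordable without introducing a separate kick identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for kicked(X, Y, T): "X kicked Y at time T".
# The only candidate key is the full combination {kicker, victim, t}.
conn.execute("""
    CREATE TABLE kicked (
        kicker TEXT    NOT NULL,
        victim TEXT    NOT NULL,
        t      INTEGER NOT NULL,
        PRIMARY KEY (kicker, victim, t)
    )
""")

# Jack can kick John at two different times.
conn.execute("INSERT INTO kicked VALUES ('Jack', 'John', 1)")
conn.execute("INSERT INTO kicked VALUES ('Jack', 'John', 2)")
kick_count = conn.execute("SELECT COUNT(*) FROM kicked").fetchone()[0]

# Stating the same fact twice violates the key.
try:
    conn.execute("INSERT INTO kicked VALUES ('Jack', 'John', 1)")
    repeated_fact_accepted = True
except sqlite3.IntegrityError:
    repeated_fact_accepted = False
```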

> Perhaps the definition is "something that cannot be
> identified independently of other things", instead of "something that is
> identified solely by other things"?

That seems an improvement.

> Although that would make a weak
> entity a kind of relationship...

Hmmm.

> > Note furthermore that verb phrases include other types of
> > relationships that don't correspond to actions, such as "is less
> > than", "has" or "is the father of". Note how silly it would seem to
> > name a particular instance of a "has" relationship - ie between a
> > particular pair of entities.
>
> I have an apartment, and I can put a name to my ownership. But I see
> your point.
>
> >> (And what if the alternative means of identification is also a
> >> combination of entities?)
>
> > Relationships won't tend to do that!
>
> Never mind; with my current understanding of the "definition" of
> relation, it wouldn't make a difference.
>
> > At this rather informal level of discussion, relationships are
> > counterparts to relations and as we know,
>
> They are? What, then, are the counterparts of entities and attributes?

I didn't say that well at all. Very informally, a set of attributes (of the RM formalism) plus their values can identify an entity. Amongst other things, relations are used to record the relationships between the entities.

Received on Fri Dec 07 2007 - 16:37:30 CET
