Re: Another view on analysis and ER

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Wed, 05 Dec 2007 12:46:26 -0400
Message-ID: <4756d5e5$0$5261$9a566e8b_at_news.aliant.net>


David Cressey wrote:

> "Jon Heggland" <jon.heggland_at_ntnu.no> wrote in message
> news:fj6737$o2p$1_at_orkan.itea.ntnu.no...
>

>>Quoth David Cressey:
>>
>>>Here's a website I stumbled across:
>>>
>>>http://www.islandnet.com/~tmc/html/articles/datamodl.htm
>>>
>>>Note that, at the start of the introduction,  the author says that analysis
>>>is the most important part of any project.  That's rather different from the
>>>impression I've gotten in response to my topic on "what is analysis".
>>
>>Well, that depends on what analysis is. It seems this guy thinks it's
>>the same as data modeling, which in turn is the same as developing a
>>graphical representation of the client's needs and processes. Is it?
>>Furthermore, you could interpret Marshall's and my response as "we don't
>>do analysis, we just start coding", but I don't think that's what we
>>mean. Myself, I'm skeptical of presenting analysis as a very separate,
>>distinct kind of activity, defined by the kinds of artifacts it
>>produces, i.e. "pretty pictures" to use Bob's term.

>
> Not all modeling is analysis. Some of it is design. In particular, I'm
> going to claim that you discover attributes, but you design relvars. I've
> already had the second claim confirmed by Bob and others.
>
> Bob's distaste for pretty pictures should not obscure the main theme. A
> model isn't a "pretty picture" as such. Rather, a "pretty picture" is the
> projection of a model on a flat screen. Other projections have been
> proposed. A table written on a whiteboard, with some imaginary sample data
> written into it, proposed by another participant, is another projection of
> a model on a flat screen.
>
> Whether a pretty picture was worth the cost of making it depends on what
> happens next.

Pretty pictures have subtle pitfalls and limiting characteristics. Learning to think without them and to communicate without them improves both thought and communication.

>>But I digress. This was what I meant to respond to:
>>
>>
>>>By the way,  I don't like the author's dialect of ER.  In particular, his
>>>topic on "resolving many-to-many relationships"  is,  I believe, extraneous
>>>to ER.  His reification of a "watering"  reminds me of the term "association
>>>entity"  that someone wrote in response to me a few days ago.
>>>
>>>In analysis,  there is nothing to resolve in a many-to-many relationship.
>>>You only have to resolve it when you are designing relational tables  or
>>>relvars.
>>
>>Both yes and no. Reifying relationships can be helpful, but /not/
>>because "Many-to-many relationships cannot be directly converted into
>>database tables and relationships". The point is rather to make it
>>easier to discover their properties---their attributes, mainly, but
>>potentially also other things, e.g. constraints. When I discover a
>>many-to-many-relationship, I usually make it a box, with a name, and ask
>>if there is anything else we want to be able to say about this thing.
>>Often, there is. If there isn't, I can demote it to a line again.
>>
>>This mainly applies to many-to-many-relationships, because business
>>rules / attributes / constraints regarding a one-to-many-relationship
>>are often better relegated to the entity on the many-side (though not
>>always, of course). It has little to do with the implementation (or
>>design?) of many-to-many-relationships in relational databases.
>>
>>Some might argue that reifying relationships is unnecessary, since
>>relationships in "good" E/R dialects can have attributes. What, then, is
>>the difference between an entity and a relationship?

>
> If you look at the metadata in the implemented database, none.

Where do you have to look to find any difference? (Other than one is drawn as a box and the other as a line or diamond.)
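To make the metadata point concrete, here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module. The plant and gardener tables and all column names are hypothetical, invented only for illustration; only "watering" is mentioned above. The catalog lists the reified relationship exactly as it lists the two entities:

```python
import sqlite3

def catalog_kinds(conn):
    """Return {name: type} for every user-defined object in the SQLite catalog."""
    rows = conn.execute(
        "SELECT name, type FROM sqlite_master "
        "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return dict(rows)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical "entities".
    CREATE TABLE plant (plant_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE gardener (gardener_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- A hypothetical many-to-many "relationship" between them.
    CREATE TABLE watering (
        plant_id    INTEGER NOT NULL REFERENCES plant (plant_id),
        gardener_id INTEGER NOT NULL REFERENCES gardener (gardener_id),
        watered_on  TEXT    NOT NULL,
        PRIMARY KEY (plant_id, gardener_id, watered_on)
    );
""")

# Every entry is recorded as a plain table; nothing marks one as a relationship.
kinds = catalog_kinds(conn)
```

Nothing in the catalog distinguishes watering from plant or gardener other than its name and its declared constraints.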

>>The best answer I
>>can think of is that an entity is identified by itself, while a
>>relationship is identified by its entities. But what if something has
>>more than one way of identification (i.e. multiple keys)? This is where
>>classic E/R breaks down for me. A "relationship" may be identified by
>>its entities, but also by (say) just one of its entities in combination
>>with a subset of its attributes. And/or perhaps a subset of its
>>attributes, disregarding any entities. Is it then a relationship, a weak
>>entity, or an entity?
>>
>>This is turning into a rant against the classic(?) E/R notation, but
>>here goes anyway. I think it's a bad idea that more than one kind of
>>thing can have attributes. I think it's a bad idea that there are two
>>(or more) different ways of indicating how something is identified.
>>Relationship diamonds are required for non-binary relationships, but are
>>just clutter for binary ones---bad idea.
>>
>>Fortunately, there is (at least) one E/R dialect that resolves all these
>>issues, and in so doing, even makes the distinction between entities and
>>relationships far less important.
>>
>>Apropos this distinction: As to whether marriage is a relationship or an
>>entity, you said that one should listen to the subject matter experts. I
>>have never had such an expert say to me, "No, that's not a relationship,
>>that's an entity!" or vice versa. Have you?

>
> Not in so many words. But they have said things like "a reservation for a
> certain car, on a certain date, by a certain customer has a way of
> identifying it. We call it a 'reservation number'." What you have now
> learned is that the UofD people think of a reservation as a thing in and of
> itself and not just an association between a customer and a car on some
> future date.
>
> This tells you something you need to know about the problem statement: The
> database has to store reservation numbers.
>
> It also tells you something you need to know about database design: you
> have two candidate keys for identifying a relationship, and eventually, a
> relvar. One is reservation number. The other is customer ID, car type,
> and date. If you declare primary keys in your database, you need to pick
> one of these.

Protecting the integrity of data is a primary goal of data management. If one wants to manage one's data, one must declare all candidate keys. Whether one needs to pick one to designate as primary is secondary to this.
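As a minimal sketch of declaring all candidate keys, assuming SQLite through Python's standard sqlite3 module: the table and column names below are hypothetical, chosen to match the reservation example, and neither key is designated primary. Both keys are declared, so the DBMS rejects a violation of either one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reservation (
        reservation_number INTEGER NOT NULL,
        customer_id        INTEGER NOT NULL,
        car_type           TEXT    NOT NULL,
        reservation_date   TEXT    NOT NULL,
        UNIQUE (reservation_number),                      -- candidate key 1
        UNIQUE (customer_id, car_type, reservation_date)  -- candidate key 2
    )
""")

def try_insert(row):
    """Insert one row; return True if accepted, False if a declared key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO reservation VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert((1001, 7, "compact", "2007-12-24"))        # accepted
dup_number = try_insert((1001, 8, "sedan", "2007-12-25"))     # repeats key 1
dup_booking = try_insert((1002, 7, "compact", "2007-12-24"))  # repeats key 2
distinct = try_insert((1003, 7, "compact", "2007-12-26"))     # violates neither key
row_count = conn.execute("SELECT COUNT(*) FROM reservation").fetchone()[0]
```

Had only one of the two keys been declared, one of the two bad rows would have been accepted silently.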

> This could have consequences for performance, ease of
> programming, "natural joins" etc. etc.

Performance is independent of choices at the logical level of discourse where one identifies candidate keys or designates primary keys. Performance is only affected at the physical level of discourse.
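A rough sketch of the separation, assuming SQLite through Python's standard sqlite3 module, with hypothetical table, column, and index names. The logical declaration and the answer to the query stay fixed; only a physical choice, adding an index, changes the access path. Note that SQLite happens to build an index for each UNIQUE constraint, which is a choice of that product and not a consequence of the logical declaration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reservation (
        reservation_number INTEGER NOT NULL,
        customer_id        INTEGER NOT NULL,
        car_type           TEXT    NOT NULL,
        reservation_date   TEXT    NOT NULL,
        UNIQUE (reservation_number),
        UNIQUE (customer_id, car_type, reservation_date)
    );
    INSERT INTO reservation VALUES
        (1001, 7, 'compact', '2007-12-24'),
        (1002, 8, 'sedan',   '2007-12-24'),
        (1003, 7, 'compact', '2007-12-26');
""")

QUERY = "SELECT reservation_number FROM reservation WHERE reservation_date = ?"

def answer(date):
    """Logical result of the query: reservation numbers for one date, sorted."""
    return sorted(row[0] for row in conn.execute(QUERY, (date,)))

def plan(date):
    """Physical access path SQLite reports for the same query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (date,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

answer_before, plan_before = answer("2007-12-24"), plan("2007-12-24")

# Physical-level change only: no column, key, or constraint is touched.
conn.execute("CREATE INDEX idx_reservation_date ON reservation (reservation_date)")

answer_after, plan_after = answer("2007-12-24"), plan("2007-12-24")
```

The two answers are identical while the two reported plans differ, which is the sense in which the key declarations and the access paths are separate decisions.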

> You also need to anticipate that
> the application programmers are going to want to be able to find a
> reservation, or the absence of a reservation (CWA), based on the
> reservation number, based on a slip of paper the customer hands the clerk,
> or based on the customer, the car type, and the date.
>
> In some cases, the business rules will make the design decision for you. In
> other cases, the business rules are silent on this score.

I disagree. First, what the application programmers want is irrelevant. They are paid to meet the needs of the organization, not their own whim. Second, business rules are essentially synonymous with what the organization needs.

Received on Wed Dec 05 2007 - 17:46:26 CET
