Re: Another view on analysis and ER

From: Ruud de Koter <>
Date: Fri, 07 Dec 2007 23:05:22 +0100
Message-ID: <4759c39a$0$243$>

Jan Hidders wrote:

> On 5 dec, 02:10, JOG <> wrote:

>> On Dec 4, 8:33 pm, "David Cressey" <> wrote:
>>> "JOG" <> wrote in message
>>>> Genuine question guys. From an E/R perspective (one of the good
>>>> variants, that allows relationships to have attributes), if I'm faced
>>>> with the following data.
>>>> -- Fred married Wilma in Bedrock.
>>>> -- Barney and Betty married in Paris.
>>>> How do I decide whether I am dealing with a marriage entity or a
>>>> marriage relationship?
>>>> The literature I'm reading here is telling me that the choice is based
>>>> on what 'things' are key to the business. If my business is concerned
>>>> with people (tax collection say), 'marriage' is best modelled as a
>>>> relationship, whereas if the marriages themselves are my focus
>>>> (perhaps I run a church) then its probably better as an entity.
>>>> Have I made the right interpretation here, and is there general
>>>> agreement here? I am much more comfortable seeing that some variants
>>>> allow relationships to themselves have attributes, and that there is
>>>> nothing sacred about choices between using relationships or entities,
>>>> making it a design decision instead.
>>>> Thanks in advance, J.
>>> My answer is that it's subjective. If the subject matter experts all treat
>>> a marriage as a relationship, follow their lead. If the subject matter
>>> experts all treat it as an entity, follow their lead, but insist that
>>> there's a key attroubt that identifies it. (Do the SME's have any such
>>> thing as a "marriage ID" attribute?
>>> Once you switch over from analysis to design, here's what happens: the
>>> attributes that you discovered during analyisis and attached to entities or
>>> relationships will be carried over from your ER model your design model,
>>> which I presume will be a relational model.
>>> The entities and relationships themselves, as such, will all disappear!
>>> Another thing that will carry over is the keys, used to identify instances
>>> of entities in the subject matter (or UoD if you prefer). Sometimes the
>>> keys used by the SME (subject matter experts) are a little too informal and
>>> require "common sense" to disambiguate. That's a special case, and doesn't
>>> affect this discussion.
>>> I'm used to expressing a relational model in terms of tables (relational
>>> tables), but I presume that what follows could be transliterated into terms
>>> of relvars without any difficulty.
>>> Each entity will have a table of its own, with the attributes that pertain
>>> to that entity. The primary key of the table will be the key attribute of
>>> the entity.
>>> Each relationship will have a table of its own, except for a few that can
>>> be piggy backed onto entity tables. The primary key of a relational table
>>> will consists of two or more foreign keys, compound. These tables will
>>> automatically be normalized up to 3NF unless your analysis put an attribute
>>> on the "wrong entity". I'm not sure about normalization beyond 3NF.
>>> The above process is so automatic that you can have software that does it
>>> for you. Indeed, that's what several tools do. They express an ER model
>>> in terms of metadata, and likewise express the relational model in terms of
>>> metadata. And they have a programmed process that will create a relational
>>> model from an ER model. The only tool I know calls the relaional model a
>>> "physical model" and makes it specific to some product like Oracle or DB2,
>>> etc. But that's a trivial detail. The software also turns models into
>>> diagrams and/or create scripts for you.
>>> So where did the entities and relationships go? They disappeared into the
>>> ether! However,
>>> when the application people get around to designing screens and reports,
>>> they can tie each feature back to an original entity or relationship. That
>>> can make the resulting system coherent for the users.
>> Ok so one might summarize the following steps:
>> 1) initial analysis of business processes and important concepts.
>> 2) Formulation of an initial conceptual model (that is necessarily
>> slanted to a certain viewpoint of the UoD).
> You might want to add here that sometimes in this step you have to
> integrate several different UoDs that do not necessarily agree on the
> overlapping parts. I think that is mainly what Ruud is talking about,
> plus the fact that at this stage you might want to anticipate a bit on
> other UoDs that might have to be integrated in the future. You could
> call that "making it more objective" but I think "making it less
> subjective" is more precise. ;-)

>> 3) Translation into a nicely normalized logical model, that's query
>> neutral.
> Normalization is really only a very minor issue here IMO. I've not had
> that much personal practical experience in my life but I did work
> briefly for two big Dutch companies that both had an organization in
> charge of maintaining the global company data model that integrated
> all data models from the applications and databases they had. I worked
> with the guys that did this, and I remember being completely blown
> away buy how much variation there was in concepts such as employee and
> order, even within a single company. I still admire these guys.

>> 4) On demand, extract data back out from the neutral logical model,
>> shaping it either the original conceptual view, or other conceptual
>> views as needs arise from new applications.
>> Great. This all makes perfect sense, and is very clear to boot. A
>> simple process for creating a thorough yet flexible system. It seems
>> obvious even, right?
>> So why on earth would /anyone/ want to drop step 3? I'm at a loss as
>> to why certain cdt'ers (who are clearly intelligent people) seem to be
>> advocating this. An absolute loss I tell you.
> I'm not sure that is what Ruud is saying. Anyone else?

This position is so far from what I think, that I hadn't even considered I would be one of these people before I read this. If anybody read my post as to mean that normalization is a useless operation, they misread it. As for the query neutrality (as I understand it now), no criticism either.

Ruud de Koter.

> -- Jan Hidders Received on Fri Dec 07 2007 - 23:05:22 CET

Original text of this message