Re: Another view on analysis and ER

From: dawn <>
Date: Sat, 8 Dec 2007 10:27:57 -0800 (PST)
Message-ID: <>

On Dec 4, 7:10 pm, JOG <> wrote:
> On Dec 4, 8:33 pm, "David Cressey" <> wrote:
> > "JOG" <> wrote in message
> >
> > > Genuine question guys. From an E/R perspective (one of the good
> > > variants, that allows relationships to have attributes), if I'm faced
> > > with the following data.
> > > -- Fred married Wilma in Bedrock.
> > > -- Barney and Betty married in Paris.
> > > How do I decide whether I am dealing with a marriage entity or a
> > > marriage relationship?
> > > The literature I'm reading here is telling me that the choice is based
> > > on what 'things' are key to the business. If my business is concerned
> > > with people (tax collection say), 'marriage' is best modelled as a
> > > relationship, whereas if the marriages themselves are my focus
> > > (perhaps I run a church) then its probably better as an entity.
> > > Have I made the right interpretation here, and is there general
> > > agreement here? I am much more comfortable seeing that some variants
> > > allow relationships to themselves have attributes, and that there is
> > > nothing sacred about choices between using relationships or entities,
> > > making it a design decision instead.
> > > Thanks in advance, J.
> > My answer is that it's subjective. If the subject matter experts all treat
> > a marriage as a relationship, follow their lead. If the subject matter
> > experts all treat it as an entity, follow their lead, but insist that
> > there's a key attroubt that identifies it. (Do the SME's have any such
> > thing as a "marriage ID" attribute?
> > Once you switch over from analysis to design, here's what happens: the
> > attributes that you discovered during analyisis and attached to entities or
> > relationships will be carried over from your ER model your design model,
> > which I presume will be a relational model.
> > The entities and relationships themselves, as such, will all disappear!
> > Another thing that will carry over is the keys, used to identify instances
> > of entities in the subject matter (or UoD if you prefer). Sometimes the
> > keys used by the SME (subject matter experts) are a little too informal and
> > require "common sense" to disambiguate. That's a special case, and doesn't
> > affect this discussion.
> > I'm used to expressing a relational model in terms of tables (relational
> > tables), but I presume that what follows could be transliterated into terms
> > of relvars without any difficulty.
> > Each entity will have a table of its own, with the attributes that pertain
> > to that entity. The primary key of the table will be the key attribute of
> > the entity.
> > Each relationship will have a table of its own, except for a few that can
> > be piggy backed onto entity tables. The primary key of a relational table
> > will consists of two or more foreign keys, compound. These tables will
> > automatically be normalized up to 3NF unless your analysis put an attribute
> > on the "wrong entity". I'm not sure about normalization beyond 3NF.
> > The above process is so automatic that you can have software that does it
> > for you. Indeed, that's what several tools do. They express an ER model
> > in terms of metadata, and likewise express the relational model in terms of
> > metadata. And they have a programmed process that will create a relational
> > model from an ER model. The only tool I know calls the relaional model a
> > "physical model" and makes it specific to some product like Oracle or DB2,
> > etc. But that's a trivial detail. The software also turns models into
> > diagrams and/or create scripts for you.
> > So where did the entities and relationships go? They disappeared into the
> > ether! However,
> > when the application people get around to designing screens and reports,
> > they can tie each feature back to an original entity or relationship. That
> > can make the resulting system coherent for the users.
> Ok so one might summarize the following steps:
> 1) initial analysis of business processes and important concepts.
> 2) Formulation of an initial conceptual model (that is necessarily
> slanted to a certain viewpoint of the UoD).
> 3) Translation into a nicely normalized logical model, that's query
> neutral.
> 4) On demand, extract data back out from the neutral logical model,
> shaping it either the original conceptual view, or other conceptual
> views as needs arise from new applications.
> Great. This all makes perfect sense, and is very clear to boot. A
> simple process for creating a thorough yet flexible system. It seems
> obvious even, right?
> So why on earth would /anyone/ want to drop step 3? I'm at a loss as
> to why certain cdt'ers (who are clearly intelligent people) seem to be
> advocating this. An absolute loss I tell you.

My answers tend toward pragmatism and evaluating trade-offs. What do you give up by taking propositions that originate from a certain perspective and turn those into query-neutral propositions, for example? This is not just a question of what one might give up in theory, but about what, practically speaking, typically gets lost. Order is one aspect of the propositions that is often dropped in an effort to simplify the logical data model, for example. Also, what complexities get added when deciding that query neutrality is of utmost importance? What is the cost of introducing such complexities.

There are at least three significant changes I would like to see compared to what I have seen with systems designed using #3 above: 1. NF2. nested structures/child tables/relation-valued attributes/ Non-1NF as 1NF was traditionally defined. While keeping most FD best practices, when there are weak entities wrt the proposition and the UoD, the benefits of strong coupling of these parts of the proposition to the rest of the proposition (turning them into only child structures) can have greater benefits (in productivity and integrity over time/maintainability) than being able to use the nested structure as the portal for a query (I know I said that in my way and not a standard way, so if you have questions, you know where to find me)

2. 2VL. I'm guessing you consider this irrelevant in this context, because it is not required in theory, but speaking practically, whereever I have seen these "nicely normalized" structures (or at least typically), I have also seen a 3VL implementation. Tools that use 2VL typically also make it easier to query and use propositions that include nested structures, so those avoiding 3VL are likely to also ignore that which was formerly known as 1NF with benefits and without a big downside in so doing.

3. Ordering. Preservation of the ordering of the propositions, including ordering within child structures can improve understanding and communication with the end-user. "On your pizza crust put sauce, cheese, and pepperoni" might or might not be idential in meaning to the proposition "On your pizza crust put sauce, pepperoni, and cheese", choosing that example for David, so if we preserve the order, we might be preserving relevant information even if we are unaware of that, find it irrelevant, or think it expensive, in our analysis.

Each of these relates to potential improvements in the quality of the data and the quality of decisions made based on the data, as well as, potential productivity improvements for developers, and potentially better quality of the overall software throughout the lifecycle. It isn't a slam dunk either way, but one should at least recognize that there are trade-offs.

That's just a start, but I hope it gives you some sense that it might not be completely, utterly, stupid to build data structures without what has traditionally been termed a "nicely normalized logical model"

Oh, and best wishes in your upcoming marriage. We are at 31 years, so don't believe the nay-sayers. smiles. --dawn

> > I apologize for this repsonse. It's really a lot more than you asked for.
> Yes, how dare you respond with such clarity and thoroughness. Shame on
> you.
> > But it's actually easier to use than it is to describe. It may also
> > oversimplify. The relational model that software constructs may not be the
> > best relational model that could be designed to deal with the original
> > problem.- Hide quoted text -
> - Show quoted text -- Hide quoted text -
> - Show quoted text -
Received on Sat Dec 08 2007 - 19:27:57 CET

Original text of this message