Re: Lucid statement of the MV vs RM position?

From: dawn <dawnwolthuis_at_gmail.com>
Date: 3 May 2006 01:17:59 -0700
Message-ID: <1146644279.405764.107990_at_u72g2000cwu.googlegroups.com>


JOG wrote:
> dawn wrote:
> >[chop]
> > Not entities. Entities are things. It is a person, place, thing or
> > event. Person, place, and event (e.g. transaction) are easy enough to
> > identify. For things, if you cannot hit it with a stick or print it
> > out on paper, then think twice about whether it is an entity or a
> > property of an entity.
>
> No, this is way off base - with ERM Chen wrote extensively about
> associative entities for example, which are _exactly_ the sort of thing
> that cannot be hit with a stick.

Well aware. "Entity" is an overloaded term, of course. If using an ERD approach, the E's you start with, not those used to handle many-to-manys, nor the R's that turn into E's are the basics I'm referring to as entities. I also said to think twice, not to exclude the possibility. Starting with the basics, look at your problem space and identify people, places, events, and "things."

> But I don't want to get sidetracked by
> this so hey, if the colour green as an abstract entity is too out
> there, fair enough. Just substitute in something more physical into
> those original statements - fruits for the colours for example, or
> perhaps academic papers:

If you are going to substitute something I can throw for something I cannot, then we have a new entity, not just a property of an entity. While color is a property of our characters, a paper is not. I would model it as a separate entity in the conceptual data model (if we are using UML class diagrams and not ORM, for example, then I would indicate color as an attribute in the Character class and Paper as a separate class).

> paper( paper1, {Barney, Oscar} ) &&
> paper( paper2, {Barney}) &&
> paper( paper3, {Oscar})
>
> or
>
> person( Barney, {paper1, paper2} ) &&
> person( Oscar, {paper1, paper3} )
>
> Everything here is clearly an entity. Both are strong.

Yes.

> Both
> representations come from those logically identical propositions - so
> which is the right one?

Both and neither. These are fine as derivations, but I would model these propositions the same way you would for implementation, most likely. It is "best practice" to model strong entities as separate classes/files/tables/relations/functions/types... However, if the problem domain is such that our organization could care less about papers (today) and sees (recognizing we re not all-seeing) nothing on the horizon that would suggest that it is a big risk factor -- nothing suggesting our organization will care about any properties of these papers, we only want to record that Barney has paper1 and paper2 while Oscar has paper1 and paper3, then in a move from the conceptual to the logical, we might decide to treat this strong entity as if it were merely a property. Properties that really are properties are cheaper than entities. Entities that can be implemented as properties might be cheaper than entities implemented as entities if you do not have to change them down the line. What happens if we change our mind in the future? We "refactor" our solution. We change it to meet our new requirements. It rarely makes good business sense to design anything so that it mitigates all possible changes -- risk assessment is key.

> Instinct might be to go for the first - papers have authors after all.
> But my application's first job is to produce academic resumes - hence
> the second is the more appropriate one for that 'interface'. Whichever
> is picked, the choice enforces some form of arbitrary hierarchy.

You are correct now that you have changed the question to two strong entities.

> Unless
> I encode both - is that preferable?

It might be preferable that both be views of the data (which SQL-92 would not permit, but perhaps people are now coding NF2 views of their 1NF data?), but I would model both as derivations, not base tables.

> >
> > > Which is correct? Neither
> > > and both,
> >
> > The first, not the second.
> >
> > > because they are artifices. Who knows which will be
> > > appropriate to the user?
> >
> > The systems analyst better be able to give it a good shot. If they
> > cannot tell whether green-ness is an entity or a property of an entity
> > wrt to the organization's requirements, I would be very surprised.
> >
> > > An XML or MV style prejudges that decision,
> >
> > No, it makes a distinction that those modeling with the RM do not make.
> > I am an entity and my shoe size is a property of me. It can change.
> > It could even go away if my feet were amputated (sheesh, apologies for
> > the example), but only as long as I am an entity an organization cares
> > about would that property be relevant. If I am not of interest, then,
> > by definition, my feet are not of interest either. In other words, if
> > you are interested in shoe size, you are interested in me because my
> > shoe size is a property of me.
>
> Yes, a weak entity. This is indeed a good example of something that
> does fit hierarchically,

happy, happy, joy, joy!

> but it *doesn't* negate the fact that there
> are many examples that don't at all.

Of course, of course. You don't want to take your entire problem domain and start willy-nilly nesting relations. It is frustrating to look at some XML doc designs. Give people, many of whom have been taught normalization, an inch and they seem to botch it all up. There are best practices for modeling data when you can use multivalues. You need to be able to see the HUGE difference between Oscar's color and Oscar's paper. If I no longer care about Oscar, I no longer care about his color either.

E-mail addresses are in the fuzzy area where you need to make a call. It typically makes sense that if a person is no longer of any interest to your system, then their e-mail address is not either. So, the e-mail address can be seen as a property of the person. Of course it is the case that on person could ditch an e-mail address and another adopt it, so they do each have a life apart from each other, but there is a low risk for most orgs that their requirements will change enough to separate e-mail address from person.

> Using colours as universals was a
> poor example on my part as I feel its given rise to this red-herring.

No, no, no -- it was a GREAT example. You can see I jumped for joy above when you could see it as I did -- color as a multi-valued property of a strong entity. Then changing your example and seeing how I changed with it, might have further helped.

> > [I realize that if a company is doing a shoe size survey and has no
> > interest in collecting other data about people, then the shoe size
> > might be a property of a survey or some other entity.]
> >
> > > and with large shared data, who's future use is unpredictable, it seems
> > > essential to me to avoid that interface-prison.
> >
> > I disagree, as you have likely guessed. Yes, there are new
> > requirements regularly that prompt one change or other, but I would
> > dare say that there are more changes that would push a property like
> > color to go from single to multi-valued, prompting a schema redesign
> > (new table) than there are if you make color a multivalued attribute to
> > start with.
>
> But, two wrongs don't make a right. Cardinality change is something
> i've mentioned before as an interesting area in schema change as a
> whole - but it does not mean the theoretical solution should be to
> imprison the user into a certain view of the data. all best, J.

I agree. Rule of thumb -- model strong entities as separate, uh, classes? relations? (pick your favorite collective term, and yes I'm aware that some think it is a mistake to compare these two terms in any way). Model properties, whether single or multi-valued, as attributes of these. Make a call, based on risk assessment, whether you should turn an entity into a property at the risk of needing to make a change later. Properties that can stay properties are cheaper than entities. Now if a property was single-valued and becomes multi-valued or vice versa you flip a bit on it.

Whacha think? --dawn Received on Wed May 03 2006 - 10:17:59 CEST

Original text of this message