Re: Relation or attribute and why

From: dawn <dawnwolthuis_at_gmail.com>
Date: 23 May 2006 13:44:29 -0700
Message-ID: <1148417069.566461.220850_at_j55g2000cwa.googlegroups.com>


erk wrote:
> dawn wrote:
> > In response to Tony, I agree that other than collecting multiple names,
> > there is nothing to be gained. So we derive the name and store the
> > parts (first, last, middle), whereas with the date, we store the date
> > and derive the parts (month, day, year).
> >
> > What is the practice that we use to decide to produce a logical data model
> > in this way, sometimes dumping nouns that are collectives from the CDM,
> > sometimes dumping the parts, when preparing the LDM?
>
> Every data model is relative to the business domain and requirements.

Definitely. I am starting with a CDM for this purpose, so that all of the business requirements should be represented there.

> In the case of entities and attributes, it depends on what your domain
> predicates are. If you have nothing to say about a name other than that
> it is Joe X. Blow for some entity, then it's an attribute.

In my example, the conceptual data model includes: name, firstName, lastName with relationships such that name has-a firstName and name has-a lastName.

> If there are
> further useful things to be said about the name (e.g. attributes like
> "date acquired"), then it's an entity.

OK, I get that if we have additional things to be said about the name, that would be a reason to make it a relation. In this case, we have a composition where name is composed of lastName and firstName (and we could toss in nameTitle, middleName, and nameSuffix and it would not change the situation). In this case, we derive the name (typically removing it when going from the CDM to the LDM), while retaining the components of the name for the LDM. Why?

> > If the rule of thumb has to do with "the simplest" then is there a
> > logical distinction for why deriving the name is simplest in the one
> > case and deriving the date parts is simplest in the other or is this
> > based on the tools used (having date as a built-in type, but not name,
> > for example)? --dawn
>
> It depends on what you mean by simple. In the case of a date, the parts
> are relative to a Calendar; for us in the U.S., a calendar date is the
> application of a Gregorian function to a point in time (measured as
> nanoseconds since 1900, for example), and of course moderated by
> timezone.
>
> From that point of view, the point in time is the "simplest."

That helps. Also, unlike with the point in time, it is too difficult (impossible) to derive the components from the name, whereas it is possible to derive the name from the components. So, I think the rule of thumb on this has something to do with the difficulty in preparing required derivations. It is simplest to request the parts of the name because it is hard (impossible) to extract a last name from a name. It is easy to derive the month from a date, however it is represented.
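
To make that asymmetry concrete, here is a minimal sketch in Python. The function names full_name and candidate_splits are hypothetical, made up for illustration; only datetime.date comes from the standard library.

```python
from datetime import date


def full_name(first, last, middle=""):
    """Derive a display name from the stored parts (the easy direction)."""
    return " ".join(part for part in (first, middle, last) if part)


def candidate_splits(name):
    """List every way to cut a name into (given, family) on whitespace.

    More than one candidate means the parts cannot be recovered
    from the stored name alone.
    """
    words = name.split()
    return [(" ".join(words[:i]), " ".join(words[i:]))
            for i in range(1, len(words))]


# Deriving a part from a stored date is a single, unambiguous lookup.
d = date(2006, 5, 23)
print(d.month)                                 # 5

# Deriving the name from stored parts is just as mechanical.
print(full_name("Joe", "Blow", "X."))          # Joe X. Blow

# The reverse direction yields three equally plausible answers.
print(candidate_splits("Mary Ann Van Dyke"))
```

The last call returns three candidate splits with nothing in the data to choose between them, which is the sense in which the parts are the safer thing to store.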

Normalization through Nth normal form is related to the mapping from a CDM to an LDM. I was wondering what other rules or rules of thumb relate to that mapping. I think there is something related to derived data (not just functional dependencies) that comes into play, but I haven't yet stated it in any clear way.

> The date
> displayed is more complex derived data. Thus extracting the parts is a
> set of functions, complex but thankfully usually implemented already in
> a library.

Yup, I'm with you.

> A useful example is marriage. For some apps, a boolean attribute on a
> Person relation is enough (meaning: "X is currently married"). For some
> apps, a relation with foreign keys to both spouses might be in order
> (relation predicate: "X is married to Y", commutative). For some apps,
> a date might be attached to that relation ("X married Y on date D").
> For tracking church events, you'll have data about the church,
> minister, etc. If you're tracking polygamists, you'll have something
> different. Etc.
>
> One person's attribute is another person's entity. It depends on the
> business case.
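
As a rough sketch of the first three variants in the marriage example, assuming Python dataclasses stand in for relations (the class and field names are hypothetical, chosen only for illustration):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Variant 1: marriage as a boolean attribute on Person."""
    person_id: int
    is_married: bool          # "X is currently married"


@dataclass(frozen=True)
class Marriage:
    """Variants 2 and 3: marriage as its own relation.

    spouse_a and spouse_b play the role of foreign keys to Person;
    married_on is filled in only when the app needs "X married Y on date D".
    """
    spouse_a: int
    spouse_b: int
    married_on: Optional[date] = None

    def pair(self):
        # The predicate "X is married to Y" is commutative, so compare
        # marriages on the unordered pair of spouses.
        return frozenset((self.spouse_a, self.spouse_b))


m1 = Marriage(1, 2)
m2 = Marriage(2, 1, married_on=date(2006, 5, 23))
print(m1.pair() == m2.pair())   # True: same couple, whichever order is stored
```

Which of these shapes is appropriate still depends on what the application needs to say about a marriage.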

I understand that it depends on the business case, but can we figure that the business requirements have been captured if we are starting with a CDM? I'm looking at the step that starts with the CDM and ends up with a logical data model. What are all of the rules or even best practices that are relevant to that step? Thanks. --dawn

Received on Tue May 23 2006 - 22:44:29 CEST
