Re: Any concrete example of BNCF ???

From: Scot A. Becker <scotb_at_inconcept.com>
Date: 2000/05/12
Message-ID: <iyVS4.1548$3p4.605398_at_typhoon.mn.mediaone.net>


Hi David,

First of all, thanks for your feedback and thoughts on my articles.

> Normalization discourses never adequately answered the question of whether
> normalization is a method of analysis or a method of design. In all, the

I don't think it is particularly good at either. If I had to pick one, I'd say design, because I believe that proper analysis requires users. And nothing makes users want to agree with whatever you suggest, just so they can get out of the room quicker, faster than the word "tuples".

That was a bit tongue-in-cheek, perhaps, but we use too much jargon (entity, attribute, relationship, tuple, foreign key, candidate key, et al.) that the users simply do not comprehend (nor do they care). That is why I like ORM, particularly at the analysis stage, because it is just sentences.

As a side diatribe, I tend to draw a very clear distinction in all of my models between analysis and design. The analysis model will be straightforward and capture the requirements as specified. The design model, aside from denormalizations and the like, may be more abstract (modeling/implementing a generalized tree structure, for example, instead of a specific product hierarchy).
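
To make that distinction concrete, here is a minimal sketch (in Python, with made-up names; none of this is from my articles) of the same hierarchy seen both ways. The analysis model states the requirement literally, while the design model implements a generalized tree that the hierarchy hangs off of:

    # Hypothetical sketch: analysis model vs. a more abstract design model.
    from dataclasses import dataclass, field
    from typing import List

    # Analysis model: captures the requirement exactly as stated.
    @dataclass
    class Category:
        name: str
        products: List["Product"] = field(default_factory=list)

    @dataclass
    class Product:
        sku: str
        name: str

    # Design model: a generalized tree; Category and Product become node kinds.
    @dataclass
    class Node:
        kind: str                      # e.g. "category" or "product"
        name: str
        children: List["Node"] = field(default_factory=list)

        def add(self, child: "Node") -> "Node":
            self.children.append(child)
            return child

    # Usage: the same product hierarchy, expressed in the design model.
    root = Node("category", "Hardware")
    root.add(Node("product", "Widget"))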

> examples, just like the one you cited, it appears on the surface that what
> we are doing is designing tables (at least as far as deciding which columns
> go in which tables). The diagrams tend to make it look like table design is
> what normalization is about.

Essentially, yes. It's hard to discuss normalization without either tables of sample data or relational algebra equations. For reasons that hopefully are obvious, I chose the former. <eg>

> But what we are really doing, if you read the narrative carefully, is
> discovering the functional dependencies. The functional dependencies are
> NOT derivable from the sample data, contrary to what is often said.

If the sample data were "sufficient" (and correct), you could argue the other way. However, pragmatically, I agree with you.
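
To illustrate why I say "pragmatically": sample data can refute a functional dependency but can never prove one, so a check like the following (a quick Python sketch of my own, not something from the articles) only tells you what questions to ask:

    # Hypothetical sketch: testing whether an FD lhs -> rhs appears to hold
    # in sample rows. Only a violation is conclusive; agreement proves
    # nothing about the UoD.
    def fd_holds(rows, lhs, rhs):
        seen = {}
        for row in rows:
            key = tuple(row[a] for a in lhs)
            val = tuple(row[a] for a in rhs)
            if key in seen and seen[key] != val:
                return False          # the sample refutes the FD
            seen[key] = val
        return True                   # the sample is merely consistent with it

    rows = [
        {"emp": "E1", "dept": "D1", "mgr": "M1"},
        {"emp": "E2", "dept": "D1", "mgr": "M1"},
    ]
    print(fd_holds(rows, ["dept"], ["mgr"]))   # True, but only for this sample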

> Instead, the sample data, and the apparent lack of full normalization that
> we see, suggest questions we can ask the subject matter experts in order to
> discover the true functional dependencies.

This is usually the case, yes. Further, the use of ORM, with its elementary facts and sample data, completely derives the FDs.
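
Roughly speaking (this is my own sketch of the idea, not ORM tooling or notation), every elementary fact type with an internal uniqueness constraint reads directly as an FD from the constrained roles to the remaining role:

    # Hypothetical sketch: reading FDs off elementary fact types.
    # Each entry lists the roles of a fact type and the roles spanned by
    # its uniqueness constraint; the FD falls straight out.
    fact_types = [
        (("Employee", "Department"), ("Employee",)),      # Employee works in Department
        (("Employee", "Project", "Hours"), ("Employee", "Project")),
    ]

    for roles, unique in fact_types:
        determined = [r for r in roles if r not in unique]
        print(", ".join(unique), "->", ", ".join(determined))
    # Employee -> Department
    # Employee, Project -> Hours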

> Asking those questions, and recording the answers, is analysis, not design.

I agree.

> Now turning to your ORM example. I've got to read this more slowly before
> commenting at length.
> But I want to quote one tiny little piece of what you wrote: "the sample
> data and a little bit of knowledge of the UoD" . That little bit of
> knowledge is the result of subject expertise or, if its in the minds of IT
> people, the result of analysis.

Yup.

> Now let me tie these two together. When Normalization was invented, data
> analysis was usually done by what we now call reverse engineering. That
> is, we started with some data stored on punched cards, magnetic tape,
> sequential files on disk, or maybe indexed files, and discovered the
> "little bit of knowledge of the Uod" by asking questions. Normalization
> helped by telling us which questions to ask.

When I am reverse engineering, this is how I start as well. My first week on any project usually entails getting at the legacy data and schema, making an ER model of it (for lots of reasons) and looking at the data as compared to the ER representation. In doing so, I'll discover possible uniqueness constraints, subtypes, exclusion constraints, subsets, etc. that I will verify with the users as soon as possible. From there, I begin with ORM.
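
For what it is worth, part of that first pass over the legacy data can be automated. Something along these lines (a sketch with made-up column names; anything it flags is only a question for the users, never a conclusion) is how I find candidate uniqueness constraints to verify:

    # Hypothetical sketch: profiling legacy rows for *candidate* uniqueness
    # constraints worth asking the users about.
    def candidate_unique_columns(rows):
        if not rows:
            return []
        candidates = []
        for col in rows[0].keys():
            values = [row[col] for row in rows]
            if len(values) == len(set(values)):
                candidates.append(col)
        return candidates

    legacy = [
        {"cust_no": 1, "region": "MN", "phone": "555-0100"},
        {"cust_no": 2, "region": "MN", "phone": "555-0101"},
    ]
    print(candidate_unique_columns(legacy))   # ['cust_no', 'phone'] -- ask the users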

> When ORM was invented, however, the ideas of models and analysis were much
> more developed. It stands to reason, therefore, that we are going to
> start, as you suggest, with a little bit of knowledge already captured
> before we begin to do ORM. That makes the whole process more design
> oriented and less analysis oriented.

Yes and no. In practice, I am often the data analyst AND the data designer. So, in the back of my head, yes, I am thinking about design. However, I think with a little practice, this can be done without sacrificing either. Particularly if the project is structured such that there is a firm division between analysis and design. My current gig, for example, is using an OO process. During the analysis phase, the legacy data is examined, user workshops and use cases are run, and (what I call) an analysis ORM model is created. Those artifacts are passed off to design, and design-oriented models are then created (including changes to the ORM). However, since I have experience on both sides of the analysis-design divide, it is at this point (when the analysis is done and being handed off to design) that I have some suggestions as to what the design might look like from a persistent data perspective.

> As to whether I "like" ORM better than normalization, I'm going to have to
> read the articles in more depth. Please bear with me.

I'd be happy to discuss this further.

The first article was an argument, from a normalization standpoint, for the use of ORM. The second was for an online "lecture" I presented to a user group. In the second, I had hoped to describe normalization so completely that I would never have to discuss it again (just point people to the article). I knew right away that I had failed. <s> I think I cited and researched 4 or 5 texts that included discussions of normalization. At no point did all of the texts completely agree with each other. Further, I think I managed to add another viewpoint to the whole mess.

In conclusion, I have learned the following things about normalization:

  1. ORM does it anyway, so why bother with it until you want to worry about denormalizing the resulting schema?
  2. Users never want to hear the word "tuples". <s>
  3. Whenever someone tells you that something has been or should be "denormalized for speed", watch out. It is usually the case that by using proper set-oriented techniques (as opposed to the "cursor"-oriented techniques that programmers are so fond of), there is little speed difference. Is this always the case? No. Are there situations when I denormalize? Of course. Do I do a careful analysis of the situation, and verify that the new schema is indeed noticeably faster? You bet. (See the toy sketch just below.)
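
To show what I mean by set-oriented versus cursor-oriented, here is a toy Python sketch (standing in for the SQL, with made-up data): the cursor style re-scans a table for every row, while the set-oriented style builds one lookup and joins in a single pass, so the fully normalized, joined form need not be slow:

    # Hypothetical sketch: "cursor" style (row-at-a-time lookups) vs. a
    # set-oriented style (build the lookup once, join in one pass).
    orders = [{"order_id": i, "cust_id": i % 100} for i in range(10000)]
    customers = [{"cust_id": i, "name": "Cust %d" % i} for i in range(100)]

    # Cursor-oriented: for each order, scan the customer table again.
    def cursor_style():
        out = []
        for o in orders:
            for c in customers:
                if c["cust_id"] == o["cust_id"]:
                    out.append((o["order_id"], c["name"]))
                    break
        return out

    # Set-oriented: one pass to build the lookup, one pass to join.
    def set_style():
        by_id = {c["cust_id"]: c["name"] for c in customers}
        return [(o["order_id"], by_id[o["cust_id"]]) for o in orders]

    assert cursor_style() == set_style()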

Thanks again,
Scot.



Scot A. Becker

Principal Consultant, InConcept, Inc.

     http://www.inconcept.com

Editor, The Journal of Conceptual Modeling

     http://www.inconcept.com/JCM
