Re: Any concrete example of BNCF ???

From: David Cressey <david_at_dcressey.com>
Date: 2000/05/12
Message-ID: <4JXS4.583$s4.54951_at_petpeeve.ziplink.net>


I've interspersed my comments as you did....

Scot A. Becker wrote in message ...
>As a side diatribe, I tend to draw a very clear distinction in all of my
>models between analysis and design. The analysis model will be straight
>forward and capture the requirments as specified. The design model, aside
>from denormalizations and the like, may be more abstract
>(modleing/implementing a generalized tree structure, for example, instead
 of
>a specific product hierarchy)

The solution is, from a certain perspective, more abstract than the problem statement. From another perspective, it's less abstract. In the world of architecture, there used to be these things called the "architect's sketch" of the final building, It was often done with watercolors, or some other technique that would blur the details while creating an impression of the whole.

General contractors tended to view these sketches with contempt, and to view the architect's real product as the blueprint. For an analogous reason, programmer analyst types tend to view an analysis model with contempt, but to view a design model as having some real substance to it.

>Essentially, yes. It's hard to discuss normalization without tables with
>data or relational algebra equations. For reasons that hopefully are
>obvious, I chose the former. <eg>

Oh, terminology. Yes, I agree that tables are better. Even tables are sometimes less intuitive than forms and reports. Take invoices, for example. We could show a user a pair of tables, "invoice header" and "invoice item" with a one to many relationship between them. However, we can often gain just as much
out of a discussion with the user by showing a traditional paper invoice, with the header at the top, and
the items below it, arranged in the form of a "repeating group". Now I don't give a R.A. whether this violates 1NF or not, for these purposes. (may Codd have mercy on my soul!)

>If the sample data was "sufficent" (and correct), you could argue the other
>way. However, pragmatically, I agree with you.

A terminological quibble, here: if the sample were sufficient, it wouldn't be a "sample" in the ordinary sense of the word.

>
>This is usually the case, yes. Further, the use of ORM, with it's
 elementary
>facts and sample data, completely derives the FDs.
>

Now, when I learned E-R it was a very, very specific point that E-R did NOT forbid a repating group in an attribute. Thus an "invoice item" didn't have to be a distinct entity. It could be, simply a component of the "invoice" entity. E-R analysis paid very careful attention to assigning the attributes to the correct entities,
using analytic methods that were eeerily parallel to "normalization" but were very, very specifically NOT normalization.

If you did the E-R analysis correctly, and then converted the E-R model to a relational model using a fairly mechanical approach, the relational model ended up mormalized. I only knew about 1NF, 2NF, and 3NF at the time, so I can't tell you whether full normalization was guaranteed.

>When I am reverse engineering, this is how I start as well. My first week
 on
>any project usually entails getting at the legacy data and schema, making
 an
>ER model of it (for lots of reasons) and looking at the data as compared to
>the ER representation. In doing so, I'll discover possible uniqueness
>constraints, subtypes, exclusion constraints, subsets, etc. that I will
>verify with the users as soon as possible. From there, I begin with ORM.
>
A product called Data Architect (part of Power Designer from Sybase) lets you reverse engineer
a database to create a physical model, then reverse engineer that to make a conceptual model,
DA's conceptual models are essentially E-R models with a few extra things that may have been added to the E-R model since I studied it, or may just be "good things" the authors of DA decided to put in.

>Yes and no. In practice, I am often the data analyist AND the data
 designer.
>So, in the back of my head, yes, I am thinking about design. However, I
>think with a little practice, this can be done without sacrificing either.

Yes, but as you said earlier, keeping the line between the two is very important, and can get difficult. In most cases of "thinking inside the box", I believe it's the subconscious proposed solution that creates the box, and not the problem statement itself.

>1) ORM does it anyway, so why bother with it until you want to worry about
>denormalizing the resulting schema?

I'm interested. As I said, I have to read the articles. I won't be surprised if it's an extension, using ORM concepts and terms, of what I've been trying to do with E-R analysis.

>2) Users never want to hear the word "tuples". <s>

Agreed.

>3) Whenever someone tells you that something has been or should be
>"denormalized for speed", watch out. It is usually the case that by using
>proper set-oriented techniques (as opposed to "cursor" orientated
 techniques
>that programmers are so fond of), there is little speed difference. Is this
>always the case? No. Are there situations when I denormalize? Of course. Do
>I do a careful analysis of the situation, and verify that the new schema is
>indeed noticeably faster? You bet.
>

You bet. First off, even in cases where the denormalization results in a speed boost, it often isn't the denormalization as such that gave the speed boost so much as the "colocation" or "clustering". By this I mean that if two items of data are in the same database block, one I/O will read them both.

My favorite database product, Oracle Rdb (a.k.a. DEC Rdb) allows mixed format database pages, and clustering parent and child records. So all the data pertaining ot an invoice could be read with one I/O (unless the number of items were such that the capacity of a databas epage were exhausted). But this can be done without denormalizing the schema.

One of my specialties is tuning Rdb databases. I have a list of ten common bottlenecks I look for first. Excessive normalization doesn't even make the top ten.

Thanks for a good dialogue.

David Cressey, consultant
david_at_dcressey.com Received on Fri May 12 2000 - 00:00:00 CEST

Original text of this message