Re: [LIU Comp Sci] Need tutoring on Relational Calculus

From: James K. Lowden <jklowden_at_speakeasy.net>
Date: Fri, 2 Jan 2015 18:02:33 -0500
Message-Id: <20150102180233.fa0f3f4c.jklowden_at_speakeasy.net>


On Fri, 2 Jan 2015 04:24:35 +0000 (UTC)
ruben <nowhere_at_nowhere.nor> wrote:

> On Wed, 31 Dec 2014 17:23:12 -0500, James K. Lowden wrote:
>
> > I can think of no way in which "normalization is a failure".
>
> well, think harder about it then

I am sorry to have left the impression I have not thought long and hard about it.

> because just this morning its failure was facing me, eye to eye when
> some expert normalized our database and exhausted the system
> resources and dropped productivity through the floor, killing a good
> part of my new year.

Normalization is a logical concept. It cannot affect physical resources. That said, it is of course possible and usually desirable to implement a normalized physical design. When that is not done, extra care must be taken to ensure the database remains internally consistent.

That "some expert" in fact produced a BCNF design is not an assertion I would accept at face value. I have seen many terrible designs declared "normalized" that were nothing of the kind. Like most things, normalization is easier said than done.

The easiest steps are the removal of repeating groups (1NF) and determination of keys (functional dependency, 2NF). If your design meets those criteria and performs badly under the intended workload, the usual reason is underpowered hardware. While it's always possible to craft a purpose-built system that outperforms a DBMS, it's usually not economical. That's why it's a $30 billion annual market.
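To put those two steps in concrete terms, here is a minimal sketch in SQL. The schema and every name in it are invented for illustration; the point is only the shape of the transformation.

    -- Unnormalized starting point (hypothetical order-entry data):
    --   orders(order_id, customer,
    --          item1_sku, item1_qty, item2_sku, item2_qty, ...)
    -- The itemN_* columns are a repeating group (a 1NF violation),
    -- and a product's name, were it stored per line, would depend
    -- on the sku alone, not on the whole (order_id, sku) key (2NF).

    CREATE TABLE product (
        sku          CHAR(8) PRIMARY KEY,
        product_name VARCHAR(60) NOT NULL   -- depends on sku alone
    );

    CREATE TABLE orders (
        order_id INT PRIMARY KEY,
        customer VARCHAR(60) NOT NULL
    );

    -- 1NF: one row per order line, no repeating group.
    -- 2NF: every non-key attribute (qty) depends on the whole key.
    CREATE TABLE order_line (
        order_id INT     NOT NULL REFERENCES orders(order_id),
        sku      CHAR(8) NOT NULL REFERENCES product(sku),
        qty      INT     NOT NULL,
        PRIMARY KEY (order_id, sku)
    );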

Regarding http://lambda-the-ultimate.org/node/3762, we left the reservation at:

        "Nowadays it is accepted that ORM (Object-Role Modeling), Object Oriented (OO) and other post-OO methodologies, such as Agile Modeling, Agile Development (Erickson, Lyytinen, & Siau, 2005) are most appropriate not only for programming, but also for analyzing and designing information systems, including, of course, database design."

Far from being "accepted" for the purpose, software development methodologies have exactly nothing to say about database design. The author doesn't seem to know that "OO databases" rest on the very same theoretical void that the hierarchical and network models did before them.

If you want to use UML or whatever to help you identify attributes and their keys, fine. Just recognize that what you're doing *is* normalization, albeit informally and likely sufficiently.

The article is a bit polemical and contains many factual errors and oversights. For example, while it's true that atomicity and nested relations have been subject to some to-and-fro over the years, the *theoretical* concerns cannot be divorced from implementation: because SQL has (almost) no recursion, nested relations cannot be defined and manipulated. Because any such relationship can be defined in unnested form -- and manipulated with a first-order predicate logic language like SQL (mostly) -- that's how SQL-based designs are implemented.
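To show what "unnested form" means in practice, here is a hedged sketch; the table and column names are hypothetical. The parent-child nesting becomes two flat relations joined on a key, and first-order queries suffice to manipulate it.

    -- A "nested relation" such as
    --   invoice(invoice_id, inv_date, lines{line_no, amount})
    -- is stored flat, as two relations:

    CREATE TABLE invoice (
        invoice_id INT PRIMARY KEY,
        inv_date   DATE NOT NULL
    );

    CREATE TABLE invoice_line (
        invoice_id INT NOT NULL REFERENCES invoice(invoice_id),
        line_no    INT NOT NULL,
        amount     DECIMAL(10,2) NOT NULL,
        PRIMARY KEY (invoice_id, line_no)
    );

    -- The nesting is recovered with an ordinary join; no recursion
    -- or nested structure is needed.
    SELECT i.invoice_id, i.inv_date, l.line_no, l.amount
    FROM invoice AS i
    JOIN invoice_line AS l ON l.invoice_id = i.invoice_id
    ORDER BY i.invoice_id, l.line_no;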

Two others, briefly: addresses *do* require decomposition if their components are to be used or verified, as in the sketch below. If you don't care whether you have a valid zip code -- or, for that matter, an address that can be delivered to -- and you have no need to collate by zip code or city, then fine, use a blob. The need will make itself apparent in time, and the mistake is not fatal. And the whole professor-department dustup is really just a failure to understand the problem's own terms. Professors have departments & offices, and departments have offices. That does not imply that a professor's office's department determines the professor's department.
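On the address point, a small sketch of the decomposed alternative; the table, column names, and the CHECK constraint are my own invention for illustration.

    -- Blob form: fine until you need the parts.
    --   address VARCHAR(200)

    -- Decomposed form: the components can be verified and collated.
    CREATE TABLE cust_address (
        cust_id INT PRIMARY KEY,
        street  VARCHAR(60) NOT NULL,
        city    VARCHAR(40) NOT NULL,
        state   CHAR(2)     NOT NULL,
        zip     CHAR(5)     NOT NULL
            CHECK (zip ~ '^[0-9]{5}$')   -- PostgreSQL regex syntax
    );

    -- Collating by zip code is now trivial.
    SELECT zip, COUNT(*) AS n
    FROM cust_address
    GROUP BY zip;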

None of that, by the way, has any bearing at all on your two-day experiment with normalization. Yes, you can find articles that seem to support your opinion. One of the problems with database theory is the average quality of the literature you find online. Maybe that's because SQL is so widely used and tables seem so easy to understand. Be that as it may, I caution you not to accept the gloss if you can't analyze the content.

I'm sympathetic to what I take to be the author's point: that normalization is not the mechanical process it was hoped (hyped?) to be, and that the simple practitioner (or, gee, student) looking for guidance may be led astray by pointy-headed logicians on one hand and know-nothings on the other. The former bury you in language like "relation R(A, B, C) where FDs (A, B) -> C and C -> B hold", and the latter fail to acknowledge the reasons for and value of eliminating redundancy in the database.

Yet if we discard the fancy language and mathematical exactitude, the basic problem is embarrassingly easy. Most normalization issues can be tested with two questions:

  1. If I know X, do I know Y? (functional dependency)
  2. Is there more than one Y for a given X? (repeating groups)

You don't need a degree or a class or a book to understand those questions or (if you know the universe of discourse) their answers. That's part of the beauty of the model and why Codd developed it: relations are easy to understand and develop intuition for. It takes practice to find all the attributes and ask those questions for every one and, sure, there's more to it. But those two questions, asked repeatedly of the right person, will carry you a long way in the right direction.
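By way of a worked example -- the table and its names are hypothetical -- here are the two questions applied to one flat student-enrollment table:

    -- Flat starting point:
    --   enrollment(student_id, student_name,
    --              course_id, course_title, grade)
    --
    -- Q1: If I know student_id, do I know student_name?  Yes:
    --     student_name belongs in a table keyed by student_id.
    -- Q1: If I know course_id, do I know course_title?   Yes:
    --     course_title belongs in a table keyed by course_id.
    -- Q2: Is there more than one course per student?     Yes:
    --     (student_id, course_id) keys the enrollment table,
    --     and grade depends on both.

    CREATE TABLE student (
        student_id   INT PRIMARY KEY,
        student_name VARCHAR(60) NOT NULL
    );
    CREATE TABLE course (
        course_id    CHAR(8) PRIMARY KEY,
        course_title VARCHAR(60) NOT NULL
    );
    CREATE TABLE enrollment (
        student_id INT     NOT NULL REFERENCES student(student_id),
        course_id  CHAR(8) NOT NULL REFERENCES course(course_id),
        grade      CHAR(2),
        PRIMARY KEY (student_id, course_id)
    );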

HTH. --jkl