Re: DB clasical structure violation

From: Doug Swetland <swetland_at_minn.net>
Date: Fri, 21 Jun 2002 19:46:43 GMT
Message-ID: <3D1382B8.7002D117_at_minn.net>


Mariano Abdala wrote:
>
> Hi! My name is Mariano Abdala. I'm a 3° year student of the Palermo
> University on Systems Analisys. I'm conducting a research on how the
> clasical structure of the databases, is not allways the better option in
> terms of performance.
> The research is going great, but I need some real cases of violation of
> the Normal Forms (and BCNF) in order to improve some [any] aspect of the
> Data Base, to support my theory. So I ask you please to help me on this. I
> would need some real databases STRUCTURES (no data) with this kind of
> violations and the reasons to implementing this.
> I would proudly name any helper in my research and any help you can give
> me, and of course, if you like me to, I would send its complete version once
> finished.
>
> Mostly, mostly thankful, Mariano Abdala.
>
> ( DBProject_at_datafull.com )

Mariano,

I'll bite on this one, because it might lead to some interesting discussion and maybe I'll learn a thing or two. I'll say upfront that I am a big fan of normalized data structures. Most (if not all) online transaction processing systems should be properly normalized. You can think about denormalizing when the primary purpose is online analysis.

I once did a metadata repository for a manufacturing company. Part of the model was the physical data structure found in that organization. They wanted to keep track of databases, tables, column names, etc. of the various systems (applications) used in the company. They also wanted to relate these data structures to the logical structure (entities and attributes), but that's another story.

Here's how I broke it down. The row counts are approximate as I'm too old to remember row counts.

Data Source (< 50 rows)

The original location of the data. This could be a Unix server, a DBMS instance or a physical place like a room or building. These sources may contain one or more data stores.

The primary key was:

        Data Source Identifier Character(10)

Data Store (15,000 rows)

One or more large structured sets of persistent data, usually associated with software to update and query the data. This could be a data base in a data base management system, or a tape containing many files. They may contain one or more data objects.

The primary key was:

	Data Source Identifier	Character(10)
	Data Store Identifier	Character(10)

Data Object (250,000 rows)

A container for ordered sets of fields. These could be tables, segments or views in a database or record layouts for a file. They may contain one or more data elements.

The primary key was:

	Data Source Identifier	Character(10)
	Data Store Identifier	Character(10)
	Data Creator Identifier	Character(10)
	Data Object Name	Character(40)

Data Element (5,000,000+ rows)

The basic unit of information having a meaning and sub categories (data items)
of distinct units and value. Through its name and definition, a data element conveys a single informational concept.

The primary key was:

	Data Source Identifier	Character(10)
	Data Store Identifier	Character(10)
	Data Creator Identifier	Character(10)
	Data Object Name	Character(40)
	Data Element Name	Character(40)

System	(700 rows) 

The organized collection, processing, maintenance, transmission, and dissemination of information, in accordance with defined procedures, whether automated or manual. Any system could have zero or more data stores.

Not all the data sources were captured for all systems. We only included the sources for DB2/MVS and Teradata. The future could include Sybase, SQL Server, IMS, flat files, tapes, XML, etc.

The primary key was:

        System Name (acronym) Character(10)

The system name was derived from the data store identifier and included in Data Store. The convention was that the system name was part of the data store identifier. If the derived system name did not match a valid system name the column was null.

The performance problem came when people wanted to analyze Data Element by System. What data elements are in this system? or What systems contain this data element? The join between Data Element and Data Store wasn't fast enough for online analysis. We tried various indexing schemes and nothing did the trick.

We settled for adding the system name to the data element table. We denormalized several other elements of the data source - data store - data object hierarchy while we were at it. We could also join System to Data Element and get the other attributes. Performance was satisfactory.

We could get away with this because the tables were reloaded each night from scans of the various database management systems. Had this been an online transaction processing system (update, insert and delete), it wouldn't have been a good idea.

I'd be interested in how others would have handled it.

Doug Received on Fri Jun 21 2002 - 21:46:43 CEST

Original text of this message