Re: Data modelling discussion (LONG)

From: Thomas O'Connor <toconnor_at_vcd.hp.com>
Date: Thu, 28 Jul 1994 10:42:49 GMT
Message-ID: <CtnCFD.CAC_at_vcd.hp.com>


johnsona_at_ashley.cofc.edu wrote:
: Basic database design question, one which (currently, to my knowledge)
: has two camps:
 

: Say I am designing a database to stores data acquired from various
: types of instrumentation. Due to the nature of these instrumentation,
: different data attributes are generated. For example,
 

: Gas Chromatography: for each run
: for each analyte, i.e. Benzene
: attribute 1: concentration
: attribute 2: retention time
: attribute 3: area count
: attribute N: etc
: Inductively Coupled Plasma: for each run
: for each analyte, i.e. Silver
: attribute 1: concentration
: attribute 2: emission counts
: attribute 3: number of replicates
: attribute 4: spectral line used
: attribute N: etc
: Gamma Spectroscopy: for each run
: for each isotope: i.e. Uranium-235
: attribute 1: concentration
: attribute 2: Minimim Detectable activity
: attribute 3: counts
: attribute 4: detector efficiency
 

: You see that there are some constants, namely, I am always provided
: a concentration. But each type of instrumentation has its own special
: attributes. Ideally, a single database should store this data, for
: easy retrival in mass, i.e. when reporting all the data for a particular
: sample, which was run on each of these instrumentation.
: The two schools:
 

: A) Create a 2 table structure, with a run and results format.
: The run table is the master table, with a many to one
: relationship with the results table. The results table is
: designed with lots of generic numeric fields. (i.e. it looks
: like the following:
: table results (
: runid number references run.runid,
: field1 number,
: field2 number,
: field3 number,
: ...
: fieldN number) where the number of fields is the largest
: number of field required for the most data intensive instrument.
: Many views are created to support this structure, which each
: directed to the fields applicable to that particular type of
: instrument, with field aliases appropriate to the type of
: data in that field.
 

: B) Create a 3 table structure, with a run,parameter,and attributes
: format. The run is the master to the parameter, and the parameter
: is the master to the attributes. The relationship is as follows:
: run.runid = parameter.runid, a one to many
: attributes.runid = parameter.runid and attributes.parmname
: = parameter.parmname a many to one relationship. Each parameter
: constitutes a row in this table.
 

: The advantage to B is the very good flexibility it allows. The disadvantage
: is the fact that many fetches are required to return all data for a single
: parameter. A has the advantage in this respect, as all data is in a single
: row, but its normalization is poor at best.
 

: So which is better? Are there any other options I may be missing. It seems

Depends! What do you want, flexability in query operations and maximum data integrety, or raw speed. If you plan on posting a few hundred million records, maybe option A would be better. If you will be posting less than say a few hundred thousand and can wait a few seconds for a query to complete, option B is in my opinion a far better choice.

If you select option A, perhaps you should look at database systems that use other than the "relational" model. For example, a network based database such as PICK (I'm not an advocate, but I used it for three years and it kind of grow on you) uses a variable length record with any number of variable length fields. So you could inplement option A without having to worry about allocating empty space.

One other thing. If you believe you can properly implement such a database in just three tables, you are very optimistic. In a previous life I implemented a simple database for site characterization results, including gamma spectroscopy results. That simple database, in third normal form, used 24 tables, including lookup tables, for things like nuclide, units, analysis type etc.

Might I suggest a good database modeling book like "Database Modeling and Design, The Entity-Relationship Approach", by Toby J. Teorey, Morgan Kaufmann Publishing.

--
Thomas J. O'Connor
----------------------------------------------------------------------
Computer Integrated Manufacturing (CIM) Technical Support
Hewlett-Packard Company
Printer Manufacturing Division, 
Vancouver Washington

Email:          toconnor_at_vcd.hp.com          
Fax:            (206) 212-3550
HP External:    (206) 212-5031
HP Telnet:      212-5031
----------------------------------------------------------------------
Received on Thu Jul 28 1994 - 12:42:49 CEST

Original text of this message