Data modelling discussion (LONG)

From: <johnsona_at_ashley.cofc.edu>
Date: 27 Jul 94 19:28:25 -0500
Message-ID: <1994Jul27.192825.1_at_ashley.cofc.edu>


Basic database design question, one which (currently, to my knowledge) has two camps:

 Say I am designing a database to stores data acquired from various   types of instrumentation. Due to the nature of these instrumentation,   different data attributes are generated. For example,

   Gas Chromatography: for each run
    for each analyte, i.e. Benzene

     attribute 1: concentration
     attribute 2: retention time
     attribute 3: area count
     attribute N: etc

   Inductively Coupled Plasma: for each run     for each analyte, i.e. Silver
     attribute 1: concentration
     attribute 2: emission counts
     attribute 3: number of replicates
     attribute 4: spectral line used
     attribute N: etc

   Gamma Spectroscopy: for each run
    for each isotope: i.e. Uranium-235
     attribute 1: concentration
     attribute 2: Minimim Detectable activity
     attribute 3: counts
     attribute 4: detector efficiency

 You see that there are some constants, namely, I am always provided a concentration. But each type of instrumentation has its own special attributes. Ideally, a single database should store this data, for easy retrival in mass, i.e. when reporting all the data for a particular sample, which was run on each of these instrumentation.   The two schools:

  1. Create a 2 table structure, with a run and results format. The run table is the master table, with a many to one relationship with the results table. The results table is designed with lots of generic numeric fields. (i.e. it looks like the following: table results ( runid number references run.runid, field1 number, field2 number, field3 number, ... fieldN number) where the number of fields is the largest number of field required for the most data intensive instrument. Many views are created to support this structure, which each directed to the fields applicable to that particular type of instrument, with field aliases appropriate to the type of data in that field.
  2. Create a 3 table structure, with a run,parameter,and attributes format. The run is the master to the parameter, and the parameter is the master to the attributes. The relationship is as follows: run.runid = parameter.runid, a one to many attributes.runid = parameter.runid and attributes.parmname
    • parameter.parmname a many to one relationship. Each parameter constitutes a row in this table.

The advantage to B is the very good flexibility it allows. The disadvantage is the fact that many fetches are required to return all data for a single parameter. A has the advantage in this respect, as all data is in a single row, but its normalization is poor at best.

 So which is better? Are there any other options I may be missing. It seems that this problem BEGS for a object oriented RDBMS. Each row could be a different object, each of which unique attributes specific to the type of data it contained. How best can this be "simulate" using current technology.
 Please comment as to any other ideas you may have re this issue. If I am too vague to promote discussion, please say so.

TIA alan johnson
General Engineering Labs
Charleston, S.C.
alan.johnson_at_gel.com
johnsona_at_ashley.cofc.edu Received on Thu Jul 28 1994 - 02:28:25 CEST

Original text of this message