Re: Data modelling discussion (LONG)

From: Jeffrey Jacobs <jjacobs_at_well.sf.ca.us>
Date: 28 Jul 1994 01:29:16 GMT
Message-ID: <3171lc$9jg_at_nkosi.well.com>


In article <1994Jul27.192825.1_at_ashley.cofc.edu>,  <johnsona_at_ashley.cofc.edu> wrote:
>Basic database design question, one which (currently, to my knowledge)
>has two camps:
>
> Say I am designing a database to stores data acquired from various
> types of instrumentation. Due to the nature of these instrumentation,
> different data attributes are generated. For example,
>
> Gas Chromatography: for each run
> for each analyte, i.e. Benzene
> attribute 1: concentration
> attribute 2: retention time
> attribute 3: area count
> attribute N: etc
> Inductively Coupled Plasma: for each run
> for each analyte, i.e. Silver
> attribute 1: concentration
> attribute 2: emission counts
> attribute 3: number of replicates
> attribute 4: spectral line used
> attribute N: etc
> Gamma Spectroscopy: for each run
> for each isotope: i.e. Uranium-235
> attribute 1: concentration
> attribute 2: Minimim Detectable activity
> attribute 3: counts
> attribute 4: detector efficiency
>
> You see that there are some constants, namely, I am always provided
>a concentration. But each type of instrumentation has its own special
>attributes. Ideally, a single database should store this data, for
>easy retrival in mass, i.e. when reporting all the data for a particular
>sample, which was run on each of these instrumentation.
> The two schools:
>
> A) Create a 2 table structure, with a run and results format.
> The run table is the master table, with a many to one
> relationship with the results table. The results table is
> designed with lots of generic numeric fields. (i.e. it looks
> like the following:
> table results (
> runid number references run.runid,
> field1 number,
> field2 number,
> field3 number,
> ...
> fieldN number) where the number of fields is the largest
> number of field required for the most data intensive instrument.
> Many views are created to support this structure, which each
> directed to the fields applicable to that particular type of
> instrument, with field aliases appropriate to the type of
> data in that field.
>
> B) Create a 3 table structure, with a run,parameter,and attributes
> format. The run is the master to the parameter, and the parameter
> is the master to the attributes. The relationship is as follows:
> run.runid = parameter.runid, a one to many
> attributes.runid = parameter.runid and attributes.parmname
> = parameter.parmname a many to one relationship. Each parameter
> constitutes a row in this table.
>
>alan johnson
>General Engineering Labs
>Charleston, S.C.
>alan.johnson_at_gel.com
>johnsona_at_ashley.cofc.edu

A 3rd option is a separate results table for each instrument type (its not clear if there multiple results for the same instrument for the same run, but a) I assume not and b) it wouldn't really matter).

IOW, you would have a GAS_CHROM_RESULT table, which is child to the RUN table. You'd also have a GAMMA_SPEC_RESULT table, which is child to the RUN table.

This avoids the mess involved with a single table and multiple views in A), and makes coding and maintenance a bit easier.

I'm not clear on solution B); is a parameter the same as an instrument? A "generic" solution would be a RUN table, a RESULT_VALUE table, a ATTRIBUTE_DEF table and an INSTRUMENT table.

RESULT_VALUE has the value of a particular attribute for a particular run, e.g. RUN and ATTRIBUTE_DEF are both parents, but not to each other.

ATTRIBUTE_DEF has the name, e.g. "CONCENTRATION", the unit of measure and any other useful information for defining the RESULT_VALUE. Note that ATTRIBUTE_NAME is probably not unique; I assume that the there are different values for "CONCENTRATION" for each of the instrument's test. Note that if concentration is the same for all instruments/tests, it belongs elsewhere (probably in SAMPLE, which you haven't discussed).

EACH ATTRIBUTE_DEF is associated with one and only one INSTRUMENT.

This solution is most applicable if you are constantly changing/adding instruments, which I'm inclined to doubt.

To be really helpful, you do need to provide some additional clarification.

  1. What is a sample? Is it taken from some larger "whole", such as shipment of material?
  2. What is a "run" and its relationship to sample? Does a run consist of multiple tests using different instruments? Can results from the same instrument be returned more than once in the same run, e.g. two GAS CHROMOGRAPH tests in the same run? Can there be multiple runs against the same sample?
  3. What is a "parameter"?
  4. How often do the instrument types change? How often do new ones gets added? Removed? Do the types of information (attributes) change?
  5. Does CONCENTRATION belong to the instrument or is it a property of the sample (or run)?

Clarification of your terms and their relationships would help...

Jeffrey Jacobs & Associate
Oracle CASE Consulting and Training
951-2 Old Country Road, Suite 119
Belmont, CA 94002
Voice: 415-571-7092
FAX: 800-665-1379
CompuServe: 76702,456
Internet: jjacobs_at_well.com Received on Thu Jul 28 1994 - 03:29:16 CEST

Original text of this message