Re: re-designing & normalizing - best way to go about this?

From: Theo Peterbroers <peterbroers_at_floron.leidenuniv.nl>
Date: 12 Apr 2002 02:45:40 -0700
Message-ID: <39bb2c10.0204120145.46e24bd8_at_posting.google.com>


d_newsham_at_hotmail.com (D Newsham) wrote in message news:<c883e8dd.0204111430.ede7e7f_at_posting.google.com>...
> Hi all,
> Any advice will be appreciated here. I have been tasked with
> re-designing a database that, IMHO, is a mess. All of the data is in
> one table, fields:
(...)
> I am breaking this table up to try to achieve some normalization:
>
> Table Species:
> (Species_ID (PK), Class, Family, Genus, Species, Order, Authority)
>
> Table Locality:
> (Locality_ID (PK), Collection_Site, County/Province, State/Country,
> Elevation, Geologic Age)
>
> Table Collector:
> (Collector_ID (PK), F_Name, L_Name, Affiliation)
>
> Table Collection_Data:
> (Catalog_ID (PK),
> Species_ID (FK references Species),
> Locality_ID (FK references Locality),
> Collector_ID (FK references Collector),
> Description, Date Collected, Remarks)
>
> My questions are:
> Does it look like I am achieving my goal? I began breaking down the
> main table and what I found was, for instance - in the Species table,
> I may have 200 of the same class, family, genus - and the species is
> different. Should I have a seperate table each for class, family,
> genus... my thinking is not, because the species is dependant on the
> other 3 fields, but when I look at 2500 entries in the species table
> and see class repeated over and over... I am not sure.
>
> I need to design this database so other research projects can be added
> in the future, so I need to get it right when I start.
>
> Thanks again, for any advice!
> Dina Newsham,
> Northern Arizona University

Your steps of normalization look more or less OK to me, depending on intended use. Caveats:
- Table Species can contain just one classification. If you want to record
  synonyms, make (Species_ID, Authority) PK. - Table Collection_Data references Species directly. When you *do* use synonyms,
  this relationship becomes n:m, so create a linking table between Species and
  Collection_Data. This linking table would contain one single column of all
  unique Species_ID's in Species.
- Of what should 'Geologic_age' be an attribute? The answer may be either
  Locality, Collection_site or Collection_Data. - Your reflection about future use is excellent. Many scientists, when setting up
  an information system, start at basically the wrong end; i.e. the observation
  or collection side of things.
  So, what *are* the intended uses of your database? Taxonomy? Collection
  management in a museum? Coordination of fieldwork? Biogeography? Species
  protection (although, since you record 'geologic age' I might assume most of
  your species to be extinct)?

Theo Peterbroers Received on Fri Apr 12 2002 - 11:45:40 CEST

Original text of this message