Storing Data in a Standard Format or not

From: Phil Maechling <pjmaechling_at_yahoo.com>
Date: 9 Feb 2003 19:21:33 -0800
Message-ID: <202415c5.0302091921.54be5ddc_at_posting.google.com>



We are constructing a database from a collection of external datasets.
The external datasets describe the same information but represent it in different formats.

As an example, take latitude and longitude.
One dataset uses degrees and minutes; another uses decimal degrees. (This is a simple example; others involve more complex conversions between formats.)
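
To make the simple case concrete, here is a minimal sketch of the degrees-and-minutes to decimal-degrees conversion. The function name and the assumption that the input arrives as a (degrees, minutes, hemisphere) triple are illustrative only; the real datasets may lay the values out differently.

```python
def dm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees plus decimal minutes to signed decimal degrees.

    Assumed convention: N and E are positive, S and W are negative.
    """
    if degrees < 0:
        raise ValueError("pass unsigned degrees; the hemisphere carries the sign")
    if not 0 <= minutes < 60:
        raise ValueError(f"minutes out of range: {minutes}")
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60.0
    return -value if hemisphere in ("S", "W") else value
```

For instance, dm_to_decimal(34, 30.0, "N") gives 34.5, and dm_to_decimal(118, 15.0, "W") gives -118.25.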

I am looking for discussion of issues relating to this type of problem.
We are discussing whether to convert the external data elements to a standard representation, or whether to preserve the original formats.

We came up with these options, and tradeoffs:

(1) Store original representation.
    No standard format. Not easy to read.

(2) Store one "standard" format.
    Must convert all entries to standard format. Lose original representations. Must show how we converted to standard format.

(3) Store one format, and a flag indicating the representation.
    Must define flags for all representations. Users must convert.

(4) Store two formats, original and a standardized format.
    Two versions of the truth. Makes using data much easier.
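
One way to picture option (4) combined with the flag from option (3) is sketched below. Everything here is hypothetical: the flag names, the "<degrees> <minutes> <hemisphere>" text layout, and the StoredAngle record are assumptions made for illustration, not part of any of the datasets. The idea is that the original text and its format flag are kept verbatim, and the standardized value is always derived from them by a registered converter, so it can be recomputed rather than treated as a second, independent truth.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def _parse_decimal(text: str) -> float:
    # Original is already in decimal degrees, e.g. "-118.25".
    return float(text)


def _parse_deg_min(text: str) -> float:
    # Assumed layout: "<degrees> <minutes> <hemisphere>", e.g. "34 30.0 N".
    deg, minutes, hemi = text.split()
    value = int(deg) + float(minutes) / 60.0
    return -value if hemi.upper() in ("S", "W") else value


# One converter per format flag (hypothetical flag names).
PARSERS: Dict[str, Callable[[str], float]] = {
    "decimal_degrees": _parse_decimal,
    "degrees_minutes": _parse_deg_min,
}


@dataclass(frozen=True)
class StoredAngle:
    original: str        # text exactly as supplied by the external dataset
    source_format: str   # flag naming the original representation
    standard: float      # derived value in decimal degrees


def ingest(original: str, source_format: str) -> StoredAngle:
    """Keep the original text and flag, and derive the standardized value."""
    try:
        parser = PARSERS[source_format]
    except KeyError:
        raise ValueError(f"unknown format flag: {source_format!r}") from None
    return StoredAngle(original, source_format, parser(original))
```

With this layout, ingest("34 30.0 N", "degrees_minutes") keeps the string "34 30.0 N" untouched and stores 34.5 beside it. The registry of converters also doubles as the record of how each format was standardized.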

Other users must face this issue. What are the standard solutions? On the theoretical side, I keep thinking that the representation is analogous to a "units" issue.
For every numeric field in the database, we must know the units. Isn't format similar to (or the same as) units?
Can't we just store the original format and keep the units and the format in a document somewhere?
This, however, makes programmatic access to the data very difficult. Thanks for any suggestions on how this is handled or on useful discussions of the tradeoffs.
Phil Maechling
pjmaechling_at_yahoo.com
maechlin_at_usc.edu

Received on Mon Feb 10 2003 - 04:21:33 CET
