Re: Database design question

From: Gary Benson <gary.benson_at_digitalmail.com>
Date: Tue, 07 Nov 2000 09:47:44 GMT
Message-ID: <8u8j3v$2pi$1_at_nnrp1.deja.com>


In article <8u8g06$lh9$1_at_highway.leidenuniv.nl>, peterbroers_at_rhbcml.leidenuniv.nl (Theo Peterbroers) wrote:
> In article <3A07A322.13984646_at_elbanet.co.at>, Heinz Huber <Heinz.Huber_at_elbanet.co.at> wrote:
> >Hi,
> >
> >you've nearly made it. Simply drop the semicolon delimited text field
> >from the table of the papers. You don't need it, since this information
> >is in the other two tables.
> >
> >Heinz
> Yes, but

Cheers for all your help everyone,

> (1) Do not underestimate the effort needed to keep your table of authors
> correct. There will be misidentifications and variant spellings. Storing one
> name per author (which seems the natural thing to do) makes it impossible to
> correctly cite some papers. Storing more than one name introduces errors in
> things like counting papers per author.

There are already some problems in the author data, mostly where one paper is listed under something like 'Surname, F.' and another under 'Surname, F. M.'. I think I'm going to write a script to look for that kind of thing and then ask which names should be merged.
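
A rough sketch of what such a script might look like, assuming names are stored as 'Surname, Initials'. The function names `parse_name` and `merge_candidates` are made up for illustration. It only proposes pairs where one name's initials are a prefix of the other's, and a person would still have to confirm each merge:

```python
from collections import defaultdict
import re


def parse_name(name):
    """Split 'Surname, F. M.' into ('surname', ('F', 'M')).

    Assumes a single comma separates the surname from the given
    names or initials; anything else ends up with no initials.
    """
    surname, _, given = name.partition(",")
    tokens = re.split(r"[\s.]+", given.strip())
    initials = tuple(t[0].upper() for t in tokens if t)
    return surname.strip().lower(), initials


def merge_candidates(names):
    """Return (shorter, longer) name pairs that may be the same author."""
    by_surname = defaultdict(set)
    for name in names:
        surname, initials = parse_name(name)
        by_surname[surname].add((initials, name))

    pairs = []
    for entries in by_surname.values():
        # Sorting puts a prefix such as ('F',) before ('F', 'M').
        ordered = sorted(entries)
        for i, (short, short_name) in enumerate(ordered):
            for long, long_name in ordered[i + 1:]:
                if short and long[:len(short)] == short:
                    pairs.append((short_name, long_name))
    return pairs


if __name__ == "__main__":
    sample = ["Surname, F.", "Surname, F. M.", "Other, A.", "Surname, G."]
    for short_name, long_name in merge_candidates(sample):
        print(f"Merge {short_name!r} into {long_name!r}? (y/n)")
```

This does not catch misspelled surnames, and two different people can share a surname and first initial, which is why the final decision is left to whoever runs it.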

> (2) You may want to retain the order in which authors appear in the title of
> the paper. Just add a column to the authors_lut table to store this data.

I do need to retain the order, but I don't think I'll store it in the authors_lut. What I think I'm going to do, more from a data management point of view than a database design point of view, is to have the 'papers' table contain the original, unmodified data from the external source, and introduce other tables (like authors, authors_lut...) for rapid searching. That way, the original data remains untouched.

I'll just live with the overhead...
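
To make the layout concrete, here is a minimal sketch using Python's sqlite3 module. The table names papers, authors and authors_lut are the ones discussed above; the column names (`authors_raw`, `paper_id`, `author_id`) and the helpers `build_db` and `papers_by` are assumptions for illustration only. The raw semicolon-delimited author string stays in papers untouched, so author order is still there, and the other two tables are derived from it purely for searching:

```python
import sqlite3


def build_db(raw_papers):
    """Load (title, authors_raw) pairs and derive the lookup tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE papers (
            id INTEGER PRIMARY KEY,
            title TEXT,
            authors_raw TEXT          -- original data, never modified
        );
        CREATE TABLE authors (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE
        );
        CREATE TABLE authors_lut (
            paper_id INTEGER REFERENCES papers(id),
            author_id INTEGER REFERENCES authors(id),
            PRIMARY KEY (paper_id, author_id)
        );
    """)
    for title, authors_raw in raw_papers:
        cur = conn.execute(
            "INSERT INTO papers (title, authors_raw) VALUES (?, ?)",
            (title, authors_raw),
        )
        paper_id = cur.lastrowid
        for name in (part.strip() for part in authors_raw.split(";")):
            if not name:
                continue
            conn.execute(
                "INSERT OR IGNORE INTO authors (name) VALUES (?)", (name,)
            )
            author_id = conn.execute(
                "SELECT id FROM authors WHERE name = ?", (name,)
            ).fetchone()[0]
            conn.execute(
                "INSERT OR IGNORE INTO authors_lut VALUES (?, ?)",
                (paper_id, author_id),
            )
    conn.commit()
    return conn


def papers_by(conn, name):
    """Return (title, authors_raw) for every paper linked to an author."""
    rows = conn.execute(
        """
        SELECT p.title, p.authors_raw
        FROM papers p
        JOIN authors_lut l ON l.paper_id = p.id
        JOIN authors a ON a.id = l.author_id
        WHERE a.name = ?
        ORDER BY p.id
        """,
        (name,),
    )
    return rows.fetchall()
```

Since authors and authors_lut hold nothing that cannot be regenerated from papers, they can be dropped and rebuilt whenever names get merged. The duplicated names are the overhead.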

> >
> >Gary Benson wrote:
> >>
> >> Hi,
> >>
> >> Thus, each author has one entry in the authors table, and one entry per
> >> paper in the authors_lut table. This works fine, but the data
> >> representing the author's names is stored twice, once in each paper they
> >> wrote and once in the authors table. To fulfil my quest for efficiency,
> >> is there any neater way of doing this?
>

Received on Tue Nov 07 2000 - 10:47:44 CET
