Re: Database design question

From: <geoff_miller_at_my-deja.com>
Date: Wed, 08 Nov 2000 01:24:22 GMT
Message-ID: <8uaa01$gvb$1_at_nnrp1.deja.com>


As a long-time Pick practitioner I have been following this thread and thinking, "This is why multivalued databases were invented."

The same author may use multiple forms of their name, and different authors may have the same name. OK, that can be addressed, but I would prefer to have the computer identify a potential conflict and let me resolve it manually. As Gary said, you have to record both the name as it appears in the paper and an author_ID to cross-reference to the author table.
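For what it's worth, here is a rough sketch of the "let the computer flag it, let me decide" idea. It assumes names are stored in the 'Surname, F. M.' style mentioned elsewhere in this thread; the function names and the sample names are made up for illustration, and matching on surname plus first initial is only one possible heuristic.

```python
from collections import defaultdict

def name_key(name):
    """Reduce 'Surname, F. M.' to ('surname', 'f') for rough matching."""
    surname, _, given = name.partition(",")
    first_initial = given.strip()[:1].lower()
    return (surname.strip().lower(), first_initial)

def potential_conflicts(names):
    """Group name forms that share a key; return only groups with >1 form."""
    groups = defaultdict(set)
    for n in names:
        groups[name_key(n)].add(n)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

# Hypothetical sample data, not taken from any real paper list.
names = ["Surname, F.", "Surname, F. M.", "Other, A."]
conflicts = potential_conflicts(names)
for key, forms in conflicts.items():
    # A human decides whether these are one author or two.
    print("Possible same author:", forms)
```

Note that this only proposes candidates. Two different people can share a surname and initial, so nothing should be merged automatically.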

However, there's another potential level of complexity. You may also need to record the institution(s) with which each author is associated at the time of writing each paper -- suppose you want to identify papers emanating from a particular institution or even a particular research group.
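In relational terms, one way this might look is sketched below using Python's sqlite3 module. All table and column names (paper_authors, name_as_printed, paper_author_affiliations and so on) are my own assumptions rather than Gary's actual schema, and the rows are invented. The point is only that the name as printed, the author_ID, the author order and the affiliation at the time of writing all hang off the paper/author pairing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, canonical_name TEXT NOT NULL);
CREATE TABLE institutions (institution_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE papers (paper_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One row per author per paper: keeps the printed name and the author order.
CREATE TABLE paper_authors (
    paper_id INTEGER NOT NULL REFERENCES papers(paper_id),
    author_id INTEGER NOT NULL REFERENCES authors(author_id),
    position INTEGER NOT NULL,
    name_as_printed TEXT NOT NULL,
    PRIMARY KEY (paper_id, author_id)
);
-- Zero or more institutions per author per paper, as of that paper.
CREATE TABLE paper_author_affiliations (
    paper_id INTEGER NOT NULL,
    author_id INTEGER NOT NULL,
    institution_id INTEGER NOT NULL REFERENCES institutions(institution_id),
    PRIMARY KEY (paper_id, author_id, institution_id),
    FOREIGN KEY (paper_id, author_id) REFERENCES paper_authors(paper_id, author_id)
);
""")

# Invented sample rows for illustration only.
conn.execute("INSERT INTO authors VALUES (1, 'Surname, F. M.')")
conn.executemany("INSERT INTO institutions VALUES (?, ?)",
                 [(1, "Institute A"), (2, "Institute B")])
conn.executemany("INSERT INTO papers VALUES (?, ?)",
                 [(1, "Paper One"), (2, "Paper Two")])
conn.executemany("INSERT INTO paper_authors VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "Surname, F."), (2, 1, 1, "Surname, F. M.")])
conn.executemany("INSERT INTO paper_author_affiliations VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (2, 1, 2)])

def papers_from(conn, institution_name):
    """Titles of papers with at least one author affiliated to the institution."""
    rows = conn.execute("""
        SELECT DISTINCT p.title
        FROM papers p
        JOIN paper_author_affiliations a ON a.paper_id = p.paper_id
        JOIN institutions i ON i.institution_id = a.institution_id
        WHERE i.name = ?
        ORDER BY p.title
    """, (institution_name,)).fetchall()
    return [r[0] for r in rows]
```

A research group could be handled the same way, either as another institution-like table or as a finer-grained row in institutions. In a multivalued database the affiliations would simply be sub-values nested under each author value in the paper record.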

This thing is growing!

Geoff

In article <8u8j3v$2pi$1_at_nnrp1.deja.com>, Gary Benson <gary.benson_at_digitalmail.com> wrote:
> In article <8u8g06$lh9$1_at_highway.leidenuniv.nl>,
> peterbroers_at_rhbcml.leidenuniv.nl (Theo Peterbroers) wrote:
> > In article <3A07A322.13984646_at_elbanet.co.at>, Heinz Huber <Heinz.Huber_at_elbanet.co.at> wrote:
> > >Hi,
> > >
> > >you've nearly made it. Simply drop the semicolon delimited text field
> > >from the table of the papers. You don't need it, since this information
> > >is in the other two tables.
> > >
> > >Heinz
> > Yes, but
>
> Cheers for all your help everyone,
>
> > (1) Do not underestimate the effort needed to keep your table of authors
> > correct. There will be misidentifications and variant spellings. Storing one
> > name per author (which seems the natural thing to do) makes it impossible to
> > correctly cite some papers. Storing more than one name introduces errors in
> > things like counting papers per author.
>
> There are already some problems in the authors, mostly of the type where
> one paper will be under something like 'Surname, F.' and another will be
> under 'Surname, F. M.'. I think I'm going to write some kind of script
> to look for that kind of thing, and then ask which names should be
> merged.
>
> > (2) You may want to retain the order in which authors appear in the title of
> > the paper. Just add a column to the authors_lut table to store this
> > data.
>
> I do need to retain the order, but I don't think I'll store it in the
> authors_lut. What I think I'm going to do, more from a data management
> point of view than a database design point of view, is to have the
> 'papers' table contain the original, unmodified data from the external
> source, and introduce other tables (like authors, authors_lut...) for
> rapid searching. That way, the original data remains untouched.
>
> I'll just live with the overhead...
>
> > >
> > >Gary Benson wrote:
> > >>
> > >> Hi,
> > >>
> > >> Thus, each author has one entry in the authors table, and one entry per
> > >> paper in the authors_lut table. This works fine, but the data
> > >> representing the author's names is stored twice, once in each paper they
> > >> wrote and once in the authors table. To fulfil my quest for efficiency,
> > >> is there any neater way of doing this?
> >
>
> Sent via Deja.com http://www.deja.com/
> Before you buy.
>

Received on Wed Nov 08 2000 - 02:24:22 CET
