Linking tables with multifield primary keys vs. wasted space.

From: ddtl <this.is_at_invalid>
Date: Tue, 13 Apr 2004 20:00:04 +0200
Message-ID: <feao70ljp3fljs04vi4l53t14k6k8b05ec_at_4ax.com>


Hello everybody,

I am trying to create a database of books (for personal use), and have some doubts about correctness of my design:

Let's say that the only information i care about is books' titles and their authors.

The book can have one or more authors (coauthors), and every author can write one or more books.

Every author has surname and name, though sometimes name is unknown. Also, some authors have middle name (patronymic), and once again, not everybody has, and often it is just unknown.

Additionally, there are authors which have identical names, middle names, surnames or identical combinations of those three (i presume, though, that no two authors have both (trice?) identical name, surname and middle name)

So, from one side i want to conserve space and from the other side - create a well structured database. Here is what i got:

Table 1 [books]

book_id
book_title

Table 2 [person_surnames]

person_surname_id
person_surname

Table 3 [person_middle_names]

person_middle_name_id
person_middle_name

Table 4 [person_names]:

person_name_id
person_name

The [PREFIX]_id field is a unique primary key - some kind of autoincrementing number.

Also, I have the following linking tables:

Table 5 [authors]

person_surname_id      |     
person_name_id	       |> composite primary key
person_middle_name_id  |


And the last one, which connects authors to books:

Table 6 [book_authors]

book_id		       |
person_surname_id      |\                          
person_name_id	       |/ composite primary key   
person_middle_name_id  |                          
			

It seems to me that creating such a relationship instead of, for example, creating one monolithic table called "authors" which will contain all the three fields (name, surname, middle name) will conserve space, because if, for example, a person doesn't have a middle name, i will have to waste 20 bytes for nothing - it will just contain NULL (if field's specification is char(20)), while using the design above in the worst case only 8 bytes will be wasted (if both person's name and middle name are unknown and if DBMS' autoincrementing field is 4 bytes - remember that in any case there have to be a linking table to link authors to books).

Nonetheless, couple of things disturb me:

  1. Already on the first stage there are many linking tables (and i want to store much wider info about books) - is it not going to complicate things too much?
  2. It seems that composite primary keys contain too many fields - is it going to slow queries on database or cause some other unwanted effect (from the other hand i don't see a better solution - of course it is possible to add authors_id field to Table 5 and make it the primary key, while the other fields will be there only as references into their parent tables, but then one can accidentally create two or more identical authors (which have name=name, surname=surname and "middle name"="middle name"), and that is wrong!)
  3. Even if my design is right, it still wastes some space. Is it possible to eliminate that problem?

ddtl. Received on Tue Apr 13 2004 - 20:00:04 CEST

Original text of this message