Re: A foreign key on a self-referring table
Date: Sat, 6 Mar 2004 18:03:06 +0100
Message-ID: <c2d097$1rr4eb$1_at_ID-184953.news.uni-berlin.de>
> Hi all and sorry for my English.
> I have a question about foreign-keys.
> In a project I'm currently working on, I looked at the database
> design chart (made by other people) and found that some tables
> have foreign keys referring to themselves.
>
> For example:
>
> Table Customers:
>
> |--------------------------|
> |CustomerID |(PK)
> |--------------------------|
> |Name |
> |City |
> |ZipCode |
> |CustomerID |(FK)
> ----------------------------
>
> (That's an example, of course, it's not my real case).
>
> That's quite weird, in my opinion, but I wish to know
> whether it is acceptable, in general, or they are just
> violating IV some IV database design principle.
Hi Andrea,
I will try to explain what I have understood reading some of
the classical books on the subject.
However I must first say that I read them only once, and it will take probably another two readings for being confortable with all the concepts, so I may be wrong, at best informal, in making some statements.
I'm also not a native English writer, so if you feel more confortable we can take this discussion to it.comp.software.database (although here there are lot of people that know the theory much better than me and can correct anything that is wrong).
Let's start with the foreign key question by quoting Chris Date definition of a foreign key from his book "An Introduction to Database Systems":
"
Let R2 be a relation variable. Then a foreign key in
R2 is a set of attributes of R2, say FK, such that:
- There exists a relational variable R1 (R1 and R2 not necessarily distinct) with a candidate key CK, and
- For all time, each value of FK in the current value of R2 is identical to the value of CK in some tuple in the current value of R1 "
It follows from the definition, since R1 and R2 need not be necessarily distinct, that you can have a relation variable with a foreign key constraint defining that the values of one or more attributes references values of one or more attributes of a candidate key of the same relation variable.
Quoting again Date: "Such a relation variable is sometimes said to be self-referencing."
As someone else already pointed out, such a relation variable is often used to represent a hierarchy.
Before trying to understand whether is acceptable or not, I would like to point out something about your example.
The first thing is that two attributes have the same name, this is probably a typographical error, but as you may already know they can't have the same name (although they are from the same type) or we wouldn't be able to distinguish between them.
The second thing is that, although there can be a lot of good examples, I can't think of a good one to model such an hierarchy about customers (what would be the predicate ?).
So to continue with the discussion I would prefer to take a more common example like the classical one about employees.
Let's take this very simple predicate:
"Employee identified by number (EMP#) has name (EMP_NAME) and it's managed by employee with number (MGR_EMP#)"
This would give us the following table:
+----------+
| EMP# | (PK)
+----------+
| EMP_NAME |
| MGR_EMP# | (FK)
+----------+
Each time we insert an employee, we also need to specify another employee that manages him.
And who do we specify when we need to insert someone, like the one at the head of the company, that is not managed by any another employee ?
Codd suggested to use two special markers in place of missing values: the A-mark, which should be used when a value is applicable but is currently not known, and the I-mark, which should be used when a value is inapplicable (please note that in SQL NULL is used almost always to represent both markers).
Letting aside all the discussions about having a 4-valued logic system, in our example we are missing the information because is not applicable.
Before continuing, let's assume our table has some more
rows as in the following example:
+------+----------+----------+
| EMP# | EMP_NAME | MGR_EMP# |
+------+----------+----------+
| 1 | Bob | |
| 2 | Eric | 1 |
| 3 | Paul | 1 |
| 4 | Andrea | 3 |
| 5 | Joe | 2 |
| 6 | Bill | 3 |
| 7 | Mike | 2 |
+------+----------+----------+
which result in the following hierarchy:
1
/ \
/ \
2 3
/ \ / \
5 7 4 6
Quoting Fabian Pascal book "Practical Issues in Database Management":
"In the traditional entity-based approach to business modeling,
each fully normalized table represents (loosely) an entity type.
As indicated in the previous example, simple hierarchies are
frequently represented by just one table. But such tables have
at least one row-representing the root node-that is different from
the others, an indication that not all entities represented by the
table's rows are of only one type.
"
Fabian goes on explaining that bundling multiple entity types produces complications such as the need for special integrity constraint and database traversal problems.
So how can we model the hierarchy in a better way ?
Fabian gives evidence of the fact that such an hiearchy is really just a tree, and there are two types of entity in tree structures: nodes and links.
and the predicate is
"Employee identified by number (EMP#) has name (EMP_NAME)"
the second one is named OrganizationChart
+------+----------+
| EMP# | MGR_EMP# |
+------+----------+
| 2 | 1 |
| 3 | 1 |
| 4 | 3 |
| 5 | 2 |
| 6 | 3 |
| 7 | 2 |
+------+----------+
and the predicate is
"Employee with number (EMP#) is managed by employee with number (MGR_EMP#)"
Please note that this is just a starting point, hierachies can be much more complicated as in the classical "bill-of-materials" where a part can appear multiple times in part assembly, even at the same tree level.
Many problems will start when we need to query the database because, although the relational theory has been extended to handle such complexities, in our daily job we have to face these problems with SQL which doesn't offer much in terms of facilities to handle them.
Hopefully there are several techniques, many of them proposed and studied by regulars of this newgroup, that we can use (I will just give you some pointers to search on Goggle: transitive closure, materialized paths, nested sets and nested intervals).
One more thing before the end: for the sake of keeping the discussion simple I omitted many details.
"Foundation for future database systems: The Third Manifesto" by Chris Date and Hugh Darwen
"Practical Issues in Database Management" by Fabian Pascal
"Temporal data and the relational model" by Chris Date, Hugh Darwen and Nikos Lorentzos
and more you can find in the books section at http://www.dbdebunk.com
Hope this helps.
-- Gianluca Hotz Associate mentor Solid Quality Learning More than just Training www.SolidQualityLearning.comReceived on Sat Mar 06 2004 - 18:03:06 CET