Re: A simple situation shows confusion about basic concepts

From: Alan <not.me_at_uhuh.rcn.com>
Date: Wed, 27 Oct 2004 00:18:12 GMT
Message-ID: <8PBfd.7380$PZ4.7362_at_trndny07>


Replies in line... (Read the whole thing, as I went through a process in working this out, and I typed as I went along)

"Ney André de Mello Zunino" <zunino_at_undl.org.br> wrote in message news:2u7uehF281l4cU1_at_uni-berlin.de...
> Alan wrote:
>
> > Your original design was _almost_ correct, then. You have a very rare
> > situation- a recursive many-to-many relationship. I've never seen it
happen
> > before.
>
> Hum... That makes me feel a little better about myself, somehow... :)
>
> > Your original design (for historical purposes):
> >
> > UC (Universal Concept)
> > ======================
> > uc_id (PK)
> > uc
> >
> > KB (Knowledge Base)
> > ===================
> > entry_id (PK)
> > uc1_id
> > uc2_id
> > relation_type
> >
> >
> > It only needs a slight change, which I marked with <--- :
> >
> > UC (Universal Concept)
> > ======================
> > uc_id (PK)
> > uc
> >
> > KB (Knowledge Base)
> > ===================
> > entry_id (PK)
> > uc1_id (PK) <--- CHANGE to PK
> > uc2_id (PK) <--- CHANGE to PK
> > relation_type
>
> Make uc1_id and uc2_id part of the primary key? Why? Shouldn't they be
> foreign keys (notwithstanding the fact that I didn't indicate so in my
> original post)?

They are still foreign keys. They can be both FK and PK. You want them to be part of the PK so you can ensure that any particular combination is unique and not null. In fact, I made a mistake: drop entry_id as part of the PK of KB. In fact, remove it completely. It has no use at all. You really should use:

KB (Knowledge Base)



 relation_type (PK)
 uc1_id (PK)
 uc2_id (PK)

You do have another problem, though, that can't be enforced through normal data constraints. That is, for example:

If
relation_type = c
uc1_id = 5
uc2_id = 8

Is fine, but if that exists, you can still have

relation_type = c
uc1_id = 8
uc2_id = 5

which may or may not be correct. I suspect that _most_ of the time, it would be incorrect.

I think this can only be enforced through some procedural code (trigger, front end code, etc.), and even then only if the result was incorrect 100% of the time. For example, you may have data that would read correctly as

BRAZIL is a MOVIE

but

MOVIE is a BRAZIL would not be correct

Okay, another way to do this, and I know you don't want to (but it is the right way to do it, as I will attempt to explain below), is one table:

KB
uc1 (PK)
relation_type (PK)
uc2 (PK)

This is probably the correct way to do it. Actually, thinking about the rules for converting an ERD to a relational schema (this exact situation is not covered directly), and extrapolating the rules, you should wind up with one table, so this is the correct answer to your problem. The rule I extrapolated from is basically, create a seperate table for each entity and relationship in a M:N situation. That would always yield three tables. The PK of the relationship (connecting) table is made up of the PKs of both entities. Well, guess what happens when you do this- you wind up with three tables, but the two entities are redundant with the one that is made from the relationship.

The reason we got confused was because of the artificial keys. They are not "real" data. So, if you take those out, you have only one attribute remaining in the UC table(s). Another rule of ER modeling is that you can't have an Entity with only one attribute- it can always be subsumed into another entity or relationship. Ultimately, this is what I did, and the process we went through proves this rule to be true.

You still can't prevent "wrong" combinations, but that is true in almost any system. You often can't prevent the users from entering syntactically correct data that makes no business sense. For example, a user in my company's Human Resources Department may enter my salary as 25,000,000 dollars. The data constraints may allow that, and although I am worth it, it is not correct from a business perspective. That could easily be enforced with a CHECK constraint or trigger, but not everything can.

>
> > And actually, you don't really need entry_id- you could just use
> > relation_type, but I think you mentioned that originally, and your
reason
> > for using it is valid.
>
> "relation_type" merely expresses how the first and second UCs of the
> respective KB entry are associated to each other ("is a", "is part of",
> "is agent of", etc.). That only has meaning in the context of the
> application, not the database. This goes to say that the values for that
> field may and will often repeat across the records. I think I could
> actually have removed it from my original post.

If it has meaning in the application, that's fine, but don't you still need to store the relationship type? How will the application know, or is it not important? It had better be important, or I wasted a lot of time :)

>
> > Keep in mind that what we have here is not an entity-relation model
(ERD),
> > we have a relational schema (table design). The modeling on an ERD would
be
> > different. I can't draw it here, but basically, you have one rectangle
(UC)
> > and one diamond (KB) connected twice, once in the uc1 role, and once in
the
> > uc2 role, both with mandatory participation and "many" cardinality.
Further,
> > you have two attributes on the diamond, entry_id and relation_type.
>
> I will have to look up some documentation in order to take full
> advantage of your remarks here. But I do understand it when you say
> there is a difference between table design and ER modelling.

There are many notation types, but most use a rectangle to indicate an Entity, and a diamond to indicate a Relationship between Entities. They then map, through certain rules, to tables.

>
> Best regards,
>
> --
> Ney André de Mello Zunino
Received on Wed Oct 27 2004 - 02:18:12 CEST

Original text of this message