Re: A simple situation shows confusion about basic concepts
Date: Wed, 27 Oct 2004 22:52:20 GMT
Message-ID: <EEVfd.3837$Xq3.1172_at_trndny01>
"Ney André de Mello Zunino" <zunino_at_inf.ufsc.br> wrote in message
news:2uaifgF25s033U1_at_uni-berlin.de...
> Alan wrote:
>
> > "Ney André de Mello Zunino" <zunino_at_inf.ufsc.br> wrote in message
> >
> >>Make uc1_id and uc2_id part of the primary key? Why? Shouldn't they be
> >>foreign keys (notwithstanding the fact that I didn't indicate so in my
> >>original post)?
> >
> > They are still foreign keys. They can be both FK and PK. You want them
to be
> > part of the PK so you can ensure that any particular combination is
unique
> > and not null. In fact, I made a mistake: drop entry_id as part of the PK
of
> > KB. In fact, remove it completely. It has no use at all. You really
should
> > use:
> >
> > KB (Knowledge Base)
> > ===================
> > relation_type (PK)
> > uc1_id (PK)
> > uc2_id (PK)
>
> All right. That makes sense. I dropped the /entry_id/ PK and made the UC
> ids part of the primary key as well. This is what I have got:
>
> KB (Knowledge Base)
> ===================
> uc1_id (PFK)
> uc2_id (PFK)
> relation_type (PK)
>
> > You do have another problem, though, that can't be enforced through
normal
> > data constraints. That is, for example:
> >
> > If
> > relation_type = c
> > uc1_id = 5
> > uc2_id = 8
> >
> > Is fine, but if that exists, you can still have
> >
> > relation_type = c
> > uc1_id = 8
> > uc2_id = 5
> >
> > which may or may not be correct. I suspect that _most_ of the time, it
would
> > be incorrect.
> >
> > I think this can only be enforced through some procedural code (trigger,
> > front end code, etc.), and even then only if the result was incorrect
100%
> > of the time. For example, you may have data that would read correctly as
> >
> > BRAZIL is a MOVIE
> >
> > but
> >
> > MOVIE is a BRAZIL would not be correct
>
> I agree. This kind of constraint will likely end up implemented in the
> business logic of the application, especially because there are quite a
> few of them and some can't be easily done in the relational model.
Good to hear, but remember, you can make it "idiot" proof, but not _fool_ proof :)
>
> > Okay, another way to do this, and I know you don't want to (but it is
the
> > right way to do it, as I will attempt to explain below), is one table:
> >
> > KB
> > uc1 (PK)
> > relation_type (PK)
> > uc2 (PK)
> >
> > This is probably the correct way to do it. Actually, thinking about the
> > rules for converting an ERD to a relational schema (this exact situation
is
> > not covered directly), and extrapolating the rules, you should wind up
with
> > one table, so this is the correct answer to your problem. The rule I
> > extrapolated from is basically, create a seperate table for each entity
and
> > relationship in a M:N situation. That would always yield three tables.
The
> > PK of the relationship (connecting) table is made up of the PKs of both
> > entities. Well, guess what happens when you do this- you wind up with
three
> > tables, but the two entities are redundant with the one that is made
from
> > the relationship.
> >
> > The reason we got confused was because of the artificial keys. They are
not
> > "real" data. So, if you take those out, you have only one attribute
> > remaining in the UC table(s). Another rule of ER modeling is that you
can't
> > have an Entity with only one attribute- it can always be subsumed into
> > another entity or relationship. Ultimately, this is what I did, and the
> > process we went through proves this rule to be true.
>
> I understand that rule. And, yes, the UC entity might as well have a
> single attribute: the UC string itself. But what do you do when you know
> that entity will be used in several other places of your relational
> model and you don't want to use the full UC string as the foreign key in
> those situations. One of the reasons being that those strings may be
> large and cumbersome. Here is an example of an average real UC:
> 'explain(icl>express(agt>thing,gol>person,obj>thing))' Wouldn't it be
> acceptable to use a plain integer PK in a case like that? I.e., to keep
> up with:
>
> UC (Universal Concept)
> ======================
> uc_id (PK)
> uc
>
> [...]
You could, but these days, with storage being cheap and processing being fast, I don't see a compelling reason to not use the "natural" key approachespecially with this sort of obtuse data. Less chance of error, IMO. As you may suspect, I favor using natural keys whenever possible. I only use artificial keys when there is no other choice, except if I know there will be a very high usage of low cardinality keys. An example being what I call "lookup" values, such as State here in the U.S. I will store "PA" instead of "Pennsylvania" in an address table. Not the greatest example, as PA is also a natural key. Okay, here's one- I'll store STL to indicate our St. Louis office, PIT for Pittsburgh, etc. in our sales tracking program. Low cardinality, high usage. In our HR system, we store department numbers rather than names- another example of low cardinality and high usage.
>
> >>"relation_type" merely expresses how the first and second UCs of the
> >>respective KB entry are associated to each other ("is a", "is part of",
> >>"is agent of", etc.). That only has meaning in the context of the
> >>application, not the database. This goes to say that the values for that
> >>field may and will often repeat across the records. I think I could
> >>actually have removed it from my original post.
> >
> > If it has meaning in the application, that's fine, but don't you still
need
> > to store the relationship type? How will the application know, or is it
not
> > important? It had better be important, or I wasted a lot of time :)
>
> Don't worry. They are indeed important and your time was definitely not
> wasted. I guess I made my point too strongly about telling the
> /relation_type/ attribute from database relationships.
:)
>
> > There are many notation types, but most use a rectangle to indicate an
> > Entity, and a diamond to indicate a Relationship between Entities. They
then
> > map, through certain rules, to tables.
>
> Thanks for further clarifying that. Things are going to fall into place,
> gradually. What happened to me here is that, after being away from
> database design for some considerable time, I was faced with this, not
> necessarily complex, but non-trivial modelling problem. But with the
> help of you fellows, I will get there.
>
> Thanks a lot again,
Good luck to you!
>
> --
> Ney André de Mello Zunino
Received on Thu Oct 28 2004 - 00:52:20 CEST