A Database Design Issue

From: Stelios Sfakianakis <ssfak_at_ics.forth.gr>
Date: Sat, 21 Jul 2001 23:31:17 GMT
Message-ID: <9gkle6$26hc$1_at_ulysses.noc.ntua.gr>


Hello everyone,
I want to represent in my database an "equivalence" concept for some entity E. For example this entity could be DNS names for various server machines in the internet and, since some of them are respresented by two or more names, I want to have this notion of equivalence depicted by one (or more) tables in my relational database.

My first attempt was the following:
Lets say that each row in E has a primary key called 'e_id' (entity id). Then i have the EQUIV relation with two attributes, 'e_id1' and 'e_id2', with the semantics that the entities in the table E with ids e_id1 and e_id2 are equivalent.
But soon i realized that i have some problems with this design caused by this "equivalent" concept of my entities and its properties:

  1. if (id1, id2) is contained in EQUIV, then also (id2, id1) must be contained in EQUIV and
  2. if both (id1, id2) and (id2, id3) are contained in EQUIV then also (id1, id3) must be contained in EQUIV Apparently based on these observations there should be a lot of redundancy in the EQUIV relation. [ It is true that I could somehow handle the "permutational" property (1) if, for example, i impose the contraint that e_id1 is greater than e_id2. But this complicates the queries that ask for the equivalents of some e_id because I should search EQUIV for rows that contain e_id either in the first or in the second column and retrieve the value in the other column.]

I thought that I'm having some well known normalization problem so I check Date's "Introduction in Relational Databases" (I have the 5th Edition, ok it's somewhat old :-) and the discussion it has about normal forms but I don't think that the problem I have encountered could be handled by any of these normal forms: the table EQUIV is a binary relation, so it's in 1NF, 2NF, 3NF and Boyce/Codd NF whereas 4NF and 5NF are not applicable in this case (Mr Date has an exercise about normalization and binary relations that I used for these results)

So what's happening here? Do I miss something? Is there any NF that I 'm not aware of and I have encountered a case of it? [I would answer no, but you never know...]

When I realized these problems I tried to find a different way to express this "equivalence" concept and i finally came up with a slightly different solution: the EQUIV table remains the same but the first attribute is not
(necessarilly) an e_id but an id of a *group* of equivalent e_ids. So if we
know that e_id1=10 and e_id2=11 are equivalent then we should have:  EQUIV



... | ....
3 | 10
... | ...
3 | 11
... | ...

i.e. the first column has the same value (here 3) for the two e-ids 10 and 11. The value of the first column could be assigned following any algorithm that restricts equivalent e_ids to have the same value for the first attribute. A simple algortihm is that whenever a new pair of equivalent ids
(e_id1, e_id2) is found, we check to see if any of these two ids exist
already in the EQUIV. If this is the case we insert the id not present in EQUIV with an 'equivalent group id' equal to the one that has the other
e_id. If neither of the two exist already , then we are creating a new group with an id that is equal to either of the two e_ids and with this new group id we insert two rows for each one of the e_ids.

Do you see any flaws in this second approach? Any thoughts/suggestions are very welcome!

Thanks for yor time
Stelios Sfakianakis Received on Sun Jul 22 2001 - 01:31:17 CEST

Original text of this message