Re: How to work with m:n relationships?

From: Jonathan Leffler <jleffler_at_earthlink.net>
Date: Mon, 20 Feb 2006 06:30:47 GMT
Message-ID: <rmdKf.2047$VI6.1184_at_newsread1.news.pas.earthlink.net>


vldm10 wrote:
> I will use the post which was on comp.database on 25 January 2006,
> “Design Query”, as one example for m:n relationships. Here we
> should to design a database to track product subscriptions. The
> solutions are simplified so that we can concentrate on m:n relationship
> and on some dis/adventages. Let following solution be as it is usually
> done in the relational databases.
>
> Subscriber ( ssn, name, address, phone, email, login, password,
> datefrom, dateto, other-details)

Remind me not to get subscriptions from you :-)

Do NOT use SSN in a DBMS if you can possibly help it, and there is approximately zero justification for doing so with a set of product subscriptions - they have nothing to do with government sponsored retirement schemes, nor anything to do with the IRS. [Internal Revenue Service in the USA - the governmental organization charged with collecting federal taxes.]

If you must store the SSN, do not use it as a key - generate a surrogate and use the surrogate as the key. If necessary, use an MD5 or better hash of the SSN - though since there are fewer than 1 billion SSNs (they are a 9-digit number), it is not too hard to precompute the possible values for the hash values, so it is not a very good idea to use a hash.

If you must store the SSN, do not store it unencrypted.

> Product ( product_code, name, price, other_details)
> SubscriberProduct ( ssn, product_code, type_of_subsription, date_from,
> date_to )
> Keys are ssn, product_code and (ssn, product_code) respectively.

Sometime, you'll explain why you don't permit a subscriber to resubscribe for a second period. There are probably reasons for this, but it isn't immediately obvious to my why.

> I will
> compare above solution with the following (which I defined in
> www.dbdesign.com):
>
> SubscriberSt (Key1, ssn, name, address, phone, email, login, password,
> datefrom, dateto, other_details)
> ProductSt (Key2, product_code, name, price, date_from, date_to,
> other_details)
> SubscriberProductSt ( Key3, Key1, Key2, type_of_subsription, date_from,
> date_to )
> Here every Key is one column generated key. This solution has suffix
> St and I will call it St-solution. So, Keys are only difference between
> these two solutions. We will assume that the entities and the
> relationships can change their own states.

Well, don't forget to declare the other candidate keys too - the original keys. You've got two keys per table - the new, single column ones and the old multi-column ones.

>
> 1.
> Let Subscriber and SubsriberSt now get values:
> Subscriber1 = ( s1, n1, a1, p1, e1, l1, ps1, df1, dt1)
> SubscriberSt1 = (k11, s1, n1, a1, p1, e1, l1, ps1, df1, dt1)
>
> Let now Subscriber1 gets a new phone number. We will need new key for
> Subscriber.
> Subscriber2 = ( s1, n1, a1, p2, e1, l1, ps1, df2, dt2) and the new key
> is (ssn, df)

New key? Why? The key is SSN (see your opening statement) and the SSN has not changed - so you are somewhat confused about what your example is supposed to be.

> SubscriberSt2 = (k12, s1, n1, a1, p1, e2, l1, ps1, df2, dt2)
> (St-solution can support this change)

Either there is no need for the change of key from k11 to k12 or you are attempting to do something that you've not stated you're trying to do, such as track the history of changes. If you're doing that, you need to look at Temporal Data and the Relational Model by Date, Darwen and Lorentzos [TDRM].

You're already off track - I wonder if I should really analyze the rest of the discussion.

> Now we can ask ourselves, what will be if any other attribute (for
> example email) change its value at same df2 time and has dateto not =
> dt2. Subscriber2 will get the duplicate key, while SubscriberSt2 will
> work normally.

This is a question for you to define - there is no inherent problem. If your dates have a granularity of 1 day, then multiple changes on the same day can be made, all with the same date from. I'm left wondering what date to is set to, and whether you ever changed the value in the previous record to the date when it became invalid.

> First solution uses time, as a key’s attribute, compound key etc.

Well, you didn't mention that in the definition - so, please start over, explaining what you're up to. But don't do that until you've read [TDRM] or some similar books.

> Is
> there an explanation which will say when to use compound key or when to
> use surrogate key. What are the meanings of these keys, etc?
> Can we say that time is property of an entity? And related with this,
> we can set another question. What does it mean that two entities are
> identical? Is it means that they are one same thing or they are the two
> things perfectly similar? In the second case their entity’s
> properties are same but their system’s properties can be different
> (like time/space/position properties). Here my main objection is
> regarding construction of a key. Can we use principle: if for n
> attribute’s values, two rows are same, then I can always find n+1
> attributes so that rows are different? Can I say that for any n
> attributes, I can always set two rows to be with same values?

These issues are dealt with thoroughly in [TDRM] and in 'Databases, Types and the Relational Model: The Third Manifesto, 3rd Edn by Date and Darwen [TTM], due out at the end of this week.

> 2.
> Now let St-solution be:
> SubscriberSt1 = (k11, s1, n1, a1, p1, e1, l1, ps1, df1, dt1)
> SubscriberSt2 = (k12, s1, n1, a1, p1, e2, l1, ps1, df2, dt2)
> ProductSt1 = (k21, p_c1, name1, price1, df3, dt3)
> SubscriberProductSt1 = (k31, k11, k21, type_of_sub1, df4, dt4)
> Here as addition to the referential integrity, it should be added that
> PK-FKs should
> correspond to each other by their states.

I don't understand what you're saying there.

> 3.
> Microsoft ADO.NET has possibility to work with the disconnected
> objects. We can use these objects to connect to a database, now we can
> get a data from the database, set them in the disconnected objects and
> then we can disconnect from the database. When we are disconnected we
> can update these data, for example we can update a key. Now we can
> connect to the database again and update it. Problems arise when there
> are the two users which are working in disconnected mode with the same
> row and one user updates the key, but the other one needs this key.

Yes - it is a bad idea to update keys. But you need to have proper definitions of your keys, and the properties you are hoping to model.

> There is Microsoft’s solution for these kinds of problems but it can
> be very complicated and there are many combinations. The St-solution
> which is shown above (in part 1 and 2) can handle these problems with
> ADO.NET easy. In fact St-solution doesn’t have updates at all. The
> difference is that ADO.NET solution is based on programming theory.
> St-solution is based on DB theory.
> ADO.NET is great step, but maybe DB theory can have more appropriate
> solutions related to programming languages.

The St-solution needs to update the old 'date to' columns unless your system is prescient and can predict at insert time when a given subscriber is going to change their address, phone number or whatever.

> 4.
> In his book “Writings 1994-1997” on page 554, C.Date writes about
> duplicate rows as well as E.F.Code’s opinion about this. C. Date
> writes that G.W.Leibniz’s The Principle of Identity of
> Indiscernibles is the basic why relations don’t contain duplicate
> members. But it seems that results from Quantum Mechanics show that
> Leibniz’s principle is false.

Any URLs you'd care to give as pointers to follow?

> As an important part of his philosophical theory G.W.Leibniz defined
> the law which is about relationship between entities and properties. In
> the notation of symbolic logic Leibniz’s law can be written as:
> x=y ↔ ∀F(Fx ↔ Fy)
> where x, y are objects and Fx is property of an object x.
> (For those who are interested in Leibniz’s Law, there is for example
> the Princeton University web site, where some ideas related to
> Leibniz’s Law are shown with the great imagination.
>
> http://www.princeton.edu/~jimpryor/courses/intro/notes/timetravel.html
> http://www.princeton.edu/~jimpryor/courses/intro/notes/personal-id.html
> )

Those may be helpful - do they cover the quantum mechanics?

> To not be misunderstood, I believe in Relational DB, but maybe some
> things can be “updated”, especially the things which are related to
> the new technologies. For example the things related to xml, web
> services, knowledge representation etc.

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler_at_earthlink.net, jleffler_at_us.ibm.com
Guardian of DBD::Informix v2005.02 -- http://dbi.perl.org/
Received on Mon Feb 20 2006 - 07:30:47 CET

Original text of this message