abstraction of table relationships
Date: 6 Sep 2006 21:57:52 -0700
Message-ID: <1157605072.761613.23760_at_p79g2000cwp.googlegroups.com>
Every once in a while I come across someone who has an idea for how to "improve" relational queries in various ways. Sometimes they have the idea that there is some value to abbreviating or even omitting the relationships between tables to make queries shorter. I just want to go on record as to why this is 1) unnecessary and 2) harmful.
The person I spoke with today proposed that one fault of SQL is that common patterns, particularly in the area of cardinality relationships between table rows, have no abstraction mechanism. That is, we have common patterns in which we express cardinality relationships via structural means within the schema. A one-to-many relationship between table A and table B is expressed as a foreign key from B to A. A many-to-many relationship is expressed as a separate "join table" with foreign keys to table A and to table B. Etc. Learning these simple patterns is an important part of learning how to take good advantage of SQL.
This person went on to say that it would be a good thing if we
had some way to abstract these patterns and then refer to
the abstractions instead of having to repeat the full pattern.
I actually agree with this part, and in some other thread I
would like to discuss abstraction mechanisms for use within
schemas. However, what I wish to discuss here is his
further point. He gave the example of Books and Authors.
If they have a many-to-many relationship, he said it would
be better if one could query all books by a given author
in a simpler way, and have the system find the relationship
automatically. So you could, say, simply:
SELECT * from Books where AuthorId = 'Jane Austen';
even if Author weren't a field of Books. The system
would look at the schema for Books, and locate the
"path" in the system whereby it could uniquely attribute
an Author.
First of all, this idea is misguided on the basis of the fact that these "paths" are logical relationships.
But there is a much worse problem with modularity. I rewrote his schema such that there are tables Books and People, and a many-to-many table Authorship. Consider the query analogous to the above, where we query all books by selecting a primary key value in the People table. The "path" between Books and People is unique, so the system can easily fill it in. Fine. But now I come along later and want to record the various people who worked on the book in the role of editor, so I add a many-to-many table for Editorship. And voila! The earlier authorship query will *no longer work*
because the system cannot *uniquely* identify a"path" between People and Books.
In fact, the better solution to this problem is making better use of natural join. Schema design can and should be done with natural join in mind. Specifically, if there is any kind of key relationship between two tables, then the keys should have the same names, and the table should not have other, non-relating attribute names in common. This is actually quite trivial to do. (It's not possible in *every* case but it is certainly possible in a strong majority.)
Marshall Received on Thu Sep 07 2006 - 06:57:52 CEST