The Theoretical Foundations of the Relational Model
Date: 14 Jun 2002 16:42:43 -0700
Message-ID: <57da7b56.0206141542.4a694b5d_at_posting.google.com>
In the midst of the current flare-up in the long-running 'objects vs
The origins of the 'relational model' lie in mathematical philosophy, and
specifically in something known variously as 'predicate logic' or 'symbolic logic'. (I'll use the latter term.) If you want to read more about
all of this I'd recommend the three books at the end of this post.
Symbolic logic was an attempt to put a set of framing principles around reasoned (i.e. logical or rational) discourse (i.e. how to tell when someone is talking crap, even if you can't check their facts). For the longest time there was this way of reasoning called 'syllogistic logic', and it has the form you're probably all familiar with. (P1) All men are mortal. (P2) Socrates is a man. (P3) Therefore, Socrates is mortal. Lots of people talked this way and we all 'knew' how the rules worked (things like '(or (a) (not a))' is always true, and that from (P1) and (P4) Kate Moss is not a man, it does *not* follow that (P5) Therefore, Kate Moss is not mortal), but no one had paid a lot of attention to the topic of logical structure for a while.
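To make the 'rules' in that paragraph concrete, here is a minimal sketch of a brute-force validity check. It is my own illustration, not anything from the books below, and it assumes a simplification: the syllogism is flattened into propositional form (m = 'is a man', d = 'is mortal'), so the quantifier in 'All men' is dropped. The names is_valid and implies are made up for the example.

```python
from itertools import product

def implies(a, b):
    """Material implication: 'if a then b'."""
    return (not a) or b

def is_valid(premises, conclusion, n_vars):
    """An argument form is valid when no truth assignment makes
    every premise true while the conclusion is false."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True

# (P1) man -> mortal, (P2) man, therefore (P3) mortal.
socrates = is_valid(
    [lambda m, d: implies(m, d), lambda m, d: m],
    lambda m, d: d,
    2,
)

# (P1) man -> mortal, (P4) not man, therefore (P5) not mortal.
kate_moss = is_valid(
    [lambda m, d: implies(m, d), lambda m, d: not m],
    lambda m, d: not d,
    2,
)

# (or (a) (not a)) with no premises at all.
excluded_middle = is_valid([], lambda a: a or not a, 1)

print(socrates, kate_moss, excluded_middle)  # True False True
```

The Kate Moss argument fails on the assignment 'not a man, but mortal': both premises hold and the conclusion does not.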
Symbolic logic was the result of a renewed focus on thinking about how to think. One of the earliest thunkers about this stuff was a guy called George Boole, who said in the introduction to his book that what he was trying to do was to think about logic (about thinking) from a mathematical point of view, rather than to try to establish the laws of logic from our acquaintance with reality. At the time (late 19th century) it was becoming clear that a lot of what human beings concluded about the world from observation was plumb wrong. What the logical philosophers did was to try to see if this dissonance was the consequence of poor habits of thought. The whole exercise culminated in an attempt to place all of mathematics upon a foundation of logic (until one M. Goedel put the kybosh on that! But I digress. . .)
In his book _Introduction_to_Symbolic_Logic_and_its_Applications_ Rudolf Carnap highlights that one of the differences between syllogistic and symbolic logic is the latter's emphasis on the relation. The idea is that we are generally interested in reasoning about 'propositions': true sentences describing the relationship between things (nouns) and/or their properties. A sentence like 'There exists a planet called Earth which is 12,756 km in diameter and has a mass of 5.98 x 10^24 kg.' is a proposition. So is "There is a Product called 'A Toothbrush', that sells for $7.49, is red, is plastic, and costs $5.75 to make.".
Now, an important point to note is that there are infinitely many possible propositions. (Even if you were to AND together everything in existence, you could still say 'AND there exists a proposition of the following form' and repeat yourself.) In practice what we are doing is not representing reality in a schema, but imposing an order on the world which does not really exist there. However, if you look at any pair (or set) of propositions there are rules to follow about manipulating them. For example, if the first sentence above is true, you can deduce from it that 'There exists something with a mass of 5.98 x 10^24 kg.', and that (for example) there is at least one proposition in our universe of discourse. (oops - now there are two, now there are three . . . )
[some time later]
When we point to a group of propositions with an identical structure (referring to the same kinds of things in the same relationship to one another) we label that group a 'relation'. It doesn't really matter what order they appear in, or what order the elements appear in. An example of a relation would be all of the sentences about planets (let's just stick to the solar system for the time being, and presume that Pluto is the last one.) We call the different 'kinds of things' in the relation's elements instances of 'domains'. There is a domain of Planet Names, a domain of Weights and Distances and so on. If you want an object-speak term, the closest thing to a relation in object-land is a 'pattern'.
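A rough sketch of what 'order doesn't matter' means in practice, assuming we model each proposition as an unordered set of (attribute, value) pairs and the relation as an unordered set of propositions. The helper name proposition and the attribute names are hypothetical; the Earth figures are the ones quoted above, and the Mars figures are approximate values I have added purely for illustration.

```python
def proposition(**attrs):
    """One proposition: an unordered set of (attribute, value) pairs."""
    return frozenset(attrs.items())

# The Planet relation, written down one way...
planet_a = {
    proposition(name="Earth", diameter_km=12756, mass_kg=5.98e24),
    proposition(name="Mars", diameter_km=6792, mass_kg=6.42e23),
}

# ...and again with the propositions and their elements shuffled.
planet_b = {
    proposition(mass_kg=6.42e23, name="Mars", diameter_km=6792),
    proposition(diameter_km=12756, mass_kg=5.98e24, name="Earth"),
}

# Same propositions, so it is the same relation.
same_relation = planet_a == planet_b
print(same_relation)  # True
```

Nothing here depends on position: an element is identified by its attribute name, and a proposition by its content.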
Now, in pure mathematical logic, there are *only* relations and domains. For example, there is a relation like this: Equal { <'Earth', 'Earth'>, <'Mars', 'Mars'>, etc } and another like this: LessThan { <0.0, 0.1>, <0.1, 0.2> . . . }. The first relation is finite (because the domain of Planet Names is finite) but the second is not. It arranges every possible Mass value, and every possible Mass less than it (even if there is nothing at all that actually weighs that amount), into a vast list of pairs. Not very practical for a database (oops) but incredibly powerful conceptually.
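One way to picture the finite/infinite difference, as a sketch only: a finite relation can be written out pair by pair, while an infinite one can only be held as a membership test. The two-name domain and the function name less_than are assumptions for the example, and a pair <a, b> is read as 'a is less than b', the same way the query form LessThan < P2.Mass, P1.Mass > reads it.

```python
# Hypothetical two-name domain, just to keep the example small.
planet_names = {"Earth", "Mars"}

# Equal over a finite domain can be written out in full.
equal = {(name, name) for name in planet_names}

def less_than(pair):
    """Membership test for LessThan: is <a, b> one of its pairs?"""
    a, b = pair
    return a < b

earth_equals_earth = ("Earth", "Earth") in equal
earth_equals_mars = ("Earth", "Mars") in equal
small_then_large = less_than((0.0, 0.1))
large_then_small = less_than((0.1, 0.0))

print(earth_equals_earth, earth_equals_mars)  # True False
print(small_then_large, large_then_small)     # True False
```

Conceptually both are just sets of pairs; the function is only a practical stand-in for a list nobody could ever finish writing.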
Why? Because we can reason about the propositions contained
within relations in an orderly, deterministic fashion. In other words, we
can automate reason (can't say anything about the correctness of the
propositions themselves, but we can say quite a lot about how they can be
manipulated). The whole sum of Ted Codd's great insight is that all of
the programming language stuff about 'references' and 'identity' and
'order' can (and should) be eliminated without losing any representational
power. The principles and practices that find expression in 'the relational
model' are not really about programming at all. They are an attempt to
describe a model of rational thought that can be written into a computer
program (a DBMS).
We can take a proposition of the following form: 'Those Planet Names where the mass of the planet is less than the mass of a planet called "Earth"' (which does not really exist in the same sense that our Planet relation does), and turn it into something like this:
WITH Planet as P1, P2 [ P2.Name, P2.Mass ]: (Equal < P1.Name, 'Earth' > ), (LessThan < P2.Mass, P1.Mass >);
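For anyone who would rather see that run, here is a loose Python reading of the expression. It is a sketch under my own assumptions about what the notation means: P1 and P2 range independently over Planet, the bracketed list names the output elements, and the comma between the two conditions means AND. The Earth mass is the figure quoted earlier; the Mars and Jupiter masses are approximate values added for illustration.

```python
# A small stand-in for the Planet relation (Name and Mass only).
planet = [
    {"Name": "Earth", "Mass": 5.98e24},
    {"Name": "Mars", "Mass": 6.42e23},
    {"Name": "Jupiter", "Mass": 1.90e27},
]

def equal(a, b):
    """True when the pair <a, b> belongs to Equal."""
    return a == b

def less_than(a, b):
    """True when the pair <a, b> belongs to LessThan."""
    return a < b

# WITH Planet as P1, P2 [ P2.Name, P2.Mass ]: Equal(...), LessThan(...)
result = {
    (p2["Name"], p2["Mass"])
    for p1 in planet
    for p2 in planet
    if equal(p1["Name"], "Earth") and less_than(p2["Mass"], p1["Mass"])
}

print(result)  # {('Mars', 6.42e+23)}
```

The result is itself a set of propositions of a new shape, which is the sense in which the derived relation 'exists' only once you ask for it.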
And this helps to explain why Relational people are so hostile to
'object speak', or 'references', or physical programming in any of its
manifestations. The R model is a delight precisely because it rids us of
loosey-goosey notions like 'class' and 'object' and replaces them with a
systematic, and very powerful, way of thinking about the problem. It
corresponds to something like the rules of chess without which no game
of chess could exist, whereas object speak feels like a bag of
chess pieces and a board (which are utterly superfluous). There is no way,
in an object-base, of saying whether one schema design is better than
another, or why. In relation-land, we can look at a schema design and the
rules it follows and make formal judgements about the design. And we can
take query expressions and turn them into a sequence of automated steps
which will compute the answer (we can even select one sequence out of the
manifold possibilities because it is the 'cheapest' way to compute the
answer.)
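A toy sketch of that last point, assuming a query that asks for the names of planets lighter than Earth and a crude cost measure (number of rows or row pairs examined). The two plan functions and the planet data are hypothetical, and unlike this sketch a real DBMS estimates the cost of each plan up front rather than running all of them.

```python
planet = [
    {"Name": "Earth", "Mass": 5.98e24},
    {"Name": "Mars", "Mass": 6.42e23},
    {"Name": "Jupiter", "Mass": 1.90e27},
]

def plan_cross_product(rows):
    """Pair every row with every row, then filter."""
    steps, out = 0, set()
    for p1 in rows:
        for p2 in rows:
            steps += 1
            if p1["Name"] == "Earth" and p2["Mass"] < p1["Mass"]:
                out.add(p2["Name"])
    return out, steps

def plan_select_first(rows):
    """Find the Earth row first, then compare masses against it."""
    steps, out = 0, set()
    earths = []
    for p1 in rows:
        steps += 1
        if p1["Name"] == "Earth":
            earths.append(p1)
    for p1 in earths:
        for p2 in rows:
            steps += 1
            if p2["Mass"] < p1["Mass"]:
                out.add(p2["Name"])
    return out, steps

answer_a, cost_a = plan_cross_product(planet)
answer_b, cost_b = plan_select_first(planet)

# Both step sequences compute the same answer; only the cost differs.
cheapest = min(
    [("cross_product", cost_a), ("select_first", cost_b)],
    key=lambda named_cost: named_cost[1],
)[0]

print(answer_a == answer_b, cost_a, cost_b, cheapest)
# True 9 6 select_first
```

The two plans are interchangeable only because the rules for manipulating relations guarantee they give the same result, which is what lets the choice be automated.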
Anyway, I hope this clarifies the issues somewhat. This kind of back-and-forth gets really old, really fast, because it usually all boils down to the object camp not understanding where the relational guys are coming from. We're all very familiar with object-speak because we are, mostly, programmers. But James (for example) repeatedly indicates that he has no clue about what the hell the R guys are talking about. (BTW: Don't get too cranky about this, James. You're not the first and I doubt you'll be the last.)
Relational theory people are only really interested in computers to the extent that they can use them as tools to automate their model (and it needs to be pointed out that the only kinds of models you can automate are those with a firm theoretical foundation.)
Books:
Carnap, Rudolf. _Introduction_to_Symbolic_Logic_and_its_Applications_.
Russell, Bertrand. _Introduction_to_Mathematical_Philosophy_.
Langer, Susanne. _An_Introduction_to_Symbolic_Logic_ (This is the best.)
Received on Sat Jun 15 2002 - 01:42:43 CEST