Re: In an RDBMS, what does "Data" mean?

From: Paul <paul_at_test.com>
Date: Fri, 28 May 2004 15:36:46 +0100
Message-ID: <40b74ea7$0$1049$ed2619ec_at_ptn-nntp-reader02.plus.net>


Dawn M. Wolthuis wrote:

>> Newtonian Dynamics assumes certain axioms, which we now know to be 
>> slightly wrong.

>
> If talking about mathematical axioms, they are not right or wrong --
> they just are. It is the use of those axioms in some setting or
> another that could be inappropriate, not useful, or lead one to draw
> incorrect conclusions due to applying a poor mathematical analogy
> (metaphor) to the situation.

Well, OK, when I say the axioms are wrong I mean that the axioms don't quite give a theory on which we can base an accurate model of reality (though they may be good enough for an approximate one).

> So the mathematics is right, but the science is wrong -- and I think
> that is a major point of this thread.

My point is that the DBMS is only concerned with the mathematical part, and theory proves that it does that part perfectly. The science part is beyond the scope of the DBMS; making sure that is OK is up to the database users.

>> I'm just talking about the system of logic that enables us to talk
>>  about our database (our "theory" if you like). Whether our theory
>>  has axioms that correspond to the real world, or whether our 
>> interpretation (or "model") of our theory is accurate, is a totally
>>  different question.

>
> Exactly -- so I think you and Wol (and I) are in agreement on that.
> It is why whenever anyone suggests that the best way to set up a
> databases is by employing relational theory BECAUSE relational theory
> is based on mathematics, I laugh (then cry).

Why? This seems like a reasonable statement. Suppose for example we based our DBMS on second-order logic. Then theory tells us we will have incompleteness (ignoring the fact that databases are finite!). So this would tell us that the mathematical part of the DBMS is on shaky ground. As it happens, DBMSs use first-order logic, so we know it is rock-solid because of Gödel's Completeness Theorem. That seems very reassuring to me. Maybe this point seems so obvious that people just take it for granted, and don't even realise that there is something to be proved in the first place.

Now it may well be that the "multivalue" database model also just uses first-order logic presented in a slightly obfuscated way, in which case you'd have the peace of mind for that as well.

> I have an appreciation of what mathematics is and what it isn't. How
> do we determine whether a mathematical model is a good metaphor for
> what we are doing? We have to step outside of mathematics to do
> that. So, the proof that various aspects of relational theory have
> been good for use with DBMS's is not within mathematics.

The proof of the usefulness of the mathematical part of DBMSs is definitely within mathematics. But as you say, deciding whether your model is a good metaphor for linking your database to reality is beyond the scope of both DBMSs and mathematics.

> [Slight digression: If we could the 1st-order predicate logic behind
> the "folder" metaphor (ah ha -- how 'bout a function?) we could make
> some progress perhaps?]

I think the problem here is that if you want trees you can't do it with first-order logic.
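
One common way to make this concrete (an assumption about what is meant here, not something spelled out in the thread): listing every descendant in a tree of arbitrary depth needs the transitive closure of the parent/child relation, and the number of joins required depends on the data rather than on the query. A minimal sketch in Python, using a made-up `parent` relation and hypothetical helper names:

```python
# Hypothetical parent relation: a set of (parent, child) tuples.
parent = {("root", "a"), ("root", "b"), ("a", "c"), ("c", "d")}

def join_step(rel, base):
    """One relational join: pairs (x, z) with (x, y) in rel and (y, z) in base."""
    return {(x, z) for (x, y) in rel for (y2, z) in base if y == y2}

def transitive_closure(base):
    """Repeat the join until nothing new appears (a fixed point).

    The number of rounds grows with the depth of the tree, which is why
    a single fixed query made of a bounded number of joins is not enough.
    """
    closure = set(base)
    while True:
        new = closure | join_step(closure, base)
        if new == closure:
            return closure
        closure = new

ancestors = transitive_closure(parent)
descendants_of_root = sorted(c for (p, c) in ancestors if p == "root")
```

Any fixed number of joins only reaches a fixed depth; the loop is what handles trees of unknown depth.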

>> Given that your axioms and your interpretation are correct, then I
>>  think you can show the DBMS proof is true in real life (for the 
>> reasons given above and in previous posts).
>
> And how do you show that your interpretation is correct -- by not
> showing it to be incorrect, by showing many cases where it is
> correct? I think that is central to this discussion. I'm about to
> read the book someone mentioned, "Data and Reality," and perhaps that
> will shed some more light on that question.

You can't; it's impossible. To show that your interpretation is correct we move away from mathematics into science. And in science you can never prove something, only disprove it. You just hypothesize that something is true and try to find a counterexample to show you were wrong.

> Summarizing -- three questions: 1) (How) can we prove that our
> mathematical model (e.g. relational theory) aligns with what we are
> applying it to (e.g. databases)? I think we can only disprove it or
> fail to disprove it.

Well, here we kind of go right to the very basis of everything: logic is by definition what we think of as truth, so it applies to everything. If p is true and q is true, then "p and q" is true too. We could build a DBMS around a logic where this isn't the case, but I don't think it would be very helpful! Alternatively we can go upwards to a more complex logic, but theory tells us this could cause incompleteness problems.

> 2) Are we missing some important aspects of databases (e.g. mountain
> man's concerns) if we limit ourselves to a single mathematical
> metaphor (e.g. to what relational theory can tell us, or can tell us
> today)?

I'm not quite sure what mountain man's point is. Is it that we should store things like constraints, view definitions, etc. in a relational format rather than as strings in some query language? I can see the appeal of this idea, but I think how we store statements in our "meta-language" doesn't change the fact that our actual data is stored in relations. Or is it that we could store things like form layouts and application flow logic in tables? If so, then I don't think this is a totally new idea, though maybe an interesting one to explore. MS Access had something like this built in, I think, created by a form wizard: "table-driven forms".
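
As a rough illustration of the first reading (constraints held as rows rather than as query-language strings), here is a minimal sketch in Python. Everything in it is hypothetical: the `person` relation, the `constraints` catalog layout and the `violations` helper are invented for the example and do not describe any particular DBMS.

```python
import operator

# Hypothetical data relation: rows of (name, age).
person = [("alice", 34), ("bob", -5)]

# Hypothetical catalog relation: each row is (column_index, operator, bound).
constraints = [(1, ">=", 0), (1, "<=", 150)]

# Map the operator symbols stored in the catalog to comparison functions.
OPS = {">=": operator.ge, "<=": operator.le}

def violations(rows, rules):
    """Return (row, rule) pairs where the row fails the rule."""
    return [
        (row, rule)
        for row in rows
        for rule in rules
        if not OPS[rule[1]](row[rule[0]], rule[2])
    ]

bad = violations(person, constraints)
```

The rules are ordinary tuples that can be queried like any other relation, while the data they govern still lives in its own relation.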

Either way I think this is orthogonal (excuse the buzzword!) to the central idea of relational database theory: to base things as closely as possible on first-order predicate logic.
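
To show roughly what "based on first-order predicate logic" looks like in practice, here is a small Python sketch under stated assumptions: the `employee` and `manages` relations are made-up data, each standing for the set of tuples for which a predicate holds, and the queries are written as conjunction and quantifiers over those sets.

```python
# Hypothetical relations, each the extension of a predicate.
employee = {("alice", "sales"), ("bob", "it"), ("carol", "it")}  # Employee(name, dept)
manages = {("carol", "it")}                                      # Manages(name, dept)

# Conjunction: { (n, d) | Employee(n, d) AND Manages(n, d) }
managers_in_own_dept = {(n, d) for (n, d) in employee if (n, d) in manages}

# The finite domains the quantifiers range over.
names = {n for (n, _) in employee}
depts = {d for (_, d) in employee}

# Existential quantifier: { d | EXISTS n. Manages(n, d) }
managed_depts = {d for d in depts if any((n, d) in manages for n in names)}

# Universal quantifier: FORALL d. EXISTS n. Manages(n, d)
all_managed = all(d in managed_depts for d in depts)
```

Because the domains are finite, `any` and `all` can evaluate the quantifiers directly by enumeration.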

> 3) Are we applying the best, most effective, most efficient, etc
> metaphor or is there something better to either supplement or replace
> it?

I think we are. I think the insight that Codd had was to start with logic and build upwards from there, instead of putting together an ad-hoc data model first and then trying to reconcile it downwards to logic.

I think the only other ways we could go would be to different logics, e.g. multi-valued logic or "fuzzy" logic. I don't claim to know what these all are, but a search should bring up various weird and wonderful logics.

Or we could go upwards to higher-order logic, although I don't know if incompleteness becomes an issue then. Maybe it doesn't apply because we are always dealing with unbounded but finite systems. I think if you go this route you end up with things like Datalog or Prolog.

Paul.

Received on Fri May 28 2004 - 16:36:46 CEST
