Re: foundations of relational theory?

From: Marshall Spight <mspight_at_dnai.com>
Date: Sat, 25 Oct 2003 06:21:32 GMT
Message-ID: <MDomb.23527$HS4.91636_at_attbi_s01>


"Dawn M. Wolthuis" <dwolt_at_iserv.net> wrote in message news:6db906b2.0310221139.207ddeb0_at_posting.google.com...
>
> My reason for suggesting that an RDBMS with added relations within
> relations would be less agile is due to the whole issue of how/where
> one encodes constraints.

One thing this thread has gotten me thinking about is the agility question.

> I actually think we agree on much of this, but with me coming down to
> using a language like Java for all typing/constraints and you having a
> not-yet-established or proprietary language in mind.

A declarative language would be better than a procedural one such as Java. Java does have an advantage in a large installed base, though.

> It sounds like we both have as a tactic to get the entire "system"
> (all applications) in a single language.

I'm against that idea. It strikes me as a requirement for marketplace acceptance, if nothing else, that it be possible to write applications in any of various languages.

Paul has some ideas that I don't think I understand about applications being unnecessary, or something.

> I'd be more content to have my experience and theory align and one
> tact I'm taking with that is to examine the relational theory and see
> where there are holes. The biggest one I've found is the statement
> about how we wouldn't want to make the mathematics of data persistence
> less simple than relations -- that is a religious statement, not a
> mathematical one, so that is what I'm tackling first.

I don't see it as a religious statement; I see it more as a question of wanting to be able to express any data structure, and be able to query, manipulate, and constrain it as easily as possible.

You want to be able to query it so you can ask questions of your data, or simply retrieve it. You want to be able to manipulate it so you can make changes. You want to be able to constrain it so you can make guarantees about data quality and enforce semantics.

So what do you see as the possible choices for universal data formats? Because this is certainly an objective question.

You can use heterogeneous lists (the Lisp technique). Manipulation is easy. Querying is hard; much information is context-sensitive. (Witness the complexity of XQuery.) Constraining is also hard; you have no foundation on which to build constraints, so you have to write procedural code. You also must introduce a special way for list items to refer to information in other lists. That means either pointers or some kind of references, so now you no longer have lists that just contain data; they also have to contain special structural information to make up for the limited structure possible with just lists.

You can use maps. You have problems similar to those you have with lists. (Note that representing maps with relations is trivial.) Consider how you would represent a data structure with two different unique keys using maps. You would have redundancy and no way to control it.
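
A minimal sketch of the two-key problem, in Python. The record layout (an employee with a unique id and a unique email) and all names here are invented for illustration; they are not from the thread. The point is only that a second map is needed for the second key, and nothing in the map structure keeps the two copies in agreement.

```python
# Hypothetical records, each uniquely identified by BOTH id and email.
by_id = {
    1: {"id": 1, "email": "ann@example.com", "name": "Ann"},
    2: {"id": 2, "email": "bob@example.com", "name": "Bob"},
}

# Looking up by the other key needs a second map that repeats the data.
by_email = {rec["email"]: dict(rec) for rec in by_id.values()}

# An update made through one map leaves the other copy stale.
by_id[1]["name"] = "Anne"


def inconsistent_ids(by_id, by_email):
    """Return the ids whose copies in the two maps disagree."""
    return sorted(
        rec["id"]
        for rec in by_id.values()
        if by_email.get(rec["email"]) != rec
    )


print(inconsistent_ids(by_id, by_email))  # prints [1]
```

Storing one shared record object in both maps would avoid the copy, but only by bringing in references, which is the extra structural machinery described above for lists.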

You can use relations. Querying is easy; you can use the relational algebra. Manipulation is also easy with the relational algebra. Constraints can be written declaratively and are context-free. You don't need pointers or references; you have the relational algebra. The math behind it is quite easy; they teach set theory in what? Sixth grade? Basic set theory + set membership as tuples + natural join and you've got pretty much everything you need. (Yes, I'm skipping some things, but really, the math isn't hard at all.)
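
To make "set membership as tuples + natural join" concrete, here is a rough Python sketch. It assumes a relation is a set of rows and a row is a frozen set of (attribute, value) pairs; the helper names `row`, `natural_join`, and `is_key` and the student/course data are hypothetical, chosen only for this example. It is an illustration of the idea, not an implementation of any particular DBMS.

```python
def row(**attrs):
    """Build a row as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def natural_join(r, s):
    """Combine rows of r and s that agree on every attribute they share."""
    result = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            shared = da.keys() & db.keys()
            if all(da[k] == db[k] for k in shared):
                result.add(frozenset({**da, **db}.items()))
    return result


def is_key(relation, attrs):
    """Constraint check: no two rows have the same values for attrs."""
    seen = {tuple(dict(t)[a] for a in attrs) for t in relation}
    return len(seen) == len(relation)


students = {row(sid=1, name="Ann"), row(sid=2, name="Bob")}
enrolled = {
    row(sid=1, course="DB"),
    row(sid=1, course="Logic"),
    row(sid=2, course="DB"),
}

joined = natural_join(students, enrolled)
names_in_db = sorted(dict(t)["name"] for t in joined if dict(t)["course"] == "DB")

print(names_in_db)                  # prints ['Ann', 'Bob']
print(is_key(students, ["sid"]))    # prints True
print(is_key(enrolled, ["sid"]))    # prints False
```

The many-to-many relationship between students and courses is just another relation (`enrolled`), and the key constraint is stated in terms of attribute names alone, without reference to where a row sits in any larger structure.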

You can use trees. You have one special way of encoding relationships: nesting. But this is not sufficient: it cannot express a many-to-many relationship. So you have to add something besides nesting: you need pointers or references. So now you have two different ways of encoding relationships, which is additional complexity. Querying is highly context-sensitive. There is no framework (that I'm aware of) for specifying constraints. Well, DTDs, maybe? Ugh.
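
A small Python sketch of the nesting problem, using nested dicts and lists as the tree. The student/course data and the field names (`courses`, `course_refs`) are made up for illustration. With nesting alone, a course taken by two students has to be copied under each; avoiding the copy means storing references, which is a second way of encoding relationships.

```python
# Nesting only: the same course is duplicated under every student.
nested = {
    "students": [
        {"name": "Ann", "courses": [{"code": "DB", "title": "Databases"}]},
        {"name": "Bob", "courses": [{"code": "DB", "title": "Databases"}]},
    ]
}

# Nesting plus references: courses live in one subtree, students point at them.
with_refs = {
    "courses": {"DB": {"title": "Databases"}},
    "students": [
        {"name": "Ann", "course_refs": ["DB"]},
        {"name": "Bob", "course_refs": ["DB"]},
    ],
}


def copies_of(tree, code):
    """Count how many times a course record appears in the nested form."""
    return sum(
        1
        for s in tree["students"]
        for c in s["courses"]
        if c["code"] == code
    )


def students_taking(tree, code):
    """Query the reference form: the code must be matched against refs."""
    return [s["name"] for s in tree["students"] if code in s["course_refs"]]


print(copies_of(nested, "DB"))            # prints 2
print(students_taking(with_refs, "DB"))   # prints ['Ann', 'Bob']
```

Note that each query is written against one particular shape of tree: asking the same question of the other form requires different traversal code.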

You can use graphs. Graphs can represent any data structure natively. I have read that network theory provides a strong theoretical foundation for graphs, but that it is quite complex. Also, querying necessarily requires following a lot of edges.

Relations look like the clear winner to me.

Critiques welcome.

Marshall

Received on Sat Oct 25 2003 - 08:21:32 CEST
