Re: Lucid statement of the MV vs RM position?

From: Christopher Browne <cbbrowne_at_acm.org>
Date: Thu, 20 Apr 2006 22:05:37 -0400
Message-ID: <87acafhedq.fsf_at_wolfe.cbbrowne.com>


In the last exciting episode, ralphbecket_at_gmail.com wrote:
> Pickie wrote:
>> There doesn't seem to be a formal model anywhere. Integrity
>> constraints are not enforced at the database level.
>
> Yikes.
>
>> http://www.pickwiki.com/cgi-bin/wiki.pl?PhilosophyOfPick gives
>> my views of the philosophy behind Pick.
>
> I've just read your article. Two misconceptions of the RM seem
> to crop up again and again in comparisons between MV and the RM:
>
> (1) the RM is *not* based on the idea of storing data in tables.
> Under the RM, a relation is a set of rows with the same
> signature, and a row is a partial function from column names to
> values. The signature of a row is the domain of the function
> (i.e., a set of column names). It is purely accidental that
> relations of this kind can be conveniently portrayed on the page
> as two-dimensional tables. The emphasis in the RM should be
> that a relation is a *set* of rows, a row is a *set* of (column
> name, value) pairs, and sets are unordered, duplicate-free
> collections.
>
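To make (1) concrete, here's a rough Python sketch of that reading of the model (the relation and column names are just made up for illustration): a row is a set of (column name, value) pairs, and a relation is a set of such rows -- no ordering, no duplicates.

    # A row: an immutable set of (column name, value) pairs.
    row1 = frozenset({("name", "Smith"), ("dept", "Sales")})
    row2 = frozenset({("name", "Jones"), ("dept", "Ops")})

    # A relation: a set of rows -- unordered and duplicate-free.
    employees = {row1, row2,
                 frozenset({("dept", "Sales"), ("name", "Smith")})}

    print(len(employees))   # 2 -- the duplicate of row1 collapses
    print(dict(row1))       # e.g. {'name': 'Smith', 'dept': 'Sales'};
                            # the order it prints in is incidental
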
> (2) The RM says *nothing* about *how* a database should be
> implemented. It would be a mistake to think that because
> relations are often called "tables" and are often portrayed as
> 2D arrays, that is how they are stored in memory or on disc.
> Any implementation taking that route would have shocking
> performance. The point of the RM is to separate the model (how
> one thinks of the data) from the representation (how it is
> stored). A good RDBMS implementation should make good decisions
> concerning representation (perhaps under the guidance of the
> DBA), but that is purely an optimization issue. Conflating
> representation and model is akin to hand optimizing a program
> before you're sure it's correct: it will surely lead to a world
> of pain.

If you look at the ACM TODS (Transactions on Database Systems), a goodly number of the papers present views of relational systems in a fashion that looks *way* more like Prolog than anything else.

It tends to be pretty easy to represent relational facts as well as queries as sets of Prolog clauses.
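Roughly, the correspondence looks like this -- a little Python sketch (relation and names invented), with the Prolog rule it imitates kept in the comments:

    # Facts: employee(Name, Dept), here a set of tuples.
    employee = {("smith", "sales"), ("jones", "sales"), ("brown", "ops")}

    # The query, in the Prolog spirit:
    #   colleague(X, Y) :- employee(X, D), employee(Y, D), X \= Y.
    # expressed as a comprehension joining the relation with itself.
    colleague = {(x, y)
                 for (x, d1) in employee
                 for (y, d2) in employee
                 if d1 == d2 and x != y}

    print(sorted(colleague))
    # [('jones', 'smith'), ('smith', 'jones')]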

There is a conspicuous disconnect from Darwen/Date, there, in that they trumpet loudly about strong data typing, whilst Prolog tends to be nearly type-free. Mind you, I'm conflating representation and model there, a bit...

>> There is a mindset about the Relational Model that is disturbing.
>> The point of view that says that there is no TRULY Relational DBMS
>> because of incompetence or wickedness on the part of the SQL DBMS
>> providers is just outright wrong.
>
> My gut feeling is that it's partly to do with poor early choices
> having become the standard and partly to do with the fact that not
> many people finish a CS degree with any understanding of theory or
> how the careful application of theory can save huge amounts of time
> and effort. Given things such as the lack of any decent type theory
> or the addition of terrible ideas like NULLs into SQL, I'm inclined
> to think the latter is more significant than the former.

I'd tend to agree with that.

One issue I'd take is with the notion of the *forcible* importance of type theory. That's certainly the sort of thing that falls out of a focus on type-oriented systems like ML, which extend the emphasis on explicit typing that is typical across the spectrum of computer languages: FORTRAN, PL/I, *descendants* of the typeless BCPL like C, C++, and Java, and Pascal descendants like Ada.

In contrast, there is also a lively set of languages that eschew strong typing -- BCPL, in the ancient past, and more recently Tcl and Perl. And a lively set of languages that have strong typing but mostly eschew type annotations, like Scheme and Common Lisp. I'm not certain how to classify Smalltalk...
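For what it's worth, Python lands in much the same bucket as Scheme and Common Lisp: strongly typed, but with hardly a type annotation in sight. A trivial sketch:

    # No annotations anywhere, yet types are still enforced at run time:
    def total(xs):
        return sum(xs)

    print(total([1, 2, 3]))        # 6

    try:
        total([1, 2, "3"])
    except TypeError as err:
        print("rejected:", err)    # "3" is never silently coerced to 3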

At any rate, there's enough diversity there that I don't think I can go along with type theory being entirely essential...

>> The problem is that it is difficult in the extreme to build a data
>> store of whatever size desired, that can have some arbitrarily
>> huge number of people changing the data in it, and that will
>> provide the answer to any conceivable query - as if the data store
>> were to be frozen until the query is done.
>
> The DBMS has to
> - ensure data integrity
> - ensure data availability
> - protect against hardware failure
> - manage distribution
> - manage concurrent access
> - optimize *dynamically* for *multiple* applications.
>
> There is no way it makes sense to implement each of these aspects in
> every new application. Implementing any one of them well is a huge
> undertaking.
>
>> Every time you put an index in, or some other cute little wrinkle
>> to more cleverly do this, you are arguably de-normalising your
>> database. Well, you are storing data in multiple places, anyway.
>
> If you bugger up your model ("denormalise" it) you should expect
> trouble. A good DBMS should allow the DBA to suggest
> optimizations, but the DBMS should be responsible for
> implementing those optimizations, which should not affect the
> model in any way.

And storing index key values in multiple places is *not* an addition of "unnecessary redundancy."
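A toy sketch of that split, again in Python with invented names: the relation is the model, the index is a derived structure kept around purely to answer one kind of question faster, and the logical query doesn't change either way.

    # The model: a relation, i.e. a set of rows.
    employees = {("smith", "sales"), ("jones", "sales"), ("brown", "ops")}

    # A derived "index": dept -> rows.  It duplicates key values, but it
    # belongs to the storage layer, not to the logical model.
    by_dept = {}
    for (name, dept) in employees:
        by_dept.setdefault(dept, set()).add((name, dept))

    # The logical query: who works in sales?
    answer = {name for (name, dept) in employees if dept == "sales"}

    # The "optimized" plan answers it via the index instead.
    fast_answer = {name for (name, _) in by_dept.get("sales", set())}

    assert answer == fast_answer == {"smith", "jones"}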

>> The idea of having a horrendously complex physical
>> implementation - in order to provide the appearance of a clear
>> logical model - is uncomfortable to me. I question, not the
>> Relational Model, but whether implementing this aspect of it
>> in this way is worth the trouble.
>
> As someone else said, the same could be said of compilers for
> high level languages. But as I said above, there are things
> that you Just Have to Have in a DBMS, and it's better to get
> them right just once, in one place: the DBMS, not every
> application.

Indeed.

-- 
output = ("cbbrowne" "_at_" "gmail.com")
http://linuxdatabases.info/info/spreadsheets.html
"Avoid the Gates of Hell.  Use Linux" -- Unknown source