Re: Lucid statement of the MV vs RM position?

From: Jay Dee <ais01479_at_aeneas.net>
Date: Fri, 21 Apr 2006 07:31:38 GMT
Message-ID: <uT%1g.4318$YI5.1539_at_tornado.ohiordc.rr.com>


Christopher Browne wrote:

> In the last exciting episode, ralphbecket_at_gmail.com wrote:
> 

>>Pickie wrote:
>>
>>>There doesn't seem to be a formal model anywhere. Integrity
>>>constraints are not enforced at the database level.
>>
>>Yikes.
>>
>>
>>>http://www.pickwiki.com/cgi-bin/wiki.pl?PhilosophyOfPick gives
>>>my views of the philosophy behind Pick.
>>
>>I've just read your article. Two misconceptions of the RM seem
>>to crop up again and again in comparisons between MV and the RM:
>>
>>(1) the RM is *not* based on the idea of storing data in tables.
>>Under the RM, a relation is a set of rows with the same
>>signature, and a row is a partial function from column names to
>>values. The signature of a row is the domain of the function
>>(i.e., a set of column names). It is purely accidental that
>>relations of this kind can be conveniently portrayed on the page
>>as two-dimensional tables. The emphasis in the RM should be
>>that a relation is a *set* of rows, a row is a *set* of (column
>>name, value) pairs, and sets are unordered, duplicate-free
>>collections.
>>
>>(2) The RM says *nothing* about *how* a database should be
>>implemented. It would be a mistake to think that because
>>relations are often called "tables" and are often portrayed as
>>2D arrays, that is how they are stored in memory or on disc.
>>Any implementation taking that route would have shocking
>>performance. The point of the RM is to separate the model (how
>>one thinks of the data) from the representation (how it is
>>stored). A good RDBMS implementation should make good decisions
>>concerning representation (perhaps under the guidance of the
>>DBA), but that is purely an optimization issue. Conflating
>>representation and model is akin to hand optimizing a program
>>before you're sure it's correct: it will surely lead to a world
>>of pain.
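Points (1) and (2) can be made concrete with a minimal Python sketch. It assumes rows are modelled as frozensets of (column name, value) pairs and relations as frozensets of rows; the helper names `row`, `heading` and `relation` are hypothetical, chosen only for illustration. Because both levels are sets, row order, column order and duplicate rows make no difference to equality, and a different physical layout (here, hypothetical column-wise lists) reconstructs exactly the same relation value:

```python
from typing import FrozenSet, Tuple

# A row is a frozenset of (column name, value) pairs: the graph of a
# function from column names to values (keyword arguments keep names unique).
Row = FrozenSet[Tuple[str, object]]


def row(**attrs: object) -> Row:
    return frozenset(attrs.items())


def heading(r: Row) -> FrozenSet[str]:
    """The row's signature: the set of column names it maps."""
    return frozenset(name for name, _ in r)


def relation(*rows: Row) -> FrozenSet[Row]:
    """Build a relation, insisting that every row has the same signature."""
    rel = frozenset(rows)
    if len({heading(r) for r in rel}) > 1:
        raise ValueError("rows must share one signature")
    return rel


# Row order, column order and duplicates make no difference.
r1 = relation(row(id=1, name="Ann"), row(id=2, name="Bob"))
r2 = relation(row(name="Bob", id=2), row(id=1, name="Ann"), row(id=1, name="Ann"))
print(r1 == r2)  # True

# A hypothetical column-wise storage layout reconstructs an equal relation.
columns = {"id": [1, 2], "name": ["Ann", "Bob"]}
r3 = relation(*(row(id=i, name=n) for i, n in zip(columns["id"], columns["name"])))
print(r1 == r3)  # True
```

The column-wise dictionary stands in for a storage decision; nothing about the relation value depends on it, which is the separation of model and representation described in (2).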
> 
> 
> If you look at the ACM TODS (Transactions on Database Systems), a
> goodly number of the papers present views of relational systems in a
> fashion that looks *way* more like Prolog than anything else.
> 
> It tends to be pretty easy to represent relational facts as well as
> queries as sets of Prolog clauses.

True,
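For readers who haven't seen the correspondence: hypothetical Prolog facts `parent(tom, bob).` and `parent(bob, ann).` with the rule `grandparent(X, Z) :- parent(X, Y), parent(Y, Z).` amount to a join of the `parent` relation with itself on the shared variable `Y`, followed by a projection onto `X` and `Z`. A minimal Python sketch, assuming relations are plain sets of positional tuples (as in Prolog) rather than sets of named attributes; the data is invented for illustration:

```python
# Facts: the extension of parent(Parent, Child), as a set of tuples.
parent = {("tom", "bob"), ("bob", "ann"), ("bob", "pat")}


def grandparent(parent_rel):
    """grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
    A self-join on the shared variable Y, projected onto (X, Z)."""
    return {(x, z)
            for (x, y1) in parent_rel
            for (y2, z) in parent_rel
            if y1 == y2}


print(sorted(grandparent(parent)))  # [('tom', 'ann'), ('tom', 'pat')]
```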

> There is a conspicuous disconnect from Darwen/Date, there, in that
> they trumpet loudly about strong data typing, whilst Prolog tends to
> be nearly type-free.  Mind you, I'm conflating representation and
> model there, a bit...

true,

> 

>>>There is a mindset about the Relational Model that is disturbing.
>>>The point of view that says that there is no TRULY Relational DBMS
>>>because of incompetence or wickedness on the part of the SQL DBMS
>>>providers is just outright wrong.
>>
>>My gut feeling is that it's partly to do with poor early choices
>>having become the standard and partly to do with the fact that not
>>many people finish a CS degree with any understanding of theory or
>>how the careful application of theory can save huge amounts of time
>>and effort. Given things such as the lack of any decent type theory
>>or the addition of terrible ideas like NULLs into SQL, I'm inclined
>>to think the latter is more significant than the former.
> 
> 
> I'd tend to agree with that.
> 
> The one issue I'd take is with the notion of the *forcible* importance of
> type theory.  That's certainly the sort of thing that falls out of a
> focus on type-oriented systems like ML, which extend the importance of
> explicit typing, as is typical across the spectrum of computer languages
> like FORTRAN, PL/I, *descendants* of the typeless BCPL like C, C++,
> and Java, and Pascal descendants like Ada.
> 
> In contrast, there is also a lively set of languages that eschew
> strong typing, like, well, in the ancient past, BCPL, Tcl, and Perl.
> And lively sets of languages that have strong typing, but which mostly
> eschew type annotations, like Scheme and Common Lisp.  I'm not certain
> how to classify Smalltalk...

true,

> At any rate, there's enough diversity there that I don't think I can
> go along with type theory being entirely essential...

and true -- depending on your system's intended purpose.

[Here comes the rub.]

The "knowledge" DBMSs, which store facts and functional dependencies explicitly and independently, behave very differently from most "relational" DBMSs. I don't want to spend too much time on this point, but let me simply say that a KDBMS can be expected to expunge everything that is inconsistent with the most recently presented knowledge. Most DBMSs familiar to posters in this group would reject such inconsistent data (usually because some constraint is violated) rather than treat it as better knowledge.
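The two policies can be contrasted in a minimal Python sketch for a single functional dependency. The employee/department facts, the dependency employee -> department, and every function name below are hypothetical illustrations, not a description of any particular product:

```python
from typing import Set, Tuple

# Hypothetical facts (employee, department) under the functional
# dependency employee -> department.
Fact = Tuple[str, str]


class ConstraintViolation(Exception):
    pass


def conflicts(facts: Set[Fact], new: Fact) -> Set[Fact]:
    """Stored facts that would violate employee -> department alongside `new`."""
    emp, dept = new
    return {(e, d) for (e, d) in facts if e == emp and d != dept}


def insert_rejecting(facts: Set[Fact], new: Fact) -> None:
    """Constraint-checking DBMS: refuse the new fact, leave stored data as is."""
    bad = conflicts(facts, new)
    if bad:
        raise ConstraintViolation(f"{new} contradicts {sorted(bad)}")
    facts.add(new)


def insert_superseding(facts: Set[Fact], new: Fact) -> None:
    """Knowledge-DBMS style: treat the new fact as better knowledge and
    expunge everything that contradicts it."""
    facts.difference_update(conflicts(facts, new))
    facts.add(new)


db = {("ann", "sales")}
try:
    insert_rejecting(db, ("ann", "audit"))
except ConstraintViolation as exc:
    print("rejected:", exc)
print(db)  # {('ann', 'sales')}
insert_superseding(db, ("ann", "audit"))
print(db)  # {('ann', 'audit')}
```

The same contradictory fact leaves the first store unchanged and rewrites the second; which behaviour is "right" depends on what the system is for.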

This, I feel, is an essential distinction between the two worlds -- and one that has parallels when the discussion moves to "strongly typed" v. "typeless" languages.

If languages are arranged along a continuum extending from "machine oriented" to "problem oriented," we should have little trouble recognizing that those at the machine-oriented end have to be strongly typed and that their types must correspond directly to the hardware. At the other end, it depends, and the decision involves a trade-off between flexibility and predictability. In the in-between languages (C++, for instance) it is entirely possible that you can't predict exactly which concrete class will supply the methods that an abstract class (one with pure virtual functions) relies on when objects are instantiated and operated on. As long as everything behaves as expected, all is well. But if something "unexpected" crops up, all bets are off.

Date and Darwen concern themselves with systems that exhibit completely predictable behavior. The relational model they describe is completely silent with regard to "other" data types, beyond prescribing that the system provide mechanisms for the user to describe, store, and operate on data of other types. What are those other types? Everything that isn't a truth value, a tuple value, or a relation value.

They described a set of operators which operate on those relational values with completely predictable results. Beyond that, they have described a system for other types of data and operations on those types which are also completely predictable.

But they are very careful to say that their type system is orthogonal to the relational model. I don't think it's correct to say that Date and Darwen advocate strong typing because of the relational model; I think they see their type system as an appropriate adjunct to the relational model, and they deem it so because of the reliability their technique provides.
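The orthogonality point can be illustrated with a minimal Python sketch under obvious simplifications (this is not Tutorial D or any real DBMS). `Weight` is a hypothetical user-defined type that carries its own constraint and comparison operators, while the hypothetical `restrict` operator works on any relation without knowing what types its attributes have; nothing in `restrict` would change if `Weight` were swapped for another type:

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Weight:
    """Hypothetical user-defined type: a weight in grams, never negative.
    frozen=True makes values immutable and hashable; order=True supplies <, >."""
    grams: float

    def __post_init__(self):
        if self.grams < 0:
            raise ValueError("a weight cannot be negative")


def restrict(rel, predicate):
    """Relational restriction over a set of rows (frozensets of pairs).
    It knows nothing about the attribute types involved."""
    return frozenset(t for t in rel if predicate(dict(t)))


parts = frozenset({
    frozenset({("part", "P1"), ("weight", Weight(12.0))}),
    frozenset({("part", "P2"), ("weight", Weight(17.0))}),
})
heavy = restrict(parts, lambda t: t["weight"] > Weight(15.0))
print(sorted(dict(t)["part"] for t in heavy))  # ['P2']
```

The type enforces its own rules (a negative `Weight` cannot be constructed), and the relational operator relies only on the predicate it is given.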

> 

>>> The problem is that it is difficult in the extreme to build a data
>>> store of whatever size desired, that can have some arbitrarily
>>> huge number of people changing the data in it, and that will
>>> provide the answer to any conceivable query - as if the data store
>>> were to be frozen until the query is done.
>>
>>The DBMS has to
>>- ensure data integrity
>>- ensure data availability
>>- protect against hardware failure
>>- manage distribution
>>- manage concurrent access
>>- optimize *dynamically* for *multiple* applications.
>>
>>There is no way it makes sense to implement each of these aspects in
>>every new application. Implementing any one of them well is a huge
>>undertaking.
>>
>>
>>> Every time you put an index in, or some other cute little wrinkle
>>> to more cleverly do this, you are arguably de-normalising your
>>> database. Well, you are storing data in multiple places, anyway.
>>
>>If you bugger up your model ("denormalise" it) you should expect
>>trouble. A good DBMS should allow the DBA to suggest
>>optimizations, but the DBMS should be responsible for
>>implementing those optimizations, which should not affect the
>>model in any way.
> 
> 
> And storing index key values in multiple places is *not* an addition
> of "unnecessary redundancy."
> 
> 
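The point about indexes can be sketched as follows. An index repeats key values, but the repetition is derived from the base rows and kept consistent by the store rather than by applications, so it is not redundancy in the normalisation sense. In this minimal Python sketch the `Table` class and its methods are hypothetical, and it maintains one index on one column; a full scan and an indexed lookup must always agree:

```python
from collections import defaultdict


class Table:
    """Hypothetical toy store: `rows` holds the data; `index` is derived from
    it and maintained by the store itself on every insert and delete."""

    def __init__(self, indexed_column):
        self.col = indexed_column
        self.rows = set()
        self.index = defaultdict(set)  # column value -> rows with that value

    def insert(self, **attrs):
        r = frozenset(attrs.items())
        self.rows.add(r)
        self.index[attrs[self.col]].add(r)

    def delete(self, r):
        self.rows.discard(r)
        key = dict(r)[self.col]
        bucket = self.index.get(key)
        if bucket is not None:
            bucket.discard(r)
            if not bucket:
                del self.index[key]

    def select_scan(self, value):
        """Answer the query from the base rows alone."""
        return {r for r in self.rows if dict(r)[self.col] == value}

    def select_indexed(self, value):
        """Answer the same query through the index."""
        return set(self.index.get(value, ()))


t = Table("city")
t.insert(id=1, city="Paris")
t.insert(id=2, city="Oslo")
t.insert(id=3, city="Paris")
t.delete(frozenset({("id", 1), ("city", "Paris")}))
print(t.select_scan("Paris") == t.select_indexed("Paris"))  # True
print(len(t.select_indexed("Paris")))  # 1
```

Because applications only ever call `insert` and `delete`, the index is an optimization the store owns; dropping it would change performance, not answers.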

>>>The idea of having a horrendously complex physical
>>>implementation - in order to provide the appearance of a clear
>>>logical model - is uncomfortable to me. I question, not the
>>>Relational Model, but whether implementing this aspect of it
>>>in this way is worth the trouble.
>>
>>As someone else said, the same could be said of compilers for
>>high level languages. But as I said above, there are things
>>that you Just Have to Have in a DBMS, and it's better to get
>>them right just once, in one place: the DBMS, not every
>>application.
> 
> 
> Indeed.
Received on Fri Apr 21 2006 - 09:31:38 CEST
