Relational and MV (response to "foundations of relational theory")

From: Eric Kaun <ekaun_at_yahoo.com>
Date: Wed, 18 Feb 2004 02:05:23 GMT
Message-ID: <DLzYb.23126$ft2.1420_at_newssvr16.news.prodigy.com>

This is in reply to the “foundations of relational theory?” thread which began on September 26, 2003 and accumulated over 500 responses before petering out on November 7, 2003. Yes, I read them all, and was fascinated. Apologies - I found them only after the thread had run cold.

Perhaps, but it at least appeals to the “logical aesthetic” of Occam (the philosopher, not the language): not to multiply entities beyond necessity. Hierarchies are more complex than relations, and yet express no additional information. Agreed that KISS is not a scientific argument, but it is a practical one: we're trying to build logical machines here, and simpler concepts are easier to mechanize.

There is of course an argument to be made for expressiveness; you can “do” logic using various minimal subsets of combinators (S, K, I, etc.) which I barely understand, but I would argue they're too simple to be expressive for humans (although people have written programming languages using them, primarily - I suppose - as a learning exercise). You can manipulate relations using only two operators: remove and nor. This minimal set is relationally complete (as are many others), but difficult to work with, so comprehensibility was a factor in the design of the usual relational operators: restrict, project, join, etc.
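To make that concrete, here's a rough sketch in Python (the operator names and the tiny relations are mine, purely for illustration - nothing here is how any particular DBMS implements them):

def restrict(relation, predicate):
    # Keep only the tuples satisfying the predicate.
    return {t for t in relation if predicate(dict(t))}

def project(relation, attributes):
    # Keep only the named attributes of each tuple.
    return {tuple(sorted((a, v) for a, v in t if a in attributes)) for t in relation}

def join(r1, r2):
    # Natural join: combine tuples that agree on their common attributes.
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(tuple(sorted({**d1, **d2}.items())))
    return result

emp = {(("dept", "d1"), ("emp", "smith")), (("dept", "d2"), ("emp", "jones"))}
dept = {(("city", "london"), ("dept", "d1")), (("city", "paris"), ("dept", "d2"))}

# "Which employees work in London?" - a composition of the three operators:
print(project(restrict(join(emp, dept), lambda t: t["city"] == "london"), {"emp"}))
# {(('emp', 'smith'),)}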

You focus on just one type of compound data because it gives you all you need, and strikes a workable balance among simplicity, expressiveness, and combinatorial power. You stray from it at the expense of one or more of those “attributes.”

Mathematicians and scientists have different definitions of “better.” For scientists, better means more accurate at predicting the future (typically of physical reactions and events). For mathematicians and logicians, better typically means more amenable to symbolic manipulation and mechanization, and better able to describe formerly disparate models. For a data model, we want a minimal set as described above, but as long as we can define “higher-level” operators and types in terms of the minimal ones, there's referential transparency: simple substitution allows us to write expressions that are easier for us to understand, but still correct by virtue of being defined by base concepts that are opaque but simple to mechanize.
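A toy example of that substitution principle (Python again, and again the names are my own choosing): intersection needn't be a primitive at all - for two relations with the same heading, it is just a natural join, so any expression using intersect can be rewritten in terms of join without changing its meaning.

def join(r1, r2):
    # Natural join over relations represented as frozensets of (attribute, value) pairs.
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def intersect(r1, r2):
    # "Higher-level" operator, defined entirely by substitution into the base one.
    return join(r1, r2)

a = {frozenset([("id", 1)]), frozenset([("id", 2)])}
b = {frozenset([("id", 2)]), frozenset([("id", 3)])}
print(intersect(a, b))  # {frozenset({('id', 2)})}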

That said, hierarchies are neither higher-level nor more expressive for modeling data. They are more expressive in the limited context of a presentation to humans, in some cases – but then again, so are trees, tables, etc. In any event, those can all be generated from relations, but it must be understood that doing so involves implicit decisions (e.g. ordering, path) that are not present in the relations themselves.

The same can be said of many technologies. The losers in a horse race still live – sometimes even when they lose repeatedly. It doesn't mean you should bet on them. Saying that it's the norm for companies to stick with MV speaks primarily to any business's unwillingness to change (especially when it perceives no reason to).

The first part of this is completely wrong when you consider multiple applications which use the same database (or at least parts of it). This is the norm in companies in which I've worked. Yes, what one does with the data is part of what defines it; types and constraints in RDBMSs express what is and isn't allowed, yet also define what can be done through operators, both type-specific and relational.

Popular success proves nothing except success (i.e. A => A). And all applications have constraints, many more than are typically placed in the database; just because you're writing them in code doesn't mean they don't exist.

So evidently that “data set” was so unimportant that it was only displayed – sounds like any CLOB or BLOB or string would work in that case. Why bother to nest values? Relational could define a dumb List type to do this anyway.

I've never seen a requirement phrased like that, but why don't we still use assembly language and raw file system manipulation? The entire history of computing, math, logic, and even science is based on abstraction; at what point do you think we went too far? Today, compilers generate better code than most assembly programmers can write most of the time, thanks to years of analysis and “best practices” made concrete in code. Shouldn't we let our DBMSs do the same? Date's book makes it clear that the relational model enables much optimization (based on logic!) that the flaws in SQL effectively disable.

No, it's not. Since I can generate many “containment” hierarchies from a 3rd normal form database, I would argue the opposite. For example, from relations, I can show not only children “nested” inside parent “records,” but also the reverse. I suspect that just using the term “subset” isn't going to do either of us any good – perhaps a category theorist would pipe in with a proper definition of what we're each trying to say?
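A trivial illustration of what I mean (Python; the relations and attribute names are invented): from the same two flat relations you can produce either nesting on the way out, which is exactly why neither nesting belongs in the stored data.

departments = [{"dept": "d1", "name": "Sales"}, {"dept": "d2", "name": "Ops"}]
employees   = [{"emp": "smith", "dept": "d1"},
               {"emp": "jones", "dept": "d1"},
               {"emp": "brown", "dept": "d2"}]

# Hierarchy 1: departments "contain" their employees...
by_dept = {d["dept"]: [e["emp"] for e in employees if e["dept"] == d["dept"]]
           for d in departments}

# ...and Hierarchy 2: employees "contain" their department.
by_emp = {e["emp"]: next(d["name"] for d in departments if d["dept"] == e["dept"])
          for e in employees}

print(by_dept)  # {'d1': ['smith', 'jones'], 'd2': ['brown']}
print(by_emp)   # {'smith': 'Sales', 'jones': 'Sales', 'brown': 'Ops'}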

Which language? English? I'd wager that a human language has different requirements, and a completely different development process, than a computer language. Unless you're referring to Vulcan, human language isn't based on logic, nor should it be (?). Nor vice versa. What's the value in using human language as a basis for data and queries?

Since there have been no truly relational DBMSs until recently, you're right. Neither is there “proof” on the Pick/MV side, though there are some vociferous proponents.

Relations are explicit, and types and constraints make the meaning of your data far more explicit. MultiValues express, at best, one kind of hierarchy.

That implies no one uses Euclidean geometry or Newtonian mechanics, which I think is a false statement – any practicing engineers care to pipe in? It also implies that the same mathematics doesn't underlie quantum physics and relativity; at worst, Euclidean geometry and Newtonian mechanics are the same mathematics limited to a particular context, within a certain tolerance.

In any event, comparing mathematics and its correlation to the real world with the relational model's correlation to reality is a misplaced comparison; better to compare a given data model (for a specific business) with the real-world external predicate it's meant to represent.

So does practically anything. It does sound, however, as though the Pick environment(s) offer useful application-building tools, despite a limited data model. The Pick environments are probably excellent implementations of the “Pick data model,” whereas our current SQL implementations are pathetic implementations of something only vaguely pretending to have once been relational. Note, however, that however flawed SQL is (and it was even worse in the 80s!), it displaced the previous hierarchical and network data models very quickly, (possibly) indicating a great degree of dissatisfaction with those. We should remember that history, lest we be condemned to repeat it.

Strange, then, that the educational system is becoming more of a trade school, teaching not relational but SQL, and even specific implementations of SQL. Relational at least has a logical and mathematical foundation worthy of study. Since the Pick theory papers seem to have fallen behind a filing cabinet somewhere, there's not much to study.

Yes, it shows that logic and mathematical analysis have some place in designing logical machines (programs). Knuth's pride might just be based on the quality of his thoughts, and the logical strengths of the algorithms. It further reiterates the difference between levels of abstraction, and the design / implementation divide. It's not hard to fathom that given an implementation (e.g. a processor that represents numbers in a certain way, and manipulates words of a given size at a given speed), you can compare two algorithms based on what they do and the mapping of their elements to constructs within the processor. You don't need to code every possible algorithm to know something about the results; otherwise, how did any mathematics ever develop?

Don't forget that Donald Knuth (whose specialization was the analysis of algorithms) quoted C.A.R. Hoare (one of the pioneers of axiomatization in software) in writing: “Premature optimization is the root of all evil [in programming].” Forcing one particular type of premature optimization at the language or DBMS level is undoubtedly evil; even some pseudo-relational SQL DBMSs have clustering, which allows the DBMS to physically do exactly what you're describing with NO impact on database design or program logic, and which can furthermore be tailored later to actual usage and data patterns. Once again, the Pick “model” is a poor “subset” of the relational one.

The access pattern you describe is the simplest one of all: retrieving a record given a primary key value. If Pick does that quickly, fine; that still says nothing about its utility as a model. Any system I've ever used has many patterns of data access, and I'd hate to optimize for the simplest path at the expense of the others. Especially prematurely. And especially when that same optimization can (and should) be done behind the scenes without impacting every program that relies on that data.

Again, why don't you develop in assembly language? Much closer to the metal (as if that were an advantage). Or at least use file system primitives. Memory management primitives too – I'm sure you can gain some speed advantages with custom paging.

Though apparently you believe this lack of typing to be an advantage, its potentially catastrophic disadvantages are among the problems relational was designed to address. Why exactly do you feel more comfortable giving programmers instructions on how to “use” a field, as opposed to defining a type which restricts what can be put into a tuple's attribute, as well as the operations defined over it? Again, I wouldn't feel comfortable with someone putting “abcdefghi” into an SSN field – though if that were a special value of some sort, a true RDBMS would allow me to define a domain that allows just that! YOU decide what the data means, and the RDBMS helps you make it stick!
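For instance (a sketch only - the SSN format and the special value here are hypothetical, but any decent type system makes this trivial):

import re

SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
SPECIAL_VALUES = {"UNKNOWN"}  # explicitly sanctioned by the domain's definition

class SSN:
    # A domain: construction fails for anything the domain does not admit.
    def __init__(self, value):
        if value not in SPECIAL_VALUES and not SSN_PATTERN.match(value):
            raise ValueError("not a legal SSN value: %r" % (value,))
        self.value = value

SSN("123-45-6789")   # fine
SSN("UNKNOWN")       # fine - the domain was *defined* to allow it
try:
    SSN("abcdefghi")
except ValueError as e:
    print(e)         # rejected by the domain itself, not by a programmer's memo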

Again: if the rules aren't in the database, then every application has to encode them separately. How is this an advantage?

There is only one file structure: a sequence of bytes (we're abstracting above magnetic fields and bits). There is only one programming language: the raw machine code (even assembly is an abstraction). Why don't you use those constructs? Once you've answered that, apply the same reasoning to what's being said here.

In software development, we consistently aim too low; the computer could do so much more work for us. But instead we feel we know better, can optimize code and data access better, etc. The history of compiler development shows that we're wrong.

That's just patently untrue (all tools are not the same – that's a fact in any field of endeavor), and would have us all content to keep using what we know and never reach for anything more. Again, as in any field, we need to evaluate tools on merits other than “well, it's worked for me so far!” And a business has to consider support costs, maintenance costs, training costs and economies of scale in selecting standard tools, rather than allowing every developer to use whatever he or she desires.

Chris Rock (a comedian) said (paraphrasing): “You can drive a truck with your feet if you want to – that doesn't make it a good idea.” The same applies to many things (including software development). The Pick model, apparently, works because you can ignore it? And if you can ignore it, then the DB is essentially a file wrapper and says little about what's actually contained in it. Since the database represents the logical predicates of your application, “using as much or as little of the DB as we care to” essentially allows programmers (possibly on different applications or projects) to ignore whatever portions of the system's logic they like!

I don't understand why persisting a “language proposition” (as if human language were a good model!) is better than persisting a logical one. Human language is a poor basis for mechanizing reasoning (unless maybe you're talking about Esperanto or Vulcan).

The sort of proposition Pick claims to model better isn't an atomic one. Breaking compound propositions into atomic ones allows better mechanization, substitutability, optimization, etc. And even if you don't break them apart, at the very least you should avoid nesting data about which you want to make assertions – if you do, you're cementing a single (and limiting) path.
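A small sketch of the difference (the order/line-item names are made up): the nested record asserts one compound proposition, while the flat form asserts several atomic ones, each of which can be queried or constrained without committing to a single access path.

nested = {"order": 42, "customer": "acme",
          "lines": [{"item": "bolt", "qty": 100}, {"item": "nut", "qty": 200}]}

# Atomic propositions: "order 42 was placed by acme", "order 42 includes 100 bolts", ...
orders      = [{"order": 42, "customer": "acme"}]
order_lines = [{"order": 42, "item": "bolt", "qty": 100},
               {"order": 42, "item": "nut",  "qty": 200}]

# "Which orders include nuts?" - no commitment to an order-to-line path required:
print([l["order"] for l in order_lines if l["item"] == "nut"])  # [42]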

The implied parent-child, one-to-many relationship can be modeled in relational too, very easily. Placing them in a single record, as Pick would have you do, may be several seconds less work than declaring a relational foreign key constraint. The flaw is in the word “primarily” - I've seen that assumption change many times in the course of a single project, and for any database worth its salt (one which truly models a piece of the business, not just some tiny segment), “primarily” is purely a matter of perspective.

More to come, in case you're having trouble sleeping. :-)

- Eric
Received on Wed Feb 18 2004 - 03:05:23 CET
