Re: We're doomed

From: <sarabergman_at_hotmail.co.uk>
Date: Thu, 26 Feb 2009 08:46:15 -0800 (PST)
Message-ID: <7d8f3dc8-2eb7-420b-8071-b31adf02657b_at_j8g2000yql.googlegroups.com>


On 26 Feb, 15:43, paul c <toledobythe..._at_oohay.ac> wrote:
> Roy Hann wrote:
> > Tony D wrote:
>
> >> Oh well.
>
> >> Take a programmer and ask them what they think a relational database
> >> should be, you get MySQL.
>
> >> Ask a programmer what they actually want to use, you get this clunk.
>
> > Programmers want to write programs.  ...
>
> (1) ie. not read them?
>
> > ... The last thing they want is
> > software that could do away with 75% of them.  
> > ...
>
> (2) ie. do away with 75% of both programmers and their code?
>
> Solving (1) and (2) seem to me to be the most basic (ie. layman's)
> purposes in favour of relational approaches for making db apps.
>

The relational model is one of the great mathematical creations of the 20th century; Codd is widely and rightly recognised as somewhat of a genius.

The only problem with the relational model, however, is that it is a rigorous, precise mathematical and logical framework.

Logic and mathematics are all-or-nothing affairs. You can't have a mathematical proof that's just 78% logical, or 99.999% logical. It has to be 100% formally, precisely logical or else it's nothing.

The relational model is unashamedly based on the represntation of data as n-ary relations. If data is not represented as relations, it is not a part of the relational model. You can't represent 52% of the data as relations and the rest as something else.

Yet every one of the biggest-selling, professional "relational dbms" vendors have refused to use relations (or relvars) to represent their data: they use tables.

This is the Undermined Relational Foundation Fallacy: a table is not a relation.

This may sound like a quibble, given that a table looks a little like a relation and feels a little like a relation, but this quibble is of the utmost importance: it is simply illogical to build an entire mathematical framework upon the concept of a relation, only then, at the last minute, to substitute ad-hoc the concept of a table and expect all the logical conclusions based on relations still to hold.

This is mathematical nonsense.

And given that the relational model is one of rigorous, mathematical formality, this last-minute, ad hoc substitution of a table for a relation robs every "relational dbms" of its entire mathematical heritage. This is like planning the first womanned mission to Mars and spending years building a state-of-the-art interplanetary spaceship to take our heroic astronauts to the red planet, but on launch-day, wheeling out a propeller-driven bi-plane to the launch pad and claiming that all the plans laid for one are valid for the other. It doesn't matter that they share similarities, a space ship is not a biplane.

To make matters worse, a table itself is mathematically undefined.

And to make matters even worse, the consensual, casual, imprecise "definition" of a table claims that a table is an ordered collection of rows and columns, which fully admits duplicated rows. Yet both rowduplication  and ordering completely rule-out the ad hoc relation-table substitution, given that a relation by definition can have neither duplicated tuples nor ordered tuples. The very best a table can be is an ad hoc approximation of a relation, and a very poor one at that.

It seems that there could hardly have been a worse substitution.

This is especially a problem for those who approach the "relational dbms" from a mathematical point-of-view rather than from the point of a view of a potentially defective piece of software. It is all too easy for these people to view a table as a relation and to forget to enforce integrity and referential constraints, thinking them somehow bundled with the table itself. This causes companies millions of dollars per year in recovery efforts.

(Wikipedia gives the laughably imprecise, "A table is an accepted visual representation of a relation;" yet no one would dare suggest that the fundamental assumption of the relational model is that all data is represented as accepted visual representations of relations. Accepted visual representations of relations are simply not relations. In mathematics, this makes all the difference, no matter how much the average woman thinks they are the same thing. And besides: "accepted" by whom?)

So, given that a table is an ad-hoc, mathematically ill-defined approximation of a relation, have "relational dbms" vendors tried their best to minimise the damage of undermining the mathematical foundations of the relational model?

Actually, no.

The tables in most of the big-selling "relational dbmses" actually allow the row duplication that could easily have been prevented. Some of them even offer row-duplication as an optional extra when creating a table!

Some "relational dbmses" also offer the wonderful optional extra of allowing null-values in their tables, even though this completely contradicts Date's Information Principle; indeed Date has gone on record as saying that if you're using null-values then, whatever you think you're doing in a relation, you're not use relational algebra or relation calculus. Null has no place in the relational model and allowing null-values in the tables of "relational dbmses" just compounds the error of the initial abandonment of mathematical rigour.

Of course, there is no table algebra or table calculus rigorously defined to mirror the algebra and calculus of the relational model, not in any strictly formal sense. The only thing a join in an SQL command has in common with the relational model operator is the name: no mathematical precision can be inferred or proved. The Undermined Relational Foundation Fallacy is all-conquering.

(In fact, the most accurate description of a table as implemented by "relational dbms" vendors is: a pointer-indexed array of memory addresses.)

It may be argued that the abandonment of mathematical formality by "relational dbms" vendors and their embarrassing surrender to consequent imprecision is excusable if they could at least guarantee the other great alleged benefit of the "relational dbms," namely database integrity: the ability to guarantee that data are accurate, valid and consistent.

Alas, this is also false.

Take the widely acclaimed and wildly popular MySQL "relational dbms" for example. A quick glance at its fault history at http://bugs.mysql.com/ proves a thoroughly dispiriting exercise. The "relational dbms" has been crippled by a total of 5,814 S1-criticality faults during its serivce lifetime, S1 being the highest possible criticality of fault (the number of S2-criticality faults, the next highest, is currently over 10,000).

Most egregious of all are the non-deterministic S1 faults that crashed the dbms and destroyed the entire database integrity, without warning, leaving the reconstuction of the data where possible to various tools and scripts that might (or might not) happen to be at hand.

Bug ID 3105, for example, " ... marked tables wrongly [and] crashed when doing commands like delete."

Bug ID 38566, "When executing an insert/update workload with falcon_index_chill_threshold=4K and falcon_record_chill_threshold=4K, Falcon crashed." This particular bug was not even deemed serious enough to be corrected, despite being the highest criticality possible; it recieved the vapid answer, "Executing a light INSERT and UPDATE workload with falcon_index_chill_threshold set to 4K and falcon_record_chill_threshold set to to 4K, MySQL could crash." Marvellous.

Bug ID 42052 crashed a large data base doing a single UPDATE; the unreassuring answer to which from the MySQL team is, "To clarify, this is a problem with all large transactions in 6.4, whether it be updates, deletes, or inserts." So, that sorts that out.

(To be fair, the best thing a "relational dbms" can do on error is to crash immediately. It can be much worse when data is corrupted silently and undetected, only to manifest itself much later when the initial cause of the corruption is untraceably lost.)

The list goes on. 184 S1-critical transaction faults between 2003 and 2009: that's over 25 data integrity faults per year, or one every two weeks on average, any one of which could lead to catastrophic and unrecoverable data loss. And yet, though we can predict on average when a catastrophic failure will occur, they happen nondeterministically  and without warning, meaning that the "relational dbms" itself is crawling with non-deterministic behaviour just waiting to happen. Your UPDATE might work 1,000 times, or 10,000 times, but you can never be sure it will work properly on the 10,001st time. Every SQL operation has a non-zero probability of catastrophic failure. The best and only thing you can do is follow the Relational Damage-Limitation Standard (see below).

Nor are transaction and data integrity failures the only disastrous faults to which "relational dbmses" are addicted. Another area of vital importance is security: the guaranteeing that the data in the "relational dbms" is protected from outside interference and malicious misuse. We would hope that the great and multi-million dollar "relational dbms" vendors of the world would have impeccable security records.

Not so.

Let's take Oracle, a giant in the "relational dbms" arena, whose security fault history makes awfully depressing reading at: http://www.oracle.com/technology/deploy/security/alerts.htm

In April 2008, it lists 15 new security risks discovered for its flagship 10g "relational dbms." Just three months later, in July 2008, it lists 11 more security risks discovered. Three months later still, in October 2008, 15 more security risks were discovered. Just at the beginning of this year, in January 2009, 10 more security risks were discovered in 10g. There are Critical Patch Updates going all the way back to January 2005. And these are not just minor faults: some actually allow remote, malicious users to gain full access to the "relational dbms" completely unauthenticated: requiring neither user name nor password. There are few other areas of industry that would accept such sloppy and dangerously low standards as those enjoyed by these "relational dbms" contraptions.

And we haven't even reached Microsoft yet!

Microsoft has brought us the SQL Server "relational dbms." Receivers of its Service Pack 2 would surely have been shocked to see the list of over 130 faults discovered in the previous version they had been using, including the deeply worrying bug 917886 , "Error message after you roll back transactions on a table in SQL Server 2005: Error: 3315, Severity: 21, State: 1." Time to look for those ad hoc recovery scripts, it would seem. If roll back transactions cannot be trusted, what can?

Moving on, how about these for open and frank admissions of catastrophic data integrity failure. First bug 918757, "The value of a cell is not updated correctly when you use multiple UPDATE CUBE statements inside a single transaction in SQL Server 2005 Analysis Services." Bug 919775, "The BULK INSERT statement may not return any errors when you try to import data from a text file to a table by using the BULK INSERT statement in Microsoft SQL Server 2005." Bug 919955, "You may receive incorrect results when you run a Multidimensional Expressions (MDX) query that involves a calculated member and the autoexist functionality in SQL Server 2005 Analysis Services." And then the jaw-dropping 919957, "Some cells return the NULL value instead of returning the actual value when you query a dimension that contains a parent/child hierarchy in a SQL Server 2005 Analysis Services cube."

SQL Server also has faults that are, if anything, even more serious than database integrity and security faults: faults relating to the logging system, a point to which we'll return later. Faults include bug 40000100, "If the log reader agent and the DBCC DBREINDEX or ALTER INDEX ...REBUILD commands run at the same time, the log reader agent may miss some transactions to the distributor." And bug 40000102, "When you configure Log Shipping and add two or more secondary servers to Log Shipping, only the secondary server that you added last is monitored by the monitor server and the other secondary servers are not."

(We leave the last word to the Microsoft marketeers who wrote, "SQL Server 2008 Performance and Scale White Paper," in which they gleefully proclaim, "SQL Server 2008 provides a robust database engine that supports large relational databases and complex query processing." Yeah. Sure.)

Do not think, of course, that these three, perhaps the most popular "relational dbms" vendors on the market, are unique in having such core and inexcusable flaws: there just isn't space (or inclination) to present the lumbering disasters of any of the other, less popular "relational dbms" offerings. Just search the web for "list of bugs" and your favourite dbms: db2, postgreSQL, Berkeley DB, etc., and watch page after excruciating page of human misery scroll before your eyes.

In conclusion, the modern, "relational dbms," is a table-based dbms and given that a table is the ad-hoc, mathematically ill-defined approximation of relation, the table-based dbms has no mathematical foundation whatsoever and in particular, with its ordering, duplicate rows and null-values, cannot choose willy-nilly to claim that the conclusions of the relation model, or any other formal system, apply to it.

We have seen, furthermore, that the modern "relational dbms" fundamentally cannot guarantee the accuracy, validity or consistency of stored data. Thousands upon thousands of new and disastrously painful, costly faults continue to plague the "technology."

Of course, the introduction of a "relational dbms" to a software project is neither unprofessional nor is it criminal malpractice per se, but the decision should be taken with the utmost care.

So, what's to be done?

There is only one strategy that enables the introduction of a "relational dbms" into a software project. This strategy has been named the Relational Damage-Limitation Standard, and consists of the following three recommendations:

1.Expect catastrophic failure! Your "relational dbms" is going to fail and there's nothing you can do to prevent it. (Recall that MySql discovers a new, potentially integrity-destroying fault every two weeks.) All you can do is take out insurance: use a RAID system. Separate your database components across as many different disc resources as you can afford and duplicate the data as much as you can afford. Despite the, "Inexpensive," of the acronym, this can quickly become expensive, but the total corruption of data can be even more so.

2.Expect catastrophic failure! Your "relational dbms" is going to fail and there's nothing you can do to prevent it. Make back-ups of your entire database as often as you can afford. Your data will be an order of magniture more secure when it's stored on a file system somewhere that your defective "relational dbms" can't reach it.

3.Expect catastrophic failure! Your "relational dbms" is going to fail and there's nothing you can do to prevent it. Log everything! When your data is corrupt, at least analysing the logs might show you what caused the fault. (This is why the Microsoft faults relating to logging reported above were so worrying: if the logs fail, then the cause of a catastrophic data loss may never be found.) Received on Thu Feb 26 2009 - 17:46:15 CET

Original text of this message