Re: Is relational theory irrelevant?

From: Serge Rielau <srielau_at_ca.eye-bee-m.com>
Date: Mon, 17 Nov 2003 22:42:42 -0500
Message-ID: <bpc4ci$ogk$1_at_hanover.torolab.ibm.com>


Intersting thread you folks have goin here, hope you accept late arrivals. Some thoughts of mine:
The correct SQL way to write the quota is:

SELECT * FROM EMP ORDER BY SAL FETCH FIRST 5 ROWS ONLY There are a couple of interesting things about this one. Unless there is a bindout (cursor) immediately following the query the ORDER BY is local to the select. It will NOT propagate up through the bracing FROM clause. This maintains the relational property that sets have no order. In short the order is internal and temporary.

So:
SELECT * FROM (SELECT * FROM T ORDER BY SAL) AS X FETCH FIRST 5 ROWS ONLY is not deterministic. (well even the first example will not be deterministic if salaries are not distinct).

When looking at relational algebra on one side and application programers at the other there is one problem: Application programers outnumber academicians by 100:1. The application programer who truly understands realtional processing is a rare exception.
Any company that tries to map the theory to teh praxis will therefore have to make compromises between:

a) keeping the model clean
b) making the language usable by the users (application programers)
c) making the execution fast.

No need to talk about a) in this group I suppose. W.r.t. b) application programmers thing procedural. Nested queries are pushing it. Therefore a lot of what you see out there is a simpel SQL braced by a lot, a whole lot, of procedural logic (be it PL/SQL, T-SQL or SQL PL). Any attempts to ignore this fact will be punished. As evidence I point to SQL Server's success in the marketplace: Simple SQL with a really fast, application develoiper centric, procedural icing.

Ragarding c) the realtional model is built for semantic beauty. Semantic beauty does not make for a fast web-experience. Pipelining however does. So a lot of effort is being made to pipeline SQL. Often the rules of the relational model are bent to get there.
Example:
SELECT * FROM (SELECT sendmail() FROM T) AS X WHERE c1 > 100; How many emails shall be send? Correct (IMHO) would be: As many emails as there are rows in T. In reality many DBMS will push the predicate through to T for the sake of speed, and most customers evidently don't care.

I personally often wish I worked in academia where I could fix the system of it's legacy bugs and most importantly always place my version of correctnes above all.
Unfortunately what is fueling the industry are the customers and not academia. And all I can do is try to walk that fine line between earning my money and resisting to break the model. My only salvation is the firm knowledge that those who break the model bad, shall be punished with maintenance of it for ever.

Just my two cents.
Cheers
Serge

-- 
Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab
Received on Tue Nov 18 2003 - 04:42:42 CET

Original text of this message