Re: Is mysql a RDBMS ?

From: Heikki Tuuri <Heikki.Tuuri_at_innodb.com>
Date: Mon, 25 Aug 2003 23:59:45 GMT
Message-ID: <Rpx2b.4189$G37.2895_at_read3.inet.fi>


Bob,

"Bob Badour" <bbadour_at_golden.net> kirjoitti viestissä news:Nws2b.656 ...
>
>
http://www.acm.org/classics/nov95/s1p3.html
>
> A set of tuples. You may find it interesting to note the original paper
> relied on column ordering; however, relational proponents have since
> discarded that idea.

"
The term relation is used here in its accepted mathematical sense. Given sets S1, S1, ···, Sn, (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S1, and so on. We shall refer to Sj as the jth domain of R. As defined above, R is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary.
"

ok, it is a rather common mathematical definition of a relation as a subset of S1 x S2 x ... x Sn.

> I think most queries in real applications include at least one candidate
> key. That is, the cost of DISTINCT in most projections is zero.

This requires some work from the optimizer to notice. Also note that SQL in these cases acts like a mathematical-relation language, and you cannot expect any speedup from using a mathematical-relation database.

...
> > If you have evidence of this, you should publish it in an academic
> refereed
> > journal. This is a claim of some Codd-12-relationists, but I do not
recall
> > seeing refereed papers published about this. Have you got any
references?
>
> Citeseer is temporarily down right now so I cannot search for past papers.
I
> have serious doubts whether a referee would consider the topic
sufficiently
> novel and unapparent to warrant publication.

It is unapparent. Date has provided some contrived examples, but it would require a careful analysis of a real-world application to show that a mathematical-relation language really can calculate typical queries faster than SQL. People who claim that, should do the study and get it published in a refereed scientific journal. My guess is that SQL is not slower, and those Codd-12-relationists simply are wrong.

> Just off the top of my head, I can think of the following identities that
> improve performance for relations but not necessarily for multisets:
>
> COUNT(A UNION B) = COUNT(A) + COUNT(B) - COUNT(A JOIN B)
>
> A WHERE EXISTS ( B WHERE P(A,B) AND Q(B) )
> = ( A JOIN ( B' WHERE P(A,B') AND Q(B') )) PROJECT { ATTRIB(A) }
> (B' indicates the dbms may have to rename some attributes of B)
>
> ( A(w,u) JOIN B(w,v) ) PROJECT { w }
> = ( A(w,u) PROJECT { w } ) JOIN ( B(w,v) PROJECT { w } )
>
> ( A UNION A ) = A
>
> ( A JOIN A ) = A
>
> ( A INTERSECT A ) = A
The above formulas are not that often used in query optimization. Usually we have joins (e.g., the EXISTS above can be seen as a special 'join' where we stop calculation in B when we have found 1 row), and the problem is in determining the best order for joining several tables using a nested loop join. I do not see how a mathematical-relation language would make this join optimization easier. Looks like in real-world SQL queries duplicate rows usually do not play any role at all in the calculation process.

> > > SQL seems to spend a lot of time generating meaningless duplicates
that
> > are
> > > seldom used in any case.
> >
> > No, it does not spend time.
>
> What do you mean it does not spend time?

I mean that if we calculate a projection of a table to some columns, then it is faster just to output the multiset than use sorting to eliminate duplicates. Of course, if we feed this output to another join, then those duplicates can cause a lot of extra work. If we feed the output to an aggregate function like SUM(), then SQL actually works faster than a mathematical-relation language, because the latter would need to keep the primary key in the output to make it a set of distinct tuples.

> Any time it must continue
> enumerating data looking for potential duplicates is time wasted. Consider
> the algorithm that SQL must use to evaluate the equivalent of ( A JOIN A )
> due to the possibility of duplicates. Relationally ( A JOIN A ) is a no op
> because the result is always A.
>
> What does it really mean when the 3 rows of 1 above become 9 rows of 1 or
1
> row of 1, but cannot become 3 rows of 1? The relational cost is zero and
the
> SQL cost is guaranteed greater than zero.

The example is not a real-world query.

Best regards,

Heikki Tuuri
Innobase Oy
http://www.innodb.com
Foreign keys, transactions, and row level locking for MySQL InnoDB Hot Backup - a hot backup tool for MySQL Received on Tue Aug 26 2003 - 01:59:45 CEST

Original text of this message