Re: Is mysql a RDBMS ?

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 25 Aug 2003 12:47:53 -0400
Message-ID: <Nws2b.656$cK7.79368235_at_mantis.golden.net>


"Heikki Tuuri" <Heikki.Tuuri_at_innodb.com> wrote in message news:52h2b.18$G37.11_at_read3.inet.fi...
> Hi!
>
> "Bob Badour" <bbadour_at_golden.net> kirjoitti viestissä
> news:glh2b.638$rG6.77308011_at_mantis.golden.net...
> ...
> > > But if we define a 'relation' as a multiset, then the above is a
> > 'relation'.
> >
> > If we call our ducks pigs, then our pigs will have wings and fly. But
does
> > that really mean pigs can fly?
> >
> > A relation is a set of tuples and not a multiset of rows.
>
> Does anyone remember how Codd defined a 'relation' in his 1970 paper?

http://www.acm.org/classics/nov95/s1p3.html

A set of tuples. You may find it interesting to note the original paper relied on column ordering; however, relational proponents have since discarded that idea.

> > > The DISTINCT operation can be quite costly in some cases.
> >
> > It can only have cost for projection and for union. In my experience,
>
> I think most queries in real applications involve a projection. That is,
you
> do not fetch all the columns of the table(s).

I think most queries in real applications include at least one candidate key. That is, the cost of DISTINCT in most projections is zero. The net cost of DISTINCT in most of the remaining projections is actually negative by reducing the amount of data to consider.

> > DISTINCT often improves performance by allowing the dbms to ignore
> potential
> > duplicates that very often cannot even exist.
>
> Can you show statistics about that?

If someone wants to pay for my time to make the measurements, sure.

> > > Chris Date claims that the concept of a multiset is confusing
> >
> > It is worse than confusing. Consider the following table:
> >
> > A
> > =
> > 1
> > 1
> > 1
> > 2
> >
> > When you observe the table variable above and decide you need to delete
> > exactly one row where A = 1, how do you express the deletion in SQL?
>
> In MySQL you write
>
> DELETE FROM ... WHERE A = 1 LIMIT 1;
I see the latest ANSI standard had kludged a solution to that specific question. My first thought is: Which of the three rows does it delete? Why? But I have a much more interesting question: Suppose one cannot predict exactly how many rows are 1, but one knows one must to delete all but 2 of them. How does one express the deletion in SQL?

> > >, and having
> > > only mathematical relations in a database query language would allow
for
> > > more query optimization.
> >
> > It will allow for more optimization at less cost.
>
> If you have evidence of this, you should publish it in an academic
refereed
> journal. This is a claim of some Codd-12-relationists, but I do not recall
> seeing refereed papers published about this. Have you got any references?

Citeseer is temporarily down right now so I cannot search for past papers. I have serious doubts whether a referee would consider the topic sufficiently novel and unapparent to warrant publication.

Just off the top of my head, I can think of the following identities that improve performance for relations but not necessarily for multisets:

COUNT(A UNION B) = COUNT(A) + COUNT(B) - COUNT(A JOIN B) A WHERE EXISTS ( B WHERE P(A,B) AND Q(B) ) = ( A JOIN ( B' WHERE P(A,B') AND Q(B') )) PROJECT { ATTRIB(A) } (B' indicates the dbms may have to rename some attributes of B)

( A(w,u) JOIN B(w,v) ) PROJECT { w }
= ( A(w,u) PROJECT { w } ) JOIN ( B(w,v) PROJECT { w } )

( A UNION A ) = A ( A JOIN A ) = A ( A INTERSECT A ) = A In addition, please see _Relational Database Writings 1991 - 1994_ C. J. Date
In particular, see installments 3, 7, 8, 9, 10, 11, 13, 17 and 19.

> > SQL seems to spend a lot of time generating meaningless duplicates that
> are
> > seldom used in any case.
>
> No, it does not spend time.

What do you mean it does not spend time? Any time it must continue enumerating data looking for potential duplicates is time wasted. Consider the algorithm that SQL must use to evaluate the equivalent of ( A JOIN A ) due to the possibility of duplicates. Relationally ( A JOIN A ) is a no op because the result is always A.

What does it really mean when the 3 rows of 1 above become 9 rows of 1 or 1 row of 1, but cannot become 3 rows of 1? The relational cost is zero and the SQL cost is guaranteed greater than zero. Received on Mon Aug 25 2003 - 18:47:53 CEST

Original text of this message