Re: Is mysql a RDBMS ?

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 25 Aug 2003 00:26:14 -0400
Message-ID: <glh2b.638$rG6.77308011_at_mantis.golden.net>


"Heikki Tuuri" <Heikki.Tuuri_at_innodb.com> wrote in message news:9272b.354$yu.1_at_read3.inet.fi...
> Hi!
>
> "Bob Badour" <bbadour_at_golden.net> kirjoitti viestissä
> news:Vj52b.593$q25.72766566_at_mantis.golden.net...
> > "Heikki Tuuri" <Heikki.Tuuri_at_innodb.com> wrote in message
> > news:0d12b.203$yu.178_at_read3.inet.fi...
> > > Morten,
> > >
> > > "Morten Gulbrandsen" <Morten.Gulbrandsen_at_rwth-aachen.de> kirjoitti
> > viestissä
> > > news:60ca69db.0308210016.822e230_at_posting.google.com...
> > > > Hello,
> ...
> > > > From Elmasri:
> > > >
> > > > To qualify as a genuine relational DBMS,
> > > > a system must have at least the following properties:
> > > >
> > > > 1.) It must store data as relations such that each column is
> > > > independently identified by its column name and the ordering of rows
> > > > is immaterial.
> > >
> > > This holds for MySQL.
> >
> > Are you saying that the following works with MySQL?
> >
> > select somecharfield name, someintfield id
> > from sometable
> > union
> > select anotherintfield id, anothercharfield name
> > from someothertable
> > ;
>
> Hmm... Elmasri and Navathe say that 'each column is independently
identified
> by its column name'. Now we can ask if they mean that the ordering of
> columns in a UNION expression should be irrelevant.
>
> In SQL (see http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt) we
> associate an ordering to the columns of a table, as well as the name of
the
> column. Ok, if Elmasri and Navathe mean that the ordering of columns in an
> expression should not matter, then SQL does not satisfy 1.).
>
> In SQL we think that an n-ary 'multiset' M consists of (1) an injection
from
> {1, 2, ..., n} to the set of possible column names and (2) a set of tuples
> of the form (k, a1, a2, ..., an). Here k is a natural number > 0 which
tells
> how many times the tuple (a1, a2, ..., an) occurs in the multiset, and
each
> ai is an element from a basic set determined by the data type of the ith
> column.
>
> In mathematics, we can define a relation as a set of mappings N -> A,
where
> N is a set of column names and A is some set or class. Then the ordering
of
> columns really does not matter in a union of two relations.
>
> > > > 2.) The operations available to the user, as well as those used
> > > > internally by the system, should be true relational operations; that
> > > > is, they should be able to generate new relations from old
relations.
> > >
> > > This holds for MySQL.
> >
> > Consider the following relation, AB:
> >
> > A B
> > = =
> > 1 1
> > 1 2
> > 1 3
> >
> > Are you saying that the following statement results in a relation in
> MySQL?
> >
> > select A
> > from AB
> > ;
>
> If we define a 'relation' as a set like in mathematics, and not as a
> 'multiset' like in SQL, then the above is not a 'relation'. You have to
add
> DISTINCT to get a mathematical relation.
>
> But if we define a 'relation' as a multiset, then the above is a
'relation'.

If we call our ducks pigs, then our pigs will have wings and fly. But does that really mean pigs can fly?

A relation is a set of tuples and not a multiset of rows.

> This is also relevant in 1.): SQL databases allow the storing of
multisets.

Which is one of the reasons SQL dbmses are not really relational dbmses even though SQL is the only commercially popular language based in any way on the relational model.

> But nothing prevents using SQL to store only relations in the mathematical
> sense.

The question is not whether SQL prevents one from pretending one has a relational dbms but whether SQL allows a dbms to be relational, and SQL does not.

Sometimes, more is less.

> The DISTINCT operation can be quite costly in some cases.

It can only have cost for projection and for union. In my experience, DISTINCT often improves performance by allowing the dbms to ignore potential duplicates that very often cannot even exist.

> Chris Date claims that the concept of a multiset is confusing

It is worse than confusing. Consider the following table:

A
=
1
1
1
2

When you observe the table variable above and decide you need to delete exactly one row where A = 1, how do you express the deletion in SQL?

>, and having
> only mathematical relations in a database query language would allow for
> more query optimization.

It will allow for more optimization at less cost.

> But I believe most real-world SQL queries produce
> only mathematical relations, and we can apply all the optimizations
present
> with mathematical relations.

Most is not all. An SQL dbms must work using only the identities that apply to multisets because SQL supports multisets. To use the identities that apply to relations and not to multisets, an SQL dbms must include logic to detect relations and then use those identities as well. At a first guess, the optimizer you envision will be twice as large and twice as complex as the relational optimizer.

SQL seems to spend a lot of time generating meaningless duplicates that are seldom used in any case.

> One can also argue that a multiset is a natural
> way to model the real world.

One can also argue that vinyl is a natural way to garnish potatos, but that doesn't make the argument intelligent or sane.

> For example, in quantum mechanics, electrons
> can be indistinguishable.

One has nothing to say about individual indistinct electrons. One might make a statement about a group of indistinct electrons; for example, one might quantify charge and mass of the group of electrons. If one can make a statement about an individual electron, that electron is obviously distinguishable.

> That is similar to having in a table two rows with
> the exact same column values.

Nope. It is actually similar to having a single identifiable row with one or more quantity attributes representing some identifiable thing like a group of indistinct electrons. Received on Mon Aug 25 2003 - 06:26:14 CEST

Original text of this message