Re: A better SQL implementation?

From: Cimode <cimode_at_hotmail.com>
Date: 8 Jun 2006 08:44:11 -0700
Message-ID: <1149781451.433419.127010_at_i39g2000cwa.googlegroups.com>


I am going to play devil's advocate...
David Portas wrote:
> Cimode wrote:
> > Concerning the following query using the Document data type:
> >
> > Select * from Document where parag = 'kw1' and parag = 'kw2'
> >
> > Looking at the paper, you can see that the author declares the
> > following according to the query:
> >
> > "It should be noted that the above select would give zero results for
> > any SQL database other than Atlas, since a single column cannot contain
> > more than one value. "
> >
>
> So this is a multi-valued DBMS not a relational one. The multi-valued
> features are not fully explained.
I see...to be quite honest I am not aware of multivalue model (?) implemented systems only of RM inspired DBMS systems. So I can not validate or invalidate your argument.

OTOH, I noticed that the example given traduce a smarter effective implementation of the '=' operator in order to define the Document complex data type.
Assuming for instance that the documentid foreign key in keyword relvar is of data type Document. What prevents using '=' as an operator to mean 'CONTAINS' concept. I find this rather elegant in SQL. Instead of writing the following query

If you believe SQL has a sufficient support of complex data types such as Document (set of words), what are the operators would define this particular data type then using traditional SQL? (thank you for adressing this particular definition.)

> There are three reasons given for creating a new syntax. The first and
> seemingly the main justification is the alleged poor performance of the
> original syntax. That's nonsensiccal. Performance is defined by the
> implementation not by a language. One might as well observe that a red
> Ferrari goes faster than a blue truck and then conclude that red makes
> vehicles go faster!

I do agree that the author uses wrong terminology and confuses SQL and RM. As I stated previously, he participated to early SQL implementations and is not versed in RM.

But I also believe the argumentation developped by the paper exclusively concerns physical implementation even if he is not using the right terminology. Index are implementation level and problems expressed are real. I have encountered them countless times with traditional SQL.

> The second reason given is that the old syntax may be hard to write or
> understand. Yet, the new syntax assigns a new and contradictory meaning
> to the most basic and well understood mathematical operator of all: the
> equals sign. Is that really going to be easier to understand? To me it
> looks like the consequences will be far worse than SQL's three-valued
> logic (Of course I'm guessing because the syntax is never explained.
> How does the new = operator affect the meaning of OR, NOT, NULLs for
> example?). You mentioned the CONTAINS syntax, which seems far more
> user-friendly than what is proposed in the paper.

I am curious as to why you believe that the logical meaning of '=' is redefined by the author. He just uses the operator to define a specific data type...

> Thirdly the author says he doesn't think he can extend and generalize
> the original SQL query he gave. In fact he could have used
> COUNT(DISTINCT) to do exactly that but the query given is not the
> easiest way to solve the problem in SQL. It could have been written
> much more easily as N self-joins.

It is true that he could have written the query more easily...Let's rephrase his examples in SQL...

Let's take a few examples...

Assuming that there the two following tables (written in traditional SQL almost code)

create table Document(documentid int PRIMARY KEY UNIQUE, document_subject char(30))
create table Keyword(keyword char(20) PRIMARY KEY UNIQUE, documentid int FK on Document NOT NULL)

Example 1) Without trying to count them for the moment, the query to find ALL documents that EITHER contains 'kw1' OR 'kw2' in a given document is...

select DISTINCT d.document_subject from
Document d inner join Keyword k

        on d.docid = k.docid
where k.keyword = 'kw1' or k.keyword = 'kw2'

His version of SQL's equivalent is...

select document_subject from Document where k.keyword = 'kw1' and k.keyword = 'kw2'

Again he uses the '=' symbol to implement CONTAINS operator for Document data type.

Example 2) now, the query for finding ALL documents containing both 'kw1' AND 'kw2' in the same document, the corresponding traditional statement would be...(if I am not mistaken what you refer as self joins)

select DISTINCT d.document_subject from
Document d inner join Keyword k1

		on d.docid = k1.docid
	inner join Keyword k2
		on d.docid = k2.docid and
		k1.docid = k2.docid

where k1.keyword = 'kw1' and k2.keyword = 'kw2'

His version of SQL's equivalent would be...

select document_subject from Document where keyword = 'kw1' or k.keyword = 'kw2'

Example 3) the query for finding ALL documents containing EITHER (BOTH 'kw1' AND 'kw2') in the same document OR (BOTH 'kw1' AND 'kw3') in the same document) would be...

(select DISTINCT d.document_subject from Document d inner join Keyword k1

		on d.docid = k1.docid
	inner join Keyword k2
		on d.docid = k2.docid and
		k1.docid = k2.docid

where k1.keyword = 'kw1' and k2.keyword = 'kw2') UNION
(select DISTINCT d.document_subject from Document d inner join Keyword k1
		on d.docid = k1.docid
	inner join Keyword k2
		on d.docid = k2.docid and
		k1.docid = k2.docid

where k1.keyword = 'kw1' and k2.keyword = 'kw3')

For the same result his version of SQL's would be...

Select * from Document where kw = 'kw1' and ( kw = 'kw2' or kw = 'kw2' )

I think his point is that traditional SQL is syntaxically redundant when combination of OR and AND conditions increase. We should keep in mind that traditional SQL is the result of people who have proven to not to know much about RM as this guy given the inepties produced. Let's give him a chance...

> David Portas
Received on Thu Jun 08 2006 - 17:44:11 CEST

Original text of this message