Re: Is it possible to build a purely relational database on top of SQL?

From: John Jacob <jingleheimerschmitt_at_hotmail.com>
Date: 3 Aug 2004 23:20:26 -0700
Message-ID: <72f08f6c.0408032220.6b3eb04_at_posting.google.com>


Hello,

> Welll... the Aditi deductive database project took that approach
> (roll-your-own, but traditional, RDBMS back-end with a deductive DB
> front-end) and found it hard to be competitive: check out
> http://www.cs.mu.oz.au/research/aditi

Thank you for the link, although when I tried to follow it, the site seemed to be down. I will look for this implementation; it sounds interesting.

> > Initially, the
> > logical model was 100% pure, and (almost) 100% TTM compliant
> > (if you would like, I can post a Manifesto Compliance document from the 1.0
> > version of the product).
>
> That would be very interesting.

I have posted the 1.0 compliance document as is (sans formatting) at the end of this post. The document was written quite some time ago, and I note that some of the statements are no longer applicable. The document lists the items of non-compliance, with all other items considered compliant. It is a relatively informal treatment, and it was written by us, so it definitely cannot be construed as an endorsement by the authors. Indeed, it can hardly be construed as an impartial assessment, although we tried to be as objective as possible.

> > However, the design goals of the system
> > called for translation into existing systems for performance.
>
> Can you explain where you had to sacrifice relational purity?
> Presumably you weren't expecting your system to be retrofittable to
> existing DBs full of NULLs and all that other SQL nastiness.

Besides the obvious difficulties with the SQL language due to inconsistency and lack of orthogonality, the primary failures of existing systems fall into three major categories (IMHO):

  1. User-Defined Types and Operators
  2. Support for Duplicates
  3. Support for Nulls

As for the first item, the architecture has allowed a nearly complete retrofit of existing systems. We simply have a pure logical environment, and map those logical constructs onto physical implementations in our storage engine (the existing systems). With most systems supporting some level of user-defined functions (although I have to say they are really just named expressions, and lack any real power), we are able to map most operator invocations. Even in cases where we can't, the engine can evaluate the invocation itself and supply the results to the target system as parameters, so processing can continue there.

As far as duplicates are concerned, we can completely eliminate them. Firstly, we do not allow any base table variable to be defined without a key. If a key definition is omitted from the table definition, we assume all columns make up the key. Secondly, we do not allow any relational operation to introduce duplicates, translating all union and projection invocations using the distinct keyword. This leaves only the case where duplicates already exist in base table variable definitions in existing databases, or non-distinct union and projection invocations within derived table variables that are mapped as base table variables in our system. In these cases, the Dataphor Server will impose a key of all columns and use a distinct projection in the translation whenever the offending table variable is accessed. Of course, this still allows for problems that may occur within offending derived table variables prior to our engine, but there's nothing we can do about that.
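To make the projection rule above concrete, here is a small sketch in Python with sqlite3 (not Dataphor code; the table and column names are invented for illustration). A plain SQL projection over a non-key column yields a bag with duplicate rows, while the DISTINCT translation the engine would emit keeps the result a relation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ID INTEGER PRIMARY KEY, City TEXT NOT NULL)")
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 [(1, "Provo"), (2, "Provo"), (3, "Orem")])

# A plain SQL projection over a non-key column produces duplicate rows...
plain = conn.execute("SELECT City FROM T").fetchall()

# ...while the translation with the distinct keyword yields a true relation.
distinct = conn.execute("SELECT DISTINCT City FROM T").fetchall()

print(len(plain), len(distinct))  # 3 2
```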

And as for nulls, well, we've already discussed why we did it. Let me just say that we had hoped we would be able to map nulls in existing systems to special values in our system, but the semantics are so different that this effort proved futile.
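A quick sqlite3 sketch of why the mapping fails: SQL null comparisons follow three-valued logic, which no ordinary "special value" reproduces, and aggregates silently drop nulls:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to unknown (returned as NULL), not true:
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]
print(null_eq)  # None

# ...whereas any ordinary sentinel value compares equal to itself:
sentinel_eq = conn.execute("SELECT -1 = -1").fetchone()[0]
print(sentinel_eq)  # 1

# And aggregates silently skip NULLs, but would count a sentinel:
conn.execute("CREATE TABLE T (C INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (None,)])
cnt = conn.execute("SELECT count(C) FROM T").fetchone()[0]
print(cnt)  # 1, not 2
```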

> > We knew
> > we could not match, at least initially, existing systems in terms of
> > sheer data processing power and query optimization (they've got thirty
> > years on us in this department).
>
> Have you any tech. reports or white papers or the like on your
> optimization technology and the problems you encountered interfacing
> with the back-end DBMSs?

I am currently working on a white paper that will detail the development and architecture of the SQL translation back end employed by the Dataphor Server.

> > We initially supported type inheritance as
> > well, but for various reasons we dropped it. One reason was the
> > complexity of mapping a polymorphic operator invocation into a
> > statically typed SQL-based system, as pointed out elsewhere in this
> > thread.
>
> I think you mean s/statically/non-polymorphically/?

I'm not sure what you mean here. Let me give an example. Suppose the following schema:

create type A...
create type B is A...

create operator BaseOp(A)...
create operator implementation BaseOp(B)...

create table T { C : A };

If I insert into T a row with a value for column C that is of most specific type B, then when I invoke BaseOp(A) on the value in that column, the implementation of BaseOp(B) is actually invoked, and this decision is made at run time. Now further suppose that a mapping exists for BaseOp(A); the compile-time translation must use this mapping. If the invocation is performed internally, the run-time binding takes place. If the invocation is performed externally, the target system has no idea that the value in C is actually of type B, and so the correct dispatch cannot be provided.
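The mismatch can be sketched in a host language (Python here; the class names mirror the schema above, and the "external" path stands in for a SQL back end that only knows the declared column type):

```python
class A:
    def base_op(self) -> str:
        return "BaseOp(A)"

class B(A):
    def base_op(self) -> str:
        return "BaseOp(B)"

def invoke_internally(value: A) -> str:
    # The engine dispatches on the most specific (run-time) type.
    return value.base_op()

def invoke_externally(value: A) -> str:
    # A static SQL mapping can only ever run the implementation bound
    # to the declared type A; the run-time type is invisible to it.
    return A.base_op(value)

v = B()  # stored in a column declared as type A
print(invoke_internally(v))  # BaseOp(B) -- correct polymorphic result
print(invoke_externally(v))  # BaseOp(A) -- what the static mapping yields
```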

> This makes me think you *were* hoping customers could use your
> system on existing DBs containing NULLs. Or is the problem more
> fundamental than this - that is, you have to use NULLs to make
> even a non-backwards compatible system go fast?

Yes, even if an application is built without using any nulls, they can be introduced by expressions such as aggregates on empty sets and outer joins. In order to provide support for these operators, we had to include null semantics that were identical to the target system, otherwise the result of evaluating an expression would be different depending on whether the optimizer decided to offload the execution or not, obviously introducing non-determinism.
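Both null-introducing cases are easy to reproduce; a sqlite3 sketch (table names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. An aggregate over an empty set yields NULL, even though
#    the table itself can never store one:
conn.execute("CREATE TABLE T (C INTEGER NOT NULL)")
empty_max = conn.execute("SELECT max(C) FROM T").fetchone()[0]
print(empty_max)  # None

# 2. An outer join manufactures NULLs for unmatched rows:
conn.execute("CREATE TABLE A (ID INTEGER NOT NULL)")
conn.execute("CREATE TABLE B (ID INTEGER NOT NULL)")
conn.execute("INSERT INTO A VALUES (1)")
row = conn.execute(
    "SELECT A.ID, B.ID FROM A LEFT OUTER JOIN B ON A.ID = B.ID"
).fetchone()
print(row)  # (1, None)
```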

> As an aside, would the sky really fall if a database including NULLs
> was transformed into one using maybe-types (e.g. `NULL' ---> `no',
> non-NULL value `X' ---> `yes(X)')? I'm aware this may be a naive
> question, but since NULL has no declarative semantics that I can see,
> it looks to me as though any table containing NULLs can only be
> interpreted within the operational semantics of the query engine.
> Using maybe-types makes all that nonsense go away.

I'm not sure I fully understand the proposal. Could you elaborate? It sounds interesting.

> > While we spent a great deal of time making sure that the
> > operations we did perform were done efficiently, the fact remains that
> > Oracle and SQL Server will outperform us every time (at least today
> > <evil grin>). In the end, we were forced to bow out and implement null
> > semantics in exactly the same way that SQL-based systems do.
>
> What was the performance penalty you observed?

The most striking penalty had to do with sargability. Consider a restriction expression containing a reference to a nullable column:

select T where C = 5

If our equality comparison throws an exception whenever a nil is encountered, an example translation would be:

select * from T
  where
    case
      when C is null then Convert(int, 'Error')
      else C = 5
    end

Even a smart compiler would have a hard time sarging this one, and it only gets worse with more complicated expressions.
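The penalty is visible in any planner; here is a rough demonstration using sqlite3's EXPLAIN QUERY PLAN (the else branch is simplified to a constant, since Convert is not sqlite syntax). The bare comparison is sargable and uses the index; the CASE-wrapped predicate is opaque, so every row is visited:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (C INTEGER)")
conn.execute("CREATE INDEX T_C ON T (C)")

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

p1 = plan("SELECT * FROM T WHERE C = 5")
p2 = plan("SELECT * FROM T WHERE CASE WHEN C IS NULL THEN 0 ELSE C = 5 END")

print(p1)  # a SEARCH using index T_C
print(p2)  # a full SCAN of T
```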

Regards,
Bryn Rhodes
Alphora

Following is the 1.0 Manifesto Compliance document. Some of the items of non-compliance listed below have since been addressed; the only item with which we have since become non-compliant is null support.



Documentation of Compliance with The Third Manifesto
Alphora, 9/24/2001
Items of non-compliance:

RM Prescription 3c:
The compiler makes no distinction between read-only and update operators, and update operators are not prevented from returning a result. We see no reason to make an arbitrary distinction in this regard.

RM Prescription 21, specifically multiple assignment:
We are still in the analysis phase on this one.

RM Prescription 26:
We do not feel we can make a statement of compliance on this prescription, because we are probably biased.

RM Proscription 6:
As long as the argument that some of our constructs are merely declarative metadata is accepted, we are in full compliance. There is certainly a need for a layer which ties the physical to the logical in some manner; we have opted to represent this layer as metadata attached to logical definitions. These tags mean nothing in the logical model, and are therefore separate from it. For example:

create domain ID
{
  representation ID
  {
    Value : String

      read class "IDValueReadNode"
      write class "IDValueWriteNode"

  } class "IDSelectorNode"
} tags {StaticByteSize = "10"};                 

The StaticByteSize tag indicates to the storage layer that up to 10 bytes of storage should be colocated with the row. The class constructs indicate which host language type is used to implement the selector and accessor operators for the domain.

OO Prescriptions 4 and 5:
Transaction support in the current version is limited to the supporting storage engine. In the next version we will implement full transaction support as prescribed by the Manifesto.                 

OO Prescription 6:
We comply, but what is the result of the invocations min(String) and max(String) on an empty table? We have reasoned that it should be an exception, because of the impracticality of representing the maximum possible string.

RM Very Strong Suggestion 1, specifically a declarative mechanism for surrogate keys:
A surrogate key can be provided through the use of a scalar type default, so we saw no reason to include a syntactic construct, although such a mechanism would be trivial to implement.         

RM Very Strong Suggestion 2:
We support declarative referential integrity in the form of a Reference, which is defined as follows:         

create table A
{
  ID : ID,
  Name : string,
  key {ID}
};                 

create table B
{
  ID : ID,
  A_ID : ID,
  key {ID},
  reference B_A {A_ID} references A {ID}
};

The execution of these statements will produce a constraint equivalent to the following database level constraint:                 

create constraint B_A
  IsSpecial(A_ID) or exists (A where ID = A_ID);                 

where IsSpecial is an operator, created by the system during the definition of the type on which A.ID is based, that returns true if the argument is equal to any special value defined on that domain.

We included this concept because a true foreign key is a subset of this behavior, and we have found this behavior to be useful in the past.                 

RM Very Strong Suggestion 4:
We do not support transition constraints at this time; however, we have plans to include such support in a future version.

RM Very Strong Suggestion 6:
We have an operator called EXPLODE which provides a solution to the classic bill-of-materials problem; however, we do not include explicit support for the generalized transitive closure operator, nor for generalized concatenate and aggregate operations, although such support could be added by an end user per RM Very Strong Suggestion 7.
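For readers unfamiliar with the bill-of-materials problem, the transitive closure EXPLODE addresses can be sketched in standard SQL with a recursive common table expression (sqlite3 used here; the part names are invented for illustration, and this is not Dataphor's EXPLODE syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Part (Assembly TEXT NOT NULL, Component TEXT NOT NULL)")
conn.executemany("INSERT INTO Part VALUES (?, ?)",
                 [("car", "engine"), ("engine", "piston"), ("piston", "ring")])

# Recursively expand 'car' into all of its direct and indirect components.
rows = conn.execute("""
    WITH RECURSIVE Explode(Component) AS (
        SELECT Component FROM Part WHERE Assembly = 'car'
        UNION
        SELECT Part.Component
          FROM Part JOIN Explode ON Part.Assembly = Explode.Component
    )
    SELECT Component FROM Explode
""").fetchall()

parts = sorted(r[0] for r in rows)
print(parts)  # ['engine', 'piston', 'ring']
```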

OO Very Strong Suggestion 1:
We have opted not to implement full type inheritance support in this version, due to the constraints of our storage architecture. The next version will include full type inheritance support as prescribed by the Manifesto. In fact, under the covers, the engine includes the support; it is merely turned off.

OO Very Strong Suggestion 3:
We do not yet support the ARRAY and SET type generators, although an astute user could provide such support.
