Re: Dreaming About Redesigning SQL

From: Paul G. Brown <paul_geoffrey_brown_at_yahoo.com>
Date: 12 Oct 2003 15:18:59 -0700
Message-ID: <57da7b56.0310121418.43aeaf21_at_posting.google.com>


news:<5%Ghb.70458$rt6.3541092_at_phobos.telenet-ops.be>...
> Paul G. Brown wrote:
> > Jan Hidders <jan.hidders_at_pandora.be.REMOVE.THIS> wrote in message news:<3f851baa.0_at_news.ruca.ua.ac.be>...
> >>
> >>Er, well, file names are actually not abstract because they have a
> >>concrete representation that you can read.
> >
> > We could go round and round on this. I'd observe that there is a whole
> > class of things which can potentially go into tuple attributes that aren't
> > 'values':

 [ snip - but they're all values ]

  Yes, but my point is that they're not the same kind of 'value' as, say, the   value '1' of domain INTEGERS, or 'Fred Flintstone' of VARCHAR(32). Recall   the original argument I'm making; it's hard to say "no" to a logical   REF/DEREF on the grounds that it violates data abstraction while at the   same time maintaining that filenames, SQL queries and other misc. scripts   are OK. There are other reasons to reject REF/DEREF related to redundancy   in the language, but that redundancy is not so clear in the other examples   I've given.

   The point is that as 'values of some domain', all of these are ambiguous.   Let's call what appears in the tuple attribute a 'token'. There are two ways   to 'interpret' a filename token: either as it literally appears, or else   through some kind of intermediary operation.

   TABLE Files ( Name File PRIMARY KEY );

   SELECT COUNT(*) FROM Files WHERE Name = "/tmp/Foo";

   Does this mean the count of times that the file-name "/tmp/Foo" appears    in the table Files, or is it the number of times a file is found    with *the same contents* as what is in the file "/tmp/Foo"? (Note that    this second interpretation implies that the result of this query can    be at most 1.)

    I'm not saying either interpretation is correct, and I know what SQL    says it ought to be. I'm just saying that this is open to more than one    interpretation.

> That you can do an extra step to derive some more information doesn't
> mean that they didn't already have some information to begin with.

    We seem to agree that the 'token' is ambiguous (holding aside the point    that SQL-92 allows you only one interpretation.)

> The
> crucial question is if in this extra step you need some extra
> information that is not to be found anywhere in the tables. For the
> evaluation of the query you don't. For dereferencing the reference
> without the help of a table that associates them with their destination,
> you do.

    Your point pivots on the question of what counts as being "inside"    the database. Nowhere, in the tables, does anything say when    two strings are "equal". That information is outside the database too,    in the sense that it resides in code implementing domain operators. But    it would be kind of silly to say that values of VARCHAR(32) violate the    information principle because needed information about these values is    "outside" the schema.

> For looking up the contents of the file without the help of a
> table that associates them with their contents, you do. For executing
> the script you don't.

>

> So as long as you don't expect a DEREF function or a function that
> magically looks up the content of a file none of you examples violates
> the information principle.

    In all of these examples--which derive from practical systems--you    do apply some DEFEF kind of logic to the token. In Postgres, for example,    (or any of the recent RDBMS releases) you can write functions to 'do    things' with the contents of the file referenced by the file-name. Several    DBMS products provide a mechanism for dynamically executing a SQL query    within another SQL query. By introducing a raft of domain specific    functions and operators in these systems you can get the behavior you want.

    Look: one way to overcome this problem is to do away with functions and    operators (like DEREF) altogether. I'm back on this hobby horse again.    Note that relations are disambiguated by their *entire* definition, not    just their names. I know this smells like function overloading and I need    a whole bunch of new relations but it does reduce the amount of "stuff"    in the relational model without reducing the model's power.

   RELATION Files    ( File FileName KEY );
   RELATION EQUALS   ( Domain String Key, Range String );
   RELATION EQUALS   ( Domain Stream Key, Range Stream );
   RELATION STREAM   ( Domain String Key, Range Stream );

   Q1: How many files called "/tmp/Foo"?

    VALUES CARD( Files( FileName ) EQUALS( FileName, "/tmp/Foo" ));

   Q2: How many files with same contents as "/tmp/Foo

    VALUES CARD( Files ( FileName) STREAM ( Filename, F.Stream )

                 STREAM ( "/tmp/Foo", Stream) EQUALS ( F.Stream, Stream ));

   End the tyranny of domain operators, now!!!

    KR

       P "signing off and heading for the tub" b Received on Mon Oct 13 2003 - 00:18:59 CEST

Original text of this message