Re: Dreaming About Redesigning SQL
Date: 12 Oct 2003 15:18:59 -0700
Message-ID: <57da7b56.0310121418.43aeaf21_at_posting.google.com>
news:<5%Ghb.70458$rt6.3541092_at_phobos.telenet-ops.be>...
> Paul G. Brown wrote:
> > Jan Hidders <jan.hidders_at_pandora.be.REMOVE.THIS> wrote in message news:<3f851baa.0_at_news.ruca.ua.ac.be>...
> >>
> >>Er, well, file names are actually not abstract because they have a
> >>concrete representation that you can read.
> >
> > We could go round and round on this. I'd observe that there is a whole
> > class of things which can potentially go into tuple attributes that aren't
> > 'values':
[ snip - but they're all values ]
Yes, but my point is that they're not the same kind of 'value' as, say, the value '1' of domain INTEGERS, or 'Fred Flintstone' of VARCHAR(32). Recall the original argument I'm making; it's hard to say "no" to a logical REF/DEREF on the grounds that it violates data abstraction while at the same time maintaining that filenames, SQL queries and other misc. scripts are OK. There are other reasons to reject REF/DEREF related to redundancy in the language, but that redundancy is not so clear in the other examples I've given.
The point is that as 'values of some domain', all of these are ambiguous. Let's call what appears in the tuple attribute a 'token'. There are two ways to 'interpret' a filename token: either as it literally appears, or else through some kind of intermediary operation.
TABLE Files ( Name File PRIMARY KEY );
SELECT COUNT(*) FROM Files WHERE Name = "/tmp/Foo";
Does this mean the count of times that the file-name "/tmp/Foo" appears in the table Files, or is it the number of times a file is found with *the same contents* as what is in the file "/tmp/Foo"? (Note that this second interpretation implies that the result of this query can be at most 1.)
I'm not saying either interpretation is correct, and I know what SQL says it ought to be. I'm just saying that this is open to more than one interpretation.
> That you can do an extra step to derive some more information doesn't
> mean that they didn't already have some information to begin with.
We seem to agree that the 'token' is ambiguous (holding aside the point that SQL-92 allows you only one interpretation.)
> The
> crucial question is if in this extra step you need some extra
> information that is not to be found anywhere in the tables. For the
> evaluation of the query you don't. For dereferencing the reference
> without the help of a table that associates them with their destination,
> you do.
Your point pivots on the question of what counts as being "inside" the database. Nowhere, in the tables, does anything say when two strings are "equal". That information is outside the database too, in the sense that it resides in code implementing domain operators. But it would be kind of silly to say that values of VARCHAR(32) violate the information principle because needed information about these values is "outside" the schema.
> For looking up the contents of the file without the help of a
> table that associates them with their contents, you do. For executing
> the script you don't.
>
> So as long as you don't expect a DEREF function or a function that
> magically looks up the content of a file none of you examples violates
> the information principle.
In all of these examples--which derive from practical systems--you do apply some DEFEF kind of logic to the token. In Postgres, for example, (or any of the recent RDBMS releases) you can write functions to 'do things' with the contents of the file referenced by the file-name. Several DBMS products provide a mechanism for dynamically executing a SQL query within another SQL query. By introducing a raft of domain specific functions and operators in these systems you can get the behavior you want.
Look: one way to overcome this problem is to do away with functions and operators (like DEREF) altogether. I'm back on this hobby horse again. Note that relations are disambiguated by their *entire* definition, not just their names. I know this smells like function overloading and I need a whole bunch of new relations but it does reduce the amount of "stuff" in the relational model without reducing the model's power.
RELATION Files ( File FileName KEY ); RELATION EQUALS ( Domain String Key, Range String ); RELATION EQUALS ( Domain Stream Key, Range Stream ); RELATION STREAM ( Domain String Key, Range Stream );
Q1: How many files called "/tmp/Foo"?
VALUES CARD( Files( FileName ) EQUALS( FileName, "/tmp/Foo" ));
Q2: How many files with same contents as "/tmp/Foo
VALUES CARD( Files ( FileName) STREAM ( Filename, F.Stream )
STREAM ( "/tmp/Foo", Stream) EQUALS ( F.Stream, Stream ));
End the tyranny of domain operators, now!!!
KR
P "signing off and heading for the tub" b Received on Mon Oct 13 2003 - 00:18:59 CEST