Re: projection with nulls

From: David Cressey <cressey73_at_verizon.net>
Date: Sun, 16 Mar 2008 21:11:50 GMT
Message-ID: <q2gDj.10991$k92.2152_at_trndny06>


"paul c" <toledobysea_at_ac.ooyah> wrote in message news:qScDj.90937$pM4.36949_at_pd7urf1no...
> (Just my stab at moving from the recent topics towards more basic
> questions, sorry if there's not much newspeak in it):
>
>
> Say you have R with (S)upplier, (P)art, (Q)uantity:
>
> R:
> S P
> - -
> a 2
> b _
>
>
> with predicate: Supplier S supplies Part P. The underscore in the
> second 'row' stands for 'NULL', ie., SQL's null.
>
>
> For argument's sake, say the first tuple means that 'if Supplier a
> carries any parts, it carries Part 2' and the second tuple might mean
> that 'if Supplier b carries any part, we do not know what that part's
> number is'. Not everybody might agree that this meaning is essentially
> the same as the predicate given but as long as the meaning or predicate
> is the same for all tuples, picking one or the other shouldn't matter to
> the dbms.
>

First, it shouldn't matter to the DBMS, but it should matter to the database designer, the data consumers, and the data suppliers. This whole business of assigning meaning to data that isn't there is fraught with pitfalls at the semantic layer. The best you are going to hope for is agreement among all the stakeholders. And that agreement has to be made explicit somewhere, usually in documentation. Without an explicit agreement you're back to everyone taking a guess at it, and that's puts you back where we were with files and records, even if you are using a dbms.

The meaning you have assigned to the first tuple is strange enough to me. If I had been asked to report on this data, in the absence of your explanation, I would have assumed the first tuple means that "supplier a carries part 2". The whole business of "if supplier a carries any parts" is a novelty to me.

As far as newspeak goes, I had thought that tuples, by their nature, did not cary SQL nulls in them. Maybe you and I mean different things by the word "tuple".

>
> Maybe not a realistic example, it's just for argument's sake (I don't
> think it is illegal as far as the SQL standard is concerned).
>
>
> It's always seemed to me that the projection R{S}, without any other
> operands, ought to return only {{S, a}}, ie., 'what suppliers supply
> followedsome part', just one tuple in this case which is the same as
> saying 'what suppliers supply some part if they supply any parts'. That
> is, if one allows nulls (which I'm not advocating).
>
>
> I gather that most or maybe all SQL products return {{S a}, {S b}}, ie.,
> two tuples. I don't see how Supplier b should be included because it's
> not known (in this example) whether Supplier b supplies any parts.
> Maybe somebody wants to answer what should happen if the projection is
> R{P} instead?
>
>
I'm not sure exactly what you mean by "a projection" in the context of and SQL product.

I have always treated GROUP BY and SELECT DISTINCT as providing the functionality that's closest to projection.

Some dialects of SQL will indeed eliminate the second tuple if GROUP BY or SELECT DISTINCT are used.

I'm not trying to create confusion here. I'm just trying to clarify your orignal post.

PS: See archives in Google groups for many discussions of SQL NULLS.

> Secondly, it seems that traditional logic demands the
> closed-world-assumption (CWA) in order to support negation. If the
> domain of part numbers were the set {1,2}, the complement of R above
> would include the tuples {S b, P 1} and {S b, P 2}. In other words,
> there aren't any parts that Supplier b supplies. Does this mean that
> SQL gives wrong answers if the CWA is assumed?
>

The dbms can always perform negation (if all the domains are finite). what the negation might mean is open to interpretation. Suppose we were to take a set of tuples that contain email addresses, along with other attributes. Now suppose we were to negate this set of tuples. Are we going to get a list of all possible email addresses, except the ones that were in the original set? Or are we going to somehow limit the universe of discourse to all email addresses that actually exist somewhere, but not in the set we negated?

For reasons illustrated by the above, there are a great many systems that do not support negation as such, but that do support the MINUS operator. The MINUS operator limits you to values that are in one manageable set but not in some other manageable set. This is a whole lot less open ended than negation.

>
> I guess one could argue that this is a kind of trick predicate but my
> attitude towards that is so what? We don't expect a logic engine to be
> aware of predicates.
>
>
> As far as conventional projection is concerned, it seems that R is
> overloaded in the sense that when it allows nulls it also allows tuples
> that aren't opaque/blunt enough to project against, ie., it's not very
> clear whether the two tuples always share the same predicate. Makes me
> wonder if a P attribute that is set-valued would be a less-intricate
> way to dispense with nulls. Sometimes I even wonder whether
> conventional projection doesn't have enough operands!
>
>
Received on Sun Mar 16 2008 - 22:11:50 CET

Original text of this message