Re: Null in subquery returns no records

From: Paul G. Brown <paul_geoffrey_brown_at_yahoo.com>
Date: 9 Feb 2004 17:02:56 -0800
Message-ID: <57da7b56.0402091702.5a3fe316_at_posting.google.com>


"Dan" <guntermannxxx_at_verizon.com> wrote in message news:<FGbVb.244$4e3.185_at_nwrddc02.gnilink.net>...

> What is the difference between a null, an "unknown value," and a null set?

 [ snip ]

 A little Boris and Gretchen Dialogue:

 [G]retchen: Hey, Boris! Got a moment?

 [B]oris: Sure. Justin Timberlake and Janet Jackson are . . . hey!   

 [G]: What?

 [B]: I dunno - just missed something, I think. [pause] Oh no one will

      care anyway. Wazuuup?

 [G]: I don't get nulls.

 [B]: That's alright. No one gets nulls.

 [G]: What do you mean?

 [B]: No one gets nulls. That's how nulls work. They're not really there.

 [G]: Oh! So they represent an absence? Like zero?

 [B]: Well, it depends a bit. What are we talking about here?

 [G]: Nulls.

 [B]: That's just a word, Gretchen. In what context?

 [G]: Logic, set theory, the relational model, programming languages. .   

 [B]: Wait on! Those are all very different things.

 [G]: OK. Let's start with nulls in logic.

 [B]: Well, there really isn't any such thing.

 [G]: No nulls in logic! What do you mean?

 [B]: Well, in the beginning George Boole described something that

      has become known as Boolean Logic, or Two-value logic. Each 
      proposition in boolean logic is a mapping function into a set
      that consists of { "true", "false" }. In other words, in a 
      logic language that uses two valued logic every proposition 
      is either "true", or "false". 

 [G]: Then why all the argument?

 [B]: Well, two is not a very big number, and mathematicians like a

      challenge. So they invented multi-valued logics.

 [G]: Sort of like multi-value databases, Boris?

 [B]: (blows beer out nose). No. (wipes shirt) Nothing like multi-value

      databases.

 [G]: How do multi-value logics work?

 [B]: They add more possible states to "true" and "false". In fact, at

      a certain point you really want to let go of "true", "false", or 
      any of the notions of truth and logic with which your high-school
      acquainted you. 

 [G]: Sort of like the Bush Administration, Boris?

 [B]: (takes thoughtful sip) Well, I wouldn't go that far. I mean, the

      Bush administration has posited the existance of known-knowns, 
      unknown-knowns, known-unknowns and unknown-unknowns but so far 
      research along those lines hasn't proven very productive. Nations 
      in posession of multi-value logics understand that they're governed 
      by rules, and you had better follow them, or all hell breaks lose. 

 [G]: What do you mean by rules?

 [B]: In Boolean Logic, we have predicates that can be used to build

      formulae. In the simplest boolean logic predicates are things 
      like AND, OR, NOT, XOR and so on. Each of these serves as a kind 
      of map: given one or two variables, the predicate maps the input
      into a single value. Familiar with truth tables, Gretchen?


[G]: Sure. They look like this:
AND->V1 T F OR->V1 T F V2 V2 T T F T T T F F F F T F
[B]: Very good. Now suppose there were *more* possible values for the
logic variables. For grins, let's call the values "left", "right" and "up". Well, now we can construct a lots of other truth-tables. They still take two variables, but they can be used to generate any behaviors you might like. For example: AND->V1 "left" "right" "up" XOR->V1 "left" "right" "up" "left" "left" "right" "up" "left" "left" "left" "up" "right" "right" "right" "up" "right" "left" "right" "up" "up" "up" "up" "up" "up" "up" "up" "up"

[G]: But what does it mean?

[B]: (shrugs) What would you like it to mean? That doesn't really

       matter, so far as the system is concerned. Just apply the rules
       consistently. Whenever you see an "unknown value", what it 
       "means" is nothing more or less than what the rules let it 
       "mean" relative to the other values. There have been quite a lot
       of proposals, few of which have any accessible interpretation. 

[G]: Sort of like what happened to the WMDs?

[B]: Moving right along Gretchen.

[G]: (thinks) Oh! I see! If we say that "left" means "true", and "right"

       means "false", and "up" means "unknown" (or "null"), then what you 
       have is the SQL truth tables for AND and OR!

[B]: Who's talking about SQL?
  

[G]: Well, we are characters in a dialogue on comp.databases.theory, aren't

       we?

[B]: But Bob says . . . .

[G]: Let's move along, Boris. What about three-valued logic in the

       relational model?

[B]: (grumbles) I don't wanna talk about it.

[G]: Oh come on, Boris. For an anthropomorphic manifestation of this poster's

       ego, you're a pompous ass!

[B]: OK, OK. Lots of people got kind of worried about "missing information"

       and the relational model. On the one hand, if you make the "closed 
       world assumption", then it really isn't a problem. Things not 
       written down are false by definition. Instead you might use default 
       values in the tuples and they make the propositions mean something 
       like:

       General Form: "There exists Planets, with unique Names, and 
       Mass (always greater than 3.3E-27 kg)."

       "There exists a planet, Named 'Mars' [ with Mass at least 3.3E-27 kg.]"

[G]: What's so special about "3.3E-27 kg."?

[B]: That's about what a proton weighs. Our schema can make the

       quite reasonable assumption that everything weighs at least as much
       as a proton. This guarantees that your queries won't return 
       wrong data. In my example, we know the name of the planet, but not 
       its mass. If we don't know Mar's mass, and we want to 
       know how many planets we know to have mass more than 5.98E+24 kg, 
       we don't want Mars being returned.

[G]: OK. That's straight forward enough. Why do we need NULLs?

[B]: Well, because not everyone thought the CWA was a good idea. I mean,

       it's probably fine for transactional business data processing 
       applications, but beyond that it gets kind of hard to justify. 
       Mind you, having figured out how unknown values worked within
       an open world framework, they promptly wrestled the idea back into
       the closed world. 

[G]: So what happened?

[B]: Codd himself first broached the subject of missing information and

       unknown or null values. Now, once you have missing values you've got to 
       figure out what to do about them. Eventually it was decided that 
       anytime you did anything with a variable whose value was marked 
       as "unknown", the result would also be "unknown".


[G]: And hence, 3VL in SQL, which is really just one kind of multi-valued
logic.

[B]: Yup. Of course, NULL was an unfortunate choice of term.

[G]: Because null means something in set theory, right?

[B]: Well done, Gretchen, yes, absolutely. A null set is an empty set. This

       does not mean that the set contains a null value: that's a different  
       set entirely. 

[G]: A null set is not the same thing as a set containing a null?

[B]: Think about the empty set, for a second. How many things does the

       empty set contain?

[G]: Well, none.

[B]: Right. The null set is the empty set. But if you write down the

       symbol for null, and place it within curly braces, how many things
       are inside that set?

[G]: Well, one, the null set.

[B]: Right! So in logic, null is a value that can be assigned to a

       logical variable. In set theory, the empty set is a kind of 
       constant. The ideas smell very similar, but think about how each 
       of them is combined with other things like themselves -- that is, 
       things like variable values and set values. Take a non-null set,
       and the null set, and find their union. You have the non-null 
       set. Take a variable set to null, and combine it with or compare it 
       to another variable that isn't null, and in the 3VL systems we 
       are talking about the answer is null.

[G]: Wow. Subtle. What about nulls in the relational model?

[B]: Some schools of thought reject the idea utterly. Others fudge it

       and say that nulls are a kind of 'out of band' marker assigned 
       to variables after query operations over relations that are 
       missing some information. You might interpret a null tuple 
       element to mean, for example ",and with property X UNKNOWN".  

[G]: I think I see.

[B]: Yeah. Of course, in SQL it gets more complex. Look at this:

       CREATE TABLE Test ( ID Integer NOT NULL PRIMARY KEY, Val INTEGER );

       INSERT INTO Test (Id, Val) VALUES (1, 1),(2, 2),(3,NULL),(4,NULL),(5,2);

       SELECT Val FROM Test;
VAL        
-----------
          1
          2
          2
          -
          -

  5 record(s) selected.

   [B]: SQL is a bag algebra, right? So the it returns whatever it finds in

        the column. But check this out. . .

       SELECT COUNT(Val) FROM Test;
1          
-----------
          3

  1 record(s) selected.

    [B]: COUNT(Col_Name) is special. COUNT(col_name) only considers non-null

         values. Now, look at this:

       WITH V AS ( SELECT Val FROM Test ) SELECT COUNT(*) FROM V;
1          
-----------
          5

  1 record(s) selected.

   [B]: COUNT(*) doesn't give a hoot about nulls. It counts tuples, not

        their contents. And I won't even go into what happens with MAX(),
        MIN() and GROUP BY HAVING. A lot of the problems stem from the 
        fact that while it is easy to confuse a NULL and a null set, they 
        really aren't the same thing. 

   [G]: Subtle, Boris.

   [B]: Yeah. About as subtle as a Super Bowl half-time show. Received on Tue Feb 10 2004 - 02:02:56 CET

Original text of this message