Re: Null in subquery returns no records
Date: 9 Feb 2004 17:02:56 -0800
Message-ID: <57da7b56.0402091702.5a3fe316_at_posting.google.com>
"Dan" <guntermannxxx_at_verizon.com> wrote in message news:<FGbVb.244$4e3.185_at_nwrddc02.gnilink.net>...
> What is the difference between a null, an "unknown value," and a null set?
[ snip ]
A little Boris and Gretchen Dialogue:
[G]retchen: Hey, Boris! Got a moment?
[B]oris: Sure. Justin Timberlake and Janet Jackson are . . . hey!
[G]: What?
[B]: I dunno - just missed something, I think. [pause] Oh no one will
care anyway. Wazuuup?
[G]: I don't get nulls.
[B]: That's alright. No one gets nulls.
[G]: What do you mean?
[B]: No one gets nulls. That's how nulls work. They're not really there.
[G]: Oh! So they represent an absence? Like zero?
[B]: Well, it depends a bit. What are we talking about here?
[G]: Nulls.
[B]: That's just a word, Gretchen. In what context?
[G]: Logic, set theory, the relational model, programming languages. .
[B]: Wait on! Those are all very different things.
[G]: OK. Let's start with nulls in logic.
[B]: Well, there really isn't any such thing.
[G]: No nulls in logic! What do you mean?
[B]: Well, in the beginning George Boole described something that
has become known as Boolean Logic, or Two-value logic. Each proposition in boolean logic is a mapping function into a set that consists of { "true", "false" }. In other words, in a logic language that uses two valued logic every proposition is either "true", or "false".
[G]: Then why all the argument?
[B]: Well, two is not a very big number, and mathematicians like a
challenge. So they invented multi-valued logics.
[G]: Sort of like multi-value databases, Boris?
[B]: (blows beer out nose). No. (wipes shirt) Nothing like multi-value
databases.
[G]: How do multi-value logics work?
[B]: They add more possible states to "true" and "false". In fact, at
a certain point you really want to let go of "true", "false", or any of the notions of truth and logic with which your high-school acquainted you.
[G]: Sort of like the Bush Administration, Boris?
[B]: (takes thoughtful sip) Well, I wouldn't go that far. I mean, the
Bush administration has posited the existance of known-knowns, unknown-knowns, known-unknowns and unknown-unknowns but so far research along those lines hasn't proven very productive. Nations in posession of multi-value logics understand that they're governed by rules, and you had better follow them, or all hell breaks lose.
[G]: What do you mean by rules?
[B]: In Boolean Logic, we have predicates that can be used to build
formulae. In the simplest boolean logic predicates are things like AND, OR, NOT, XOR and so on. Each of these serves as a kind of map: given one or two variables, the predicate maps the input into a single value. Familiar with truth tables, Gretchen?
[G]: Sure. They look like this:
AND->V1 T F OR->V1 T F V2 V2 T T F T T T F F F F T F
[B]: Very good. Now suppose there were *more* possible values for the
logic variables. For grins, let's call the values "left", "right" and "up". Well, now we can construct a lots of other truth-tables. They still take two variables, but they can be used to generate any behaviors you might like. For example: AND->V1 "left" "right" "up" XOR->V1 "left" "right" "up" "left" "left" "right" "up" "left" "left" "left" "up" "right" "right" "right" "up" "right" "left" "right" "up" "up" "up" "up" "up" "up" "up" "up" "up"
[G]: But what does it mean?
[B]: (shrugs) What would you like it to mean? That doesn't really
matter, so far as the system is concerned. Just apply the rules consistently. Whenever you see an "unknown value", what it "means" is nothing more or less than what the rules let it "mean" relative to the other values. There have been quite a lot of proposals, few of which have any accessible interpretation.
[G]: Sort of like what happened to the WMDs?
[B]: Moving right along Gretchen.
[G]: (thinks) Oh! I see! If we say that "left" means "true", and "right"
means "false", and "up" means "unknown" (or "null"), then what you have is the SQL truth tables for AND and OR!
[B]: Who's talking about SQL?
[G]: Well, we are characters in a dialogue on comp.databases.theory, aren't
we?
[B]: But Bob says . . . .
[G]: Let's move along, Boris. What about three-valued logic in the
relational model?
[B]: (grumbles) I don't wanna talk about it.
[G]: Oh come on, Boris. For an anthropomorphic manifestation of this poster's
ego, you're a pompous ass!
[B]: OK, OK. Lots of people got kind of worried about "missing information"
and the relational model. On the one hand, if you make the "closed world assumption", then it really isn't a problem. Things not written down are false by definition. Instead you might use default values in the tuples and they make the propositions mean something like: General Form: "There exists Planets, with unique Names, and Mass (always greater than 3.3E-27 kg)." "There exists a planet, Named 'Mars' [ with Mass at least 3.3E-27 kg.]"
[G]: What's so special about "3.3E-27 kg."?
[B]: That's about what a proton weighs. Our schema can make the
quite reasonable assumption that everything weighs at least as much as a proton. This guarantees that your queries won't return wrong data. In my example, we know the name of the planet, but not its mass. If we don't know Mar's mass, and we want to know how many planets we know to have mass more than 5.98E+24 kg, we don't want Mars being returned.
[G]: OK. That's straight forward enough. Why do we need NULLs?
[B]: Well, because not everyone thought the CWA was a good idea. I mean,
it's probably fine for transactional business data processing applications, but beyond that it gets kind of hard to justify. Mind you, having figured out how unknown values worked within an open world framework, they promptly wrestled the idea back into the closed world.
[G]: So what happened?
[B]: Codd himself first broached the subject of missing information and
unknown or null values. Now, once you have missing values you've got to figure out what to do about them. Eventually it was decided that anytime you did anything with a variable whose value was marked as "unknown", the result would also be "unknown".
[G]: And hence, 3VL in SQL, which is really just one kind of multi-valued
logic.
[B]: Yup. Of course, NULL was an unfortunate choice of term.
[G]: Because null means something in set theory, right?
[B]: Well done, Gretchen, yes, absolutely. A null set is an empty set. This
does not mean that the set contains a null value: that's a different set entirely.
[G]: A null set is not the same thing as a set containing a null?
[B]: Think about the empty set, for a second. How many things does the
empty set contain?
[G]: Well, none.
[B]: Right. The null set is the empty set. But if you write down the
symbol for null, and place it within curly braces, how many things are inside that set?
[G]: Well, one, the null set.
[B]: Right! So in logic, null is a value that can be assigned to a
logical variable. In set theory, the empty set is a kind of constant. The ideas smell very similar, but think about how each of them is combined with other things like themselves -- that is, things like variable values and set values. Take a non-null set, and the null set, and find their union. You have the non-null set. Take a variable set to null, and combine it with or compare it to another variable that isn't null, and in the 3VL systems we are talking about the answer is null.
[G]: Wow. Subtle. What about nulls in the relational model?
[B]: Some schools of thought reject the idea utterly. Others fudge it
and say that nulls are a kind of 'out of band' marker assigned to variables after query operations over relations that are missing some information. You might interpret a null tuple element to mean, for example ",and with property X UNKNOWN".
[G]: I think I see.
[B]: Yeah. Of course, in SQL it gets more complex. Look at this:
CREATE TABLE Test ( ID Integer NOT NULL PRIMARY KEY, Val INTEGER );
INSERT INTO Test (Id, Val) VALUES (1, 1),(2, 2),(3,NULL),(4,NULL),(5,2);
SELECT Val FROM Test; VAL ----------- 1 2 2 - -
5 record(s) selected.
[B]: SQL is a bag algebra, right? So the it returns whatever it finds in
the column. But check this out. . .
SELECT COUNT(Val) FROM Test; 1 ----------- 3
1 record(s) selected.
[B]: COUNT(Col_Name) is special. COUNT(col_name) only considers non-null
values. Now, look at this:
WITH V AS ( SELECT Val FROM Test ) SELECT COUNT(*) FROM V; 1 ----------- 5
1 record(s) selected.
[B]: COUNT(*) doesn't give a hoot about nulls. It counts tuples, not
their contents. And I won't even go into what happens with MAX(), MIN() and GROUP BY HAVING. A lot of the problems stem from the fact that while it is easy to confuse a NULL and a null set, they really aren't the same thing.
[G]: Subtle, Boris.
[B]: Yeah. About as subtle as a Super Bowl half-time show. Received on Tue Feb 10 2004 - 02:02:56 CET