From: --CELKO-- <71062.1056_at_compuserve.com>
Date: 13 Feb 2002 14:42:35 -0800

A = (10, 20, 30, 40);

B = (10, NULL, 30, 40); C = (10, NULL, 30, 100); It seems reasonable to define a tuple comparison as valid only when the datatypes of each corresponding column in the rows are union-compatible. If not, the operation is an error and should report a warning. It also seems reasonable to define the results of the comparison to the AND-ed results of each corresponding column using the same operator. That is, (A = B) becomes:

((10, 20, 30, 40) = (10, NULL, 30, 40));
becomes:
((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 40))
becomes:
(TRUE AND UNKNOWN AND TRUE AND TRUE);
becomes:
(UNKNOWN);
This seems to be reasonable and conforms to the idea that a NULL is a missing value that we expect to resolve at a future date, so we cannot draw a conclusion about this comparison just yet. Now consider the comparison (A = C), which becomes:

((10, 20, 30, 40) = (10, NULL, 30, 100));
becomes:
((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 100));
becomes:
(TRUE AND UNKNOWN AND TRUE AND FALSE);
becomes:
(FALSE);
There is no way to pick a value for column 2 of row C such that the UNKNOWN result will change to TRUE because the fourth column is always FALSE. This leaves you with a situation that is not very intuitive. The first case can resolve to TRUE or FALSE, but the second case can only go to FALSE.

The SQL-92 standard decided that the theta operators would work as shown in the table below. The expression RX <comp op> RY is shorthand for a row RX compared to a row RY; likewise, RXi means the i-th column in the row RX. The results are still TRUE, FALSE, or UNKNOWN, if there is no error in type matching. The rules favor solid tests for TRUE or FALSE, using UNKNOWN as a last resort.

The idea of these rules is that as you read the rows from left to right, the values in one row are always greater than (or less than) those in the other row after some column. This is how it would work if you were alphabetizing words.

The rules are

1. RX = RY is TRUE if and only if RXi = RYi for all i.
2. RX <> RY is TRUE if and only if RXi <> RYi for some i.
3. RX < RY is TRUE if and only if RXi = RYi for all i < n and RXn < RYn for some n.
4. RX > RY is TRUE if and only if RXi = RYi for all i < n and RXn > RYn for some n.
5. RX <= RY is TRUE if and only if Rx = Ry or Rx < Ry.
6. RX >= RY is TRUE if and only if Rx = Ry or Rx > Ry.
7. RX = RY is FALSE if and only if RX <> RY is TRUE.
8. RX <> RY is FALSE if and only if RX = RY is TRUE.
9. RX < RY is FALSE if and only if RX >= RY is TRUE.
10. RX > RY is FALSE if and only if RX <= RY is TRUE.
11. RX <= RY is FALSE if and only if RX > RY is TRUE.
12. RX >= RY is FALSE if and only if RX < RY is TRUE.
13. RX <comp op> RY is UNKNOWN if and only if RX <comp op> RY is neither TRUE nor FALSE.

The negations are defined so that the NOT operator will still have its usual properties. Notice that a NULL in a row will give an UNKNOWN result in a comparison. These row comparisons can be done in SQL-89, but translating the predicates is messy.

Consider this SQL-92 expression:

(a, b, c) < (x, y, z)

which becomes

((a < x)
OR ((a = x) AND (b < y))
OR ((a = x) AND (b = y) AND (c < z)))

The SQL-89 standard allowed only scalar subqueries on the right-hand side and a single-value expression on the left-hand side of the <comp op> ALL <subquery> and <comp op> SOME <subquery> predicates. SQL-92 is more orthogonal; it allows a single-row expression of any sort, including a single-row subquery, on the left-hand side. Likewise, the BETWEEN predicate can use row expressions in any position in SQL-92. Received on Wed Feb 13 2002 - 23:42:35 CET

Original text of this message