Re: Examples of SQL anomalies?

From: Brian Selzer <>
Date: Fri, 4 Jul 2008 16:33:35 -0400
Message-ID: <zOvbk.31378$>

"David Cressey" <> wrote in message news:D8ubk.240$rb1.211_at_trndny08...
> "Brian Selzer" <> wrote in message
> news:gI6bk.11000$
>> "David Cressey" <> wrote in message
>> news:wq4bk.28$0V1.10_at_trndny01...
>> >
>> > "Brian Selzer" <> wrote in message
>> > news:nFD9k.5753$
>> >>
>> >> "-CELKO-" <> wrote in message
>> >>
>> >> >>> The question is, if these issues are due to the SQL specification
> or
>> >> >>> simply due to a problem in a specific SQL product. Or could it be,
>> > that
>> >> >>> the definition is not precise enough in some points, so that
> database
>> >> >>> vendors implemented it differently? <<
>> >> >
>> >> > Nope, it is the specs. All aggregate (set) functions begin by
>> >> > removing the NULLs from their parameter set, then if there is a
>> >> > DISTINCT option on the parameter, they remove redundant duplicates
> and
>> >> > finally do the operation (MIN, MAX, AVG, SUM, COUNT on what is left.
>> >> > Since an empty set has no elements upon which to apply an operation,
>> >> > SQL returns a NULL (okay, it should be an "undefined" if we were
>> >> > mathematically correct).
>> >> >
>> >>
>> >> MIN, MAX and AVG are meaningless when applied to an empty bag, but it
>> > seems
>> >> to me that COUNT should always return 0 when the bag is empty, and
>> >> similarly, SUM should return 0. SUM should only return NULL if one of
>> >> the
>> >> values to be summed is NULL.
>> >>
>> >
>> > By definition, none of the values to be summed are NULL.
>> >
>> OK. OK. Yes, I know. NULL is not a value. I guess to be precise, I
> should
>> have said "if the cardinality of the set of rows targeted by the query is
>> greater than the number of values to be summed, then SUM should return
>> NULL," but I think that would have caused more confusion. Bottom line:
>> if
>> not all of the amounts are known, then the total amount is suspect.
> This has to do with a continuing disagreement between you and me. In my
> view there is no such thing as the "set of rows targeted by the query".
> There is a set of data targeted by the query. That is all. Rows that do
> not contain any data with regard to the current query are not part of the
> target, by definition.
> Thus, sum (SALARY) from FUBAR is the sum of a bag of salaries. It's not
> the sum of a bag of rows from FUBAR. Rows in FUBAR that don't contain a
> SALARY are not part of the sum. That is all.

What about sum (SALARY) from FUBAR where Fu = 'Bar'? Doesn't "where Fu = 'Bar'" restrict the query to the result of evaluating "where Fu = 'Bar'"? Clearly if COUNT(*) where Fu = 'Bar' is greater than COUNT(*) where Fu = 'Bar' and SALARY is not NULL, then it is obvious that some salaries aren't known, hence the total of those salaries, sum (SALARY) where Fu = 'Bar' is suspect.

I can't understand that you don't get this since it is just so simple.

> This same disagreement surfaces when you trot out your well worn argument
> that a row could have all its values updated and still be "the same row"
> in
> some meaningful sense. The row is just a container. That's all it is.
> It
> has no meaning in the universe of discourse. The data in a row preumably
> does have meaning in the Uof D.

No, it is not the same disagreement. That the same individual can have different appearences at different points in time has nothing to do with whether some salaries are known or not. I contend that what identifies an individual at one point in time need not be identical to what identifies the same individual at a different point in time. For example, the person occupying a particular seat--say, seat 409 in section K--on Monday night need not be the same person occupying that seat on Tuesday night. If in fact that person occupied seat 410 in section K on Tuesday night, then the identifier, seat 409 in section K, that was sufficient to identify that particular individual in the snapshot of reality that occured on Monday night, now no longer identifies that individual, but now a totally different identifier, seat 410 in section K, identifies that individual in the snapshot that occurred Tuesday night. To assume that it is not the same individual just because what identifies him at different points in time is different is just plain wrong. So yes, it is indeed possible for two completely different tuples, taken from different database values can refer to the same individual.

>> > I agree about COUNT and SUM on an empty bag. BTW, the DISTINCT
>> > keyword
>> > can
>> > be used with aggregates to obtain a projection: select count (distinct
>> > zip_code) from employees .
>> >
>> >
Received on Fri Jul 04 2008 - 22:33:35 CEST

Original text of this message