# Re: Examples of SQL anomalies?

Date: Sat, 5 Jul 2008 05:11:25 -0700 (PDT)

Message-ID: <a2546fe6-6a1d-4560-a782-8cfef53e779f_at_25g2000hsx.googlegroups.com>

On 5 juil, 01:52, "Brian Selzer" <br..._at_selzer-software.com> wrote:

> "Cimode" <cim..._at_hotmail.com> wrote in message

*>
**> news:47943627-3b66-4fb2-950f-46359441f020_at_e53g2000hsa.googlegroups.com...
**>
**>
**>
**> > On 5 juil, 00:00, "Brian Selzer" <br..._at_selzer-software.com> wrote:
**> >> "David Cressey" <cresse..._at_verizon.net> wrote in message
**>
**> >>news:jSwbk.249$rb1.205_at_trndny08...
**>
**> >> > "Brian Selzer" <br..._at_selzer-software.com> wrote in message
**> >> >news:zOvbk.31378$ZE5.24693_at_nlpi061.nbdc.sbc.com...
**>
**> >> >> "David Cressey" <cresse..._at_verizon.net> wrote in message
**> >> >>news:D8ubk.240$rb1.211_at_trndny08...
**>
**> >> >> > "Brian Selzer" <br..._at_selzer-software.com> wrote in message
**> >> >> >news:gI6bk.11000$cW3.8591_at_nlpi064.nbdc.sbc.com...
**>
**> >> >> >> "David Cressey" <cresse..._at_verizon.net> wrote in message
**> >> >> >>news:wq4bk.28$0V1.10_at_trndny01...
**>
**> >> >> >> > "Brian Selzer" <br..._at_selzer-software.com> wrote in message
**> >> >> >> >news:nFD9k.5753$LG4.2422_at_nlpi065.nbdc.sbc.com...
**>
**> >> >> >> >> "-CELKO-" <jcelko..._at_earthlink.net> wrote in message
**>
**> >> >news:f219a6bd-9d8e-4cfe-9d60-ce9dcaeff16d_at_z66g2000hsc.googlegroups.com...
**> >> >> >> >> >>> The question is, if these issues are due to the SQL
**> >> > specification
**> >> >> > or
**> >> >> >> >> >>> simply due to a problem in a specific SQL product. Or could
**> >> >> >> >> >>> it
**> >> > be,
**> >> >> >> > that
**> >> >> >> >> >>> the definition is not precise enough in some points, so that
**> >> >> > database
**> >> >> >> >> >>> vendors implemented it differently? <<
**>
**> >> >> >> >> > Nope, it is the specs. All aggregate (set) functions begin by
**> >> >> >> >> > removing the NULLs from their parameter set, then if there is
**> >> >> >> >> > a
**> >> >> >> >> > DISTINCT option on the parameter, they remove redundant
**> >> >> >> >> > duplicates
**> >> >> > and
**> >> >> >> >> > finally do the operation (MIN, MAX, AVG, SUM, COUNT on what is
**> >> > left.
**> >> >> >> >> > Since an empty set has no elements upon which to apply an
**> >> > operation,
**> >> >> >> >> > SQL returns a NULL (okay, it should be an "undefined" if we
**> >> >> >> >> > were
**> >> >> >> >> > mathematically correct).
**>
**> >> >> >> >> MIN, MAX and AVG are meaningless when applied to an empty bag,
**> >> >> >> >> but
**> >> > it
**> >> >> >> > seems
**> >> >> >> >> to me that COUNT should always return 0 when the bag is empty,
**> >> >> >> >> and
**> >> >> >> >> similarly, SUM should return 0. SUM should only return NULL if
**> >> >> >> >> one
**> >> > of
**> >> >> >> >> the
**> >> >> >> >> values to be summed is NULL.
**>
**> >> >> >> > By definition, none of the values to be summed are NULL.
**>
**> >> >> >> OK. OK. Yes, I know. NULL is not a value. I guess to be precise,
**> >> >> >> I
**> >> >> > should
**> >> >> >> have said "if the cardinality of the set of rows targeted by the
**> >> >> >> query
**> >> > is
**> >> >> >> greater than the number of values to be summed, then SUM should
**> >> >> >> return
**> >> >> >> NULL," but I think that would have caused more confusion. Bottom
**> >> >> >> line:
**> >> >> >> if
**> >> >> >> not all of the amounts are known, then the total amount is suspect.
**>
**> >> >> > This has to do with a continuing disagreement between you and me.
**> >> >> > In
**> >> >> > my
**> >> >> > view there is no such thing as the "set of rows targeted by the
**> >> >> > query".
**> >> >> > There is a set of data targeted by the query. That is all. Rows
**> >> >> > that
**> >> > do
**> >> >> > not contain any data with regard to the current query are not part
**> >> >> > of
**> >> > the
**> >> >> > target, by definition.
**>
**> >> >> > Thus, sum (SALARY) from FUBAR is the sum of a bag of salaries.
**> >> >> > It's
**> >> > not
**> >> >> > the sum of a bag of rows from FUBAR. Rows in FUBAR that don't
**> >> >> > contain
**> >> >> > a
**> >> >> > SALARY are not part of the sum. That is all.
**>
**> >> >> What about sum (SALARY) from FUBAR where Fu = 'Bar'? Doesn't "where
**> >> >> Fu =
**> >> >> 'Bar'" restrict the query to the result of evaluating "where Fu =
**> >> >> 'Bar'"?
**> >> >> Clearly if COUNT(*) where Fu = 'Bar' is greater than COUNT(*) where Fu
**> >> >> =
**> >> >> 'Bar' and SALARY is not NULL, then it is obvious that some salaries
**> >> >> aren't
**> >> >> known, hence the total of those salaries, sum (SALARY) where Fu =
**> >> >> 'Bar'
**> >> >> is
**> >> >> suspect.
**>
**> >> >> I can't understand that you don't get this since it is just so simple.
**>
**> >> > It is simple, I do get it, and you are wrong.
**>
**> >> How can that be? If I have ten packages that need to be shipped, but I
**> >> only
**> >> know what eight of them weigh, then how can I print a bill of lading that
**> >> requires the total weight for the shipment? Obviously, the sum of just
**> >> the
**> >> eight weights will be less than what is actually shipped, so it should be
**> >> obvious that any attempt to pass that sum off as the total weight must be
**> >> held suspect.
**>
**> > How about "the system you plan on designing with NULLS does not handle
**> > that case OR it handles only te case for the 8 weigthed items"
**> > limitation. Best thing is extrapolation.
**>
**> Best thing is to treat the sum as suspect, as I have been suggesting from
**> the start. Extrapolation is problematic because there is no reason to
**> assume that the remaining items are comparable to those whose weight has
**> already been supplied. Suppose for example that the eight items for which
**> the weight has already been supplied are each under 2 pounds, but that the
**> remaining two items each weigh well over a ton. Extrapolation would yield
**> an estimate orders of magnitude in error.
*

I should have written..*next* best thing is extrapolation. Neither
extrapolation or flag a result as suspect are correct solutions to the
problem. In fact both of them are hacks the former being a more
sophisticated then the latter.

*suspect* has exactly the same downsides than *extrapolation* except that it may lead to further confusion due to subjective interpretation by users for the exact same reasons you mentionned. Why would an output be more suspect than another. Definitely not a good idea. Extrapolation through mathematical functions at least has the advantage of allowing the system to use probabilism Received on Sat Jul 05 2008 - 14:11:25 CEST