Re: Examples of SQL anomalies?

From: Cimode <>
Date: Tue, 8 Jul 2008 17:50:43 -0700 (PDT)
Message-ID: <>

> >> From what I've seen, poor design usually translates to inadequate design.
> >> If you don't spend enough time on design, you end up reinventing the
> >> wheel
> >> halfway through the implementation, and it ends up costing upwards of
> >> double
> >> the budget.
> > How about the idea that poor(inadequate if you feel better with this
> > term) design = using NULLS
> NULLs are only one way to deal with missing information. From your comments
> it appears that you are equating a design that uses NULLs with a design that
> permits missing information.

I do not quite understand how you come to that conclusion or what may lead you to it. My point here is that using NULLS to handle missing information creates more problems than it solves.

> While a design that uses NULLs necessarily
> permits missing information, there are designs that do not use NULLs that
> also permit missing information. In fact, for every design that employs
> NULLs, there is an equivalent design that does not.

> >> >> >> Why would they have to spend hours? Hopefully any system would
> >> >> >> present
> >> >> >> enough complete information to obviate most guesses, but for the
> >> >> >> rest,
> >> >> >> given
> >> >> >> a choice between being given a potentially wrong answer and being
> >> >> >> told
> >> >> >> that
> >> >> >> the information is incomplete, I (and most people I know) would
> >> >> >> prefer
> >> >> >> the
> >> >> >> latter, even if it required a little extra effort on my part.
> >> >> > But then, following this logic and the 8 item weigth example, the
> >> >> > system would then require the information that the sum is relevant
> >> >> > only up to 8 items to be available to people for them to make
> >> >> > *educated* guesses. Don't you think that makes things more
> >> >> > complicated? Most people I know would tell you, you did a poor
> >> >> > design
> >> >> > job and that they don't need this irrelevant information.
> >> >> The information is already available. If SUM returned an
> >> >> indeterminant
> >> >> result, then the user should be able to issue additional queries
> >> >> qualified
> >> >> with something like, WHERE weight IS NULL or WHERE weight is NOT NULL,
> >> >> in
> >> >> order to find out why the result was indeterminant. It certainly
> >> >> isn't
> >> >> rocket science.
> > What information is available ? The information fact that it is not
> > correct(suspect) ? How is it already available ?
> If the information were not already available, then how could a query that
> performs a SUM be issued?

I suppose that the information you are refering to is the fact that marker *suspect*. But don't you think that a binary (suspect/non suspect) information would tend to be highly subjective on the designer's end ?

> > What interest in building systems that bring the information that the
> > system is not reliable and impose on people a two step that would
> > force them to re-interpretate and force them to permanently using NOT
> Who said that the system is not reliable? Returning an indeterminate result
> to a query that cannot be exactly answered exemplifies robustness, not a
> lack of reliability.

OK can you define the difference between system's *lack of robustness* and system's *lack of reliability* ?
I do not see quite the difference.

> It is certainly better than telling the user that no
> answer can be had, since there may be instances where the same query /can/
> be exactly answered.
> > Rocket science is understanding how is that simpler than simply taking
> > the NULLS out of the equation and not have to deal with all this.
> If the design permits missing information, then you will have to deal with
> all this. Again, NULLs are just one way to deal with missing information.
> If there should be a value, but it hasn't been supplied, then there needs to
> be an indicator that there should be a value. This applies regardless of
> whether NULLs are used or not. So to handle missing information without
> NULLs, you need an indication (possibly implicit) that there should be a
> value, which may need to be in a separate table, and another table for those
> values that should have been supplied and have. That way the absence of a
> tuple containing a value can be correctly interpreted, because it could be
> that there shouldn't be a value, and that's why there isn't a tuple.
OK so if I understand right. So basically you are talking about relation decomposition in case NULLS are not to be used to handle missing information ?

> >> > But what user are you refering the end user or the designer? Should
> >> > an application end user have to know about IS NULL / IS NOT NULL?
> >> Both, and neither. The user could be the designer, or an application
> >> developer, or the user could be an end-user writing a Crystal Report or
> >> integrating a spreadsheet. I would think that in order to integrate a
> >> spreadsheet, the end-user would have to know about IS NULL / IS NOT NULL
> >> if
> >> any of the columns in any of his queries can have NULLs.
> > So, basically the end user is intended as designer on a per use and
> > per application basis? Don't you believe this would leave the meaning
> > of information open to designer's interpretation?
> Isn't it already?

I tend to believe that interpretation of the meaning of data should be done by the end user who do not necessarily have SQL background. I would also tend to think that designer's role is merely to *implement* a construct/structure/formalization that would facilitate such interpretation for the end user . I do believe that using NULLS to handle missing information confuses both designer's and end user because they have to reconcile their respective subjectivenesses.

Not to mention that on a physical standpoint, it degrades performance because of additional index scans performed when each optional column of a table should incluse or not IS NULL/IS NOT NULL.

End user tend to end up frustrated by a lack of correctness due to designer's subjectiveness, a lack of accuracy due to a proprietary implementation of the markers, and an important performance degradation due to ISNULL index cumulative overhead. Received on Tue Jul 08 2008 - 19:50:43 CDT

Original text of this message