Re: Examples of SQL anomalies?

From: Cimode <cimode_at_hotmail.com>
Date: Wed, 9 Jul 2008 15:04:12 -0700 (PDT)
Message-ID: <2bf7750c-75db-4b9a-a33e-9e1d06e17639_at_59g2000hsb.googlegroups.com>


[Snipped]
> Here's an example that requires the addition of a flag attribute:
>
> Original schema that uses NULL:
> R1 {Part#, Location} for parts that do not have a lot
> R2 {Part#, Location, Lot#} where Lot# can be NULL for parts that have a lot
> but it may not have been supplied.
>
> Alternate schema that doesn't use NULL:
> R3 {Part#, Location, HasLot}
> R4 {Part#, Location, Lot#}
> R4[Part#, Location] IN (R3 WHERE HasLot) [Part#, Location]

If I understand right your example on Lot# being optional, the correct solution without NULLS would be

> R1 {Part#, Location} for parts that do not have a lot
> R1_HasLot {PartHL#, LocationHL, Lot#} with R1_HasLot{PartHL#, LocationHL} references R1{Part#, Location} in R1
> R1_HasNoLot {PartHNL#, LocationHNL} with R1_HasNoLot{PartHNL#, LocationHNL} references R1{Part#, Location} in R1

the solution
> Splitting R2 into two tables would cause there to be two relations that have
> the same heading as R1. These are then combined into R3 with the addition
> of the flag HasLot.
The fact that they would have the same heading is simply a consequence of having the two relations associated columns having the same name not a consequence of the splitting itself. My example shows otherwise and the chosen column names shows otherwise.

[Snipped]

> > I suppose that the information you are refering to is the fact that
> > marker *suspect*. But don't you think that a binary (suspect/non
> > suspect) information would tend to be highly subjective on the
> > designer's end ?
>
> The information I am referring to is in the table that is the target of the
> SUM. Each row either has a value to be summed or doesn't have a value.
Target of the sum? I am sorry but you lost me.

> >> > What interest in building systems that bring the information that the
> >> > system is not reliable and impose on people a two step that would
> >> > force them to re-interpretate and force them to permanently using NOT
> >> > NULL/IS NOT NULL.
>
> >> Who said that the system is not reliable? Returning an indeterminate
> >> result
> >> to a query that cannot be exactly answered exemplifies robustness, not a
> >> lack of reliability.
> > OK can you define the difference between system's *lack of robustness*
> > and system's *lack of reliability* ?
> > I do not see quite the difference.
>
> I can certainly define the difference between *robustness* and *lack of
> reliability*. *robust* and *reliable* are basicly synonymous. The only
> difference is that *robust* carries with it a sense of strength and the
> ability to resist attempts to tear it down, whereas *reliable* instead
> carries with it a sense of repeatability or faithfulness.
Can you be more specific ? Can you define in few lines what you consider a *robust* system or what you define a *reliable* system.

> >> It is certainly better than telling the user that no
> >> answer can be had, since there may be instances where the same query
> >> /can/
> >> be exactly answered.
Returning a result is a responsability for the designer. He can do it with or without NULLS. Since I stopped using NULLS, I did not have a single user complaining about not having results or complaining about inaccurate results.

> >> > Rocket science is understanding how is that simpler than simply taking
> >> > the NULLS out of the equation and not have to deal with all this.
Decomposing into additional tables is done once at design time and does not seem complicated except if one is allergic to tables and have not fully understood the interest of views.

> >> If the design permits missing information, then you will have to deal
> >> with
> >> all this. Again, NULLs are just one way to deal with missing
> >> information.
> >> If there should be a value, but it hasn't been supplied, then there needs
> >> to
> >> be an indicator that there should be a value. This applies regardless of
> >> whether NULLs are used or not. So to handle missing information without
> >> NULLs, you need an indication (possibly implicit) that there should be a
> >> value, which may need to be in a separate table, and another table for
> >> those
> >> values that should have been supplied and have. That way the absence of
> >> a
> >> tuple containing a value can be correctly interpreted, because it could
> >> be
> >> that there shouldn't be a value, and that's why there isn't a tuple.
> > OK so if I understand right. So basically you are talking about
> > relation decomposition in case NULLS are not to be used to handle
> > missing information ?
>
> Isn't that the only alternative?
It is the only serious one I am aware of based on TTM. I also heard a lot of nonsense from OO solution to this.

[Snipped]
> > I tend to believe that interpretation of the meaning of data should be
> > done by the end user who do not necessarily have SQL background. I
> > would also tend to think that designer's role is merely to *implement*
> > a construct/structure/formalization that would facilitate such
> > interpretation for the end user . I do believe that using NULLS to
> > handle missing information confuses both designer's and end user
> > because they have to reconcile their respective subjectivenesses.
>
> I disagree again (but mainly with the last sentence): either there should be
> a value and it has been supplied, or there should be a value but it hasn't
> been supplied. There is nothing subjective about that. Subjectivity arises
> when NULL is used to mean more or other than "there should be a value but it
> hasn't been supplied." I consider that a misuse of NULL.
How do you guarantee that this misuse won't happen ever ? Why would you need to do ? Can you write down a *How to totally avoid to not misuse/drawback of using NULLS* guide. I buy you a copy right away and send it to my developpers. My guess would be that by the end of writing such book, you would be convinced that using them was not a good idea after all.

> As the result of
> an aggregate (or any expression, for that matter) NULL indicates "at least
> one value hasn't been supplied," which really boils down to "there should be
> a value but it hasn't been supplied." SQL's implementation sucks.
On that sentence I agree. That is why I am developping a db core that handles missing information by automatically making a transparent R table decomposition.

> I have no use for inapplicable nulls.
What if the entire distinction unknown/inapplicable was itself inherently bound to a poor idea? Why should there be only 2 why not 500 different type of NULLS. How about a marker called *lazy stressed out underpaid developper who did not have time code IS NULL/IS NOT NULL where condition*

> > Not to mention that on a physical standpoint, it degrades performance
> > because of additional index scans performed when each optional column
> > of a table should incluse or not IS NULL/IS NOT NULL.
>
> I would argue that the extra joins required for an equivalent design that
> does not use nulls as well as the enforcement of many more interrelational
> constraints would degrade performance more. In addition, you would still
> have to deal with the flag attributes, which being boolean are not conducive
> to indexing.
I do not quite see how a resource overhead of a *per table* JOIN can be compared to the *per column* overhead into having IS NULL/IS NOT NULL applied to optional columns. SInce a single JOIN implies one additional index scan, a table decomposed in 2 or 3 tables imposes only 2 additional index scans/seeks. Alternatively, a column that would have 5 optional columns would impose 5 additional index scans/ seeks to get the result right and avoid inaccurate results to be resturned.

> > End user tend to end up frustrated by a lack of correctness due to
> > designer's subjectiveness, a lack of accuracy due to a proprietary
> > implementation of the markers, and an important performance
> > degradation due to ISNULL index cumulative overhead.
>
> Again, it is the misuse of NULL that is the cause of the frustration.
I believe that no matter how well used because it is *unmaintainable*, it is inherently flawed approach. Look at the efforts required to maintain:
> Insure that all optional columns have at least one IS NULL/IS NOT NULL element in the WHERE condition. That requirement extends over the entire system life cycle since accuracy is handled at query level. An entire additional layer
> If you generate 60 new tables a year with 300 new optional columns then one needs to make sure that the 100 procedures using these columns have 1 IS NULL/ IS NOT NULL condition to guarantee accuracy. That is 300 * 100 = 30000 additional changes and additional index scans/seeks per run.
> Think about all the misunderstanding and confusions between designers/developpers and end users that can be a consequence of such maintainance price.

Think also about the alternative: Not using NULLS at all:
> 300 new optional columns decomposed into 2 additional tables each would cause 2 * 300 = 600 additional index scans/seeks with no additional changes to the code wrapped up in views. This work would have also to be done *once* at design time.

> Designers and developpers would not have to ask themselves everytime they write a query whether they would have to add IS NULL and whether they would use IS NOT NULL point.

So speaking about amount of work involved what does convince you that 30000 additional operations allowing NULLS may be simpler than 600 operations not using them at all. Received on Thu Jul 10 2008 - 00:04:12 CEST

Original text of this message