Re: Practical considerations of dealing with two meanings of NULLs
Date: Wed, 08 Aug 2007 21:42:31 +0200
On Wed, 8 Aug 2007 06:58:35 -0400, sinister wrote:
>One simplistic method of trying to distinguish multiple meanings possibly
>associated with NULL is to associate particular non-NULL meanings with those
>For example, I might want to store both "unknown" and "known, but not yet
>filled in by data entry clerk." I could associate NULL to one and a
>particular non-NULL value to the other, or non-NULL values to both.
>If the data at issue are e.g. strings or something categorical, this is OK.
>(The only problem I can think of is the case where (in the interface---I'm
>using a webserver for that) I give users a selectbox of predetermined
>choices, and there's an associated textbox where they can put in "Other"
>(non-predetermined values)---what's to stop them from overlooking the
>predetermined choice "Unknown" and typing in "don't know" in the textbox?)
>But if the data are numerical, it's messy. It's easy if the data are known
>to be nonnegative, because then one can reserve particular nonnegative
>values for this (e.g. "-1 means not filled in by user," "-2 means known").
>But this method just seems ugly and kludgey to me.
>Is there any clean method of dealing with this?
As you can probablly gather from my post in your other thread, the clean method is to apply normalization rules.
Let's for discussion sake suppose that the specific attribute you are concerned with is "discount" (in an order processing DB). A discount can have a numeric value, either 0 (no discount) or >0 (some amount of discount). Let's also suppose that there might be different situations in which no value for the discount can be given - for example, the discount has not yet been decided on by management, or the discount is decided on but not yet stored.
If there is a need to store the reason for a missing discount, the data model would need two attributes. One for the discount, and one for the "reason discount not on file".
You can then either include both these columns in the Orders table, with a CHECK constraint to ensure that at any time, exactly one of them is NULL and the other is non-NULL. Or, if you prefer to omit all NULLs from your database, you can create three tables - Orders for generic information on all orders, OrdersWithDiscount for orders for which a discount is on file (probably three columns: OrderID, Discount, and NetAmount), and OrdersWithoutDiscount (probably two columns: OrderID and ReasonDiscountNotOnFile). Though the latter will eliminate NULLs from your stored tables, it won't get rid of them entirely, unless you never ever have to join data from these various tables. If you do, you'll get them back when outer joining the tables (and you'll lose rows if you use inner join instead).
Best, Hugo Received on Wed Aug 08 2007 - 21:42:31 CEST