Re: assigning codes to events

From: Bob Badour <bbadour_at_golden.net>
Date: Wed, 19 Jun 2002 00:14:37 -0400
Message-ID: <DDTP8.91$pL1.13433981_at_radon.golden.net>


Both of the options you give have flaws that are not of their own creation. You really want to have full domain support ie. full user-defined type support.

You could then declare a type for your code with operations WasDelivered and WasReturned. Then the actual representation of the code would not have importance only the defined operations would.

The first model you gave attempts to model a time-invarying attribute in a relation variable, and variables are time-varying of course. One could inadvertently change the meaning of a code much easier having the WasDelivered and WasReturned properties as booleans in a relation than if one had to change a type definition.

The second design you gave requires the user to apply arbitrary, opaque (probably undocumented) translation rules to extract useful information from the data. This is not an issue of 1NF because all data has internal structure that operations reveal.

Above, I said the flaw is not of your creation because SQL does not fully support user-defined types. Both of the designs you gave are kludged workarounds for a flaw in SQL.

I think it is probably easier to train end-users to use ranges and values (provided the design does not have too many such codes) than it is to train users that 1 means TRUE and 0 means FALSE. And then some systems will assume that -1 means TRUE and 0 means FALSE, and yet other systems will assume that 0 means success and all other values are some error code.

Most people when presented with an image of the codes/descriptions table will recognize that all the 300's represent returned mail just from the description. However, the design presents the very real risk that someone will decide that all the codes from 300-325 mean a letter was returned so they can use the range 300-399 for another predicate that includes returned mail. In this case, any query that assumed 300-399 means returned mail would break.

Without full support for user-defined types, no ideal design exists and one will just have to pick the kludge they find least evil.

"Chris Cubley" <chriscubley_at_hotmail.com> wrote in message news:d8eb949e.0206181605.6a3361f2_at_posting.google.com...
> I often run into the issue of needing a particular set of codes to
> represent some set of actions within a system. For instance, consider
> the (somewhat) trival example of the postal system. You send a
> letter. You then need a set of codes to describe the outcome of that
> letter (assume you don't care about resending in case of failure,
> etc.). Assume one of the following things can happen to the letter:
>
> - Delivered successfully
> - Lost in mail (permanantly)
> - Returned to sender
>
> My approach to creating codes to symbolize the state of the letter
> would be to create the following:
>
> Letters
>
> LetterID DateSent ResultCode
> --------- ---------- -----------
> 1 5-15-2002 1
> 2 5-16-2002 1
> 3 5-17-2002 3
> 4 5-17-2002 2
> 5 5-18-2002 3
>
>
> ResultCodes
>
> ResultCode Meaning WasDelivered
> ----------- ---------------------------- -------------
> 1 Delivered successfully 1
> 2 Lost in mail (permanantly) 0
> 3 Returned to sender 0
>
> If later the requirements of the system changed and required the
> system to keep track of whether or not the letter was returned because
> of a bad address or lack of postage, I would tend to change the
> database to the following:
>
> Letters
>
> LetterID DateSent ResultCode
> --------- ---------- -----------
> 1 5-15-2002 1
> 2 5-16-2002 1
> 3 5-17-2002 4
> 4 5-17-2002 2
> 5 5-18-2002 5
>
>
> ResultCodes
>
> ResultCode Meaning WasDelivered WasReturned
> ----------- ---------------------------- ------------- ------------
> 1 Delivered successfully 1 0
> 2 Lost in mail (permanantly) 0 0
> 3 Returned to sender 0 1
> 4 RTS - Insufficient postage 0 1
> 5 RTS - Address unknown 0 1
>
>
> I realize that there are quite a few other ways to handle this
> situation, but that is the one I would tend to choose. I have a
> colleague that has quite a different opinion. His approach would be
> the following:
>
> Original:
>
> Letters
>
> LetterID DateSent ResultCode
> --------- ---------- -----------
> 1 5-15-2002 100
> 2 5-16-2002 100
> 3 5-17-2002 300
> 4 5-17-2002 200
> 5 5-18-2002 300
>
>
> ResultCodes
>
> ResultCode Meaning
> ----------- ----------------------------
> 100 Delivered successfully
> 200 Lost in mail (permanantly)
> 300 Returned to sender
>
>
> Revised requirements:
>
> Letters
>
> LetterID DateSent ResultCode
> --------- ---------- -----------
> 1 5-15-2002 100
> 2 5-16-2002 100
> 3 5-17-2002 310
> 4 5-17-2002 200
> 5 5-18-2002 320
>
>
> ResultCodes
>
> ResultCode Meaning
> ----------- ----------------------------
> 100 Delivered successfully
> 200 Lost in mail (permanantly)
> 300 Returned to sender
> 310 RTS - Insufficient postage
> 320 RTS - Address unknown
>
>
> The application logic would then know that all codes 300 - 399 meant
> that the letter was returned to sender and that the different codes
> within that range specify why the letter was returned. If you're
> looking for all returned mail, you search for the range, otherwise,
> you search for a particular code.
>
> To me, this approach seems to violate the first normal form by making
> the ResultCode field non-atomic. It then indicates both that the
> letter was returned and the reason why the letter was returned. Am I
> correct in this or am I misinterpreting the 1NF?
>
> Whether or not my reasoning as to why I think the second approach is
> flawed is correct, I would like opinions as to the pros and cons of
> either approach and if there are any worthy third options (I'm pretty
> much certain there are). Also, if I am correct in my conclusion that
> the second approach is problematic, why are there so many systems
> created that have codes whose values are grouped according to meaning?
> I know I have written interfaces to quite a few third party systems
> (in several different industries) where this is the case. To me, the
> second approach seems very much like the old BASIC line numbering
> scheme... 10, 20, 30... oops, I need a line between 10 and 20...15. I
> may be being a bit overcritical here, thus my request for opinions for
> or against either method.
>
> Thanks in advance,
>
> Chris Cubley
> www.queryplan.com
Received on Wed Jun 19 2002 - 06:14:37 CEST

Original text of this message