assigning codes to events

From: Chris Cubley <chriscubley_at_hotmail.com>
Date: 18 Jun 2002 17:05:37 -0700
Message-ID: <d8eb949e.0206181605.6a3361f2_at_posting.google.com>



I often run into the issue of needing a particular set of codes to represent some set of actions within a system. For instance, consider the (somewhat) trival example of the postal system. You send a letter. You then need a set of codes to describe the outcome of that letter (assume you don't care about resending in case of failure, etc.). Assume one of the following things can happen to the letter:
  • Delivered successfully
  • Lost in mail (permanantly)
  • Returned to sender

My approach to creating codes to symbolize the state of the letter would be to create the following:

Letters

LetterID DateSent ResultCode
--------- ---------- -----------

1         5-15-2002  1
2         5-16-2002  1
3         5-17-2002  3
4         5-17-2002  2
5         5-18-2002  3


ResultCodes

ResultCode  Meaning                      WasDelivered
----------- ---------------------------- -------------
1           Delivered successfully       1
2           Lost in mail (permanantly)   0
3           Returned to sender           0

If later the requirements of the system changed and required the system to keep track of whether or not the letter was returned because of a bad address or lack of postage, I would tend to change the database to the following:

Letters

LetterID DateSent ResultCode
--------- ---------- -----------

1         5-15-2002  1
2         5-16-2002  1
3         5-17-2002  4
4         5-17-2002  2
5         5-18-2002  5


ResultCodes

ResultCode  Meaning                      WasDelivered  WasReturned
----------- ---------------------------- ------------- ------------
1           Delivered successfully       1             0
2           Lost in mail (permanantly)   0             0
3           Returned to sender           0             1
4           RTS - Insufficient postage   0             1
5           RTS - Address unknown        0             1


I realize that there are quite a few other ways to handle this situation, but that is the one I would tend to choose. I have a colleague that has quite a different opinion. His approach would be the following:

Original:

Letters

LetterID DateSent ResultCode
--------- ---------- -----------

1         5-15-2002  100
2         5-16-2002  100
3         5-17-2002  300
4         5-17-2002  200
5         5-18-2002  300


ResultCodes

ResultCode Meaning

----------- ---------------------------- 
100         Delivered successfully       
200         Lost in mail (permanantly)   
300         Returned to sender           


Revised requirements:

Letters

LetterID DateSent ResultCode
--------- ---------- -----------

1         5-15-2002  100
2         5-16-2002  100
3         5-17-2002  310
4         5-17-2002  200
5         5-18-2002  320


ResultCodes

ResultCode Meaning

----------- ---------------------------- 
100         Delivered successfully       
200         Lost in mail (permanantly)   
300         Returned to sender           
310         RTS - Insufficient postage
320         RTS - Address unknown


The application logic would then know that all codes 300 - 399 meant that the letter was returned to sender and that the different codes within that range specify why the letter was returned. If you're looking for all returned mail, you search for the range, otherwise, you search for a particular code.

To me, this approach seems to violate the first normal form by making the ResultCode field non-atomic. It then indicates both that the letter was returned and the reason why the letter was returned. Am I correct in this or am I misinterpreting the 1NF?

Whether or not my reasoning as to why I think the second approach is flawed is correct, I would like opinions as to the pros and cons of either approach and if there are any worthy third options (I'm pretty much certain there are). Also, if I am correct in my conclusion that the second approach is problematic, why are there so many systems created that have codes whose values are grouped according to meaning?  I know I have written interfaces to quite a few third party systems (in several different industries) where this is the case. To me, the second approach seems very much like the old BASIC line numbering scheme... 10, 20, 30... oops, I need a line between 10 and 20...15. I may be being a bit overcritical here, thus my request for opinions for or against either method.

Thanks in advance,

Chris Cubley
www.queryplan.com Received on Wed Jun 19 2002 - 02:05:37 CEST

Original text of this message