design dilemma... 1 or several tables?

From: lionel <lgaillard_at_vmimedical.com>
Date: 21 Nov 2001 11:41:54 -0800
Message-ID: <3d72742.0111211141.6c4f7e97_at_posting.google.com>



I can't choose between 3 different designs, can you help me out?
  1. business definition I have what we call ICD9 and ICD10 codes in the medical business, and I want to store those codes and associate them with patients' visits. 9 and 10 are revision number. ICD9 and ICD10 codes are different and there's no perfect mapping between them. However both codes structures are extremely similar, exception being that ICD9 are fully numeric codes, whereas ICD10 are alphanumeric. Historical data would be bound to ICD9 codes, whereas ongoing data would be bound to ICD10 codes (we may have some hospitals still using ICD9 codes as ongoing codes for awhile however). There are 4000 ICD9 possible codes, and 8000 ICD10 possible codes. It's not possible to add more codes later. If we ever wanted to add more codes, the guys who validate ICD codes would create an 11th revision (happens every 10 or 20 years).

[RDBMS: SQL Server 2000]

b)design
I can see 3 different zero forms (and there might be more). Sory if I didn't use DDL here, I thought it would be easier that way (moreover my tables are not formalized for now):

1:

   1 table with:

     visit_ID
     ICD_code (text)
     code_revision (number)

2:

   1 table with:

     visit_ID
     ICD9_code (number)
     ICD10_code (text)

3:

    2 tables:

      1 table with:
        visit_ID
        ICD9_code (number)

      1 table with:
        visit_ID
        ICD10_code (text)


1st reaction before we normalize those 3 Zero Forms:
- solution 1: ICD9 codes will be loosely stored as text instead of
pure numeric values. But management doesn't seem to be worried about that: codes definition is static, risk of error due to that week data typing is minimal.
- solution 2: we maintain a strong data type, but because you can't
have both ICD9 AND ICD10 attached to a visit, the table will contain a lot of NULL values. Anyway we have the same number of fields than in solution 1.
- solution 3: we maintain a strong data type, we don't have NULL
values, but we have 2 tables instead of one. Why not...

See, if those codes had been exactly indentical (same type of code, text for example), and if they were clearly identifying the same entity, normalization theory would have forced me to get rid of the 2nd and 3rd solution during the 1st Normal Form transformation process ("Eliminate repeating groups in individual tables"), so that I would have ended up with only one 3NF solution...

Ok, so now let's examine the 3NF resulting of those 3 different Zero Forms:

1:

    3 tables:

      visit_ICD          ICD_code                 ICD_code_revision
      ---------          --------                 -----------------
      visit_ICD_ID       ICD_code_ID              ICD_code_revision_ID
      visit_ID           ICD_code_revision_ID     revision (number)
      ICD_code_ID        code (text)

                         

2:

    3 tables:

      visit_ICD          ICD9_code            ICD10_code
      ---------          ---------            ----------
      visit_ICD_ID       ICD9_code_ID         ICD10_code_ID
      visit_ID           code (numeric)       code (text)
      ICD9_code_ID
      ICD10_code_ID
      

3:     

     4 tables:

      visit_ICD9      visit_ICD10      ICD9_code        ICD10_code
      ----------      -----------      ---------        ----------
      visit_ICD9_ID   visit_ICD10_ID   ICD9_code_ID     ICD10_code_ID
      visit_ID        visit_ID         code (numeric)   code (text)
      ICD9_code_ID    ICD10_code_ID


As you can see, I can get 3 different schemas, and people will cry to decide which one they will choose. This mess is just the result of an incapacity to choose what zero form to start with...
- solution 1 is less strict about data typing, but again, those codes
are static, once we have inserted them (one operation), we won't update them...
-solution 1 is more flexible if we want to add ICD11 codes in 15
years, but hell, 15 years... and in 15 years, we can build an extra ICD11 table too, it wouldn't kill use I guess...
-what is the best for querying performance, to have 2 tables (1 with
4000 records and 1 with 8000) or 1 table of 12000 records? We might have to query for both ICD9 and 10 codes attached with visits, however there would always be only one code revision (9 or 10) that would bring us results...
- is this really gonna be a pain for the development team to handle 2
tables instead of one? I mean, is that so complex?
- any other criteria?

Thanks for your help.  

Lionel Received on Wed Nov 21 2001 - 20:41:54 CET

Original text of this message