Re: Storing codes vs. human-intelligible values - best practice?
Date: Sat, 28 Jun 2008 07:36:58 -0700 (PDT)
Message-ID: <3a6be074-b8ae-4708-8a69-7afe7b0c8e3e@f36g2000hsa.googlegroups.com>
On Jun 28, 9:36 am, dana..._at_yahoo.com wrote:
> In a pre-existing project, many tables reference domain (lookup)
> tables of the general structure:
> # COL_NM (column name) -- a primary key containing a human-
> intelligible value; max varchar length varies by table, but typically
> no longer than 30 characters
> * COL_CD (column code) -- a not null field with a unique
> constraint, typically an integer with a length of 1-2 digits.
> o COL_DES (column description) -- an optional description field;
> could be up to 255 chars
>
> The child tables referencing these domain tables use the COL_NM human
> intelligible values rather than the less storage intensive COL_CD
> (since it is unique and not null, it could also be used by a
> referencing child table).
>
> In a new project, it's been suggested to me that I continue to store
> COL_NM values rather than COL_CD values in the child tables with these
> rationales:
> -- It's better to use COL_NM because, for anyone directly querying the
> child tables, COL_NM means something to a human. COL_CD doesn't
> (unless someone has a really good memory).
> -- "Oracle can handle it"
>
> However:
> -- The new child tables will have orders of magnitude more rows than
> the child tables in the old database; so, although "Oracle can handle
> it", storage may be an issue, etc.
> -- I've always thought it was a best practice to store short codes
> rather than longer human-intelligible values.
> -- As far as I can tell, few if any users will be querying the child
> tables directly; and if they were, they could join to the domain table
> to get the human intelligible values
>
> Wondering if anyone with more experience than me (most people reading
> this group, probably) have any opinions on the best way to proceed.
> What other factors do I need to consider in making a choice?
>
> Thanks.
>
> Dana
>
> (A few project constraints that may or may not be relevant to my
> question (dictated to me, so they're non-negotiable) are:
> -- Can't use check constraints (and wouldn't want to; easier to update
> values in a domain table than change a constraint)
> -- Can't use Oracle product-specific domains; we're typically asked
> not to use vendor-specific features))
If the column in question will appear in one and only one table then storing human readable values is probably safe as well as practical as long as the values are relatively short and the total space requirements on the object really does not affect performance against the object. You can use an IOT to hold the valid list of values and use a FK to ensure data integrity. Everybody should be happy.
On the other hand if three or more table will reference this column you definitely want to use a lookup table. You also probably want to use a lookup table whenever the length of the lookup values is lengthy. I will admit lengthy is a relative term. I think it depends on how many columns that fall into this problem classification exist in the design, how many rows each table has, and the percentage of the table data the column(s) will make up verse how many joins you will need if you do not store the data.
Customers will not be too happy if to join Table-A to Table-B in their adhoc query he or she also has to join 5 additional tables to Table-B for lookup data. A view on Table-B that includes the lookup might be the solution here but you will also have to consider the cost of the 5 lookup joins.
Where only a few legal values exist column constraints can be used to enforce data integrity and views with case statements can be used to display the human meaningful value.
I am not sure of the correct name or exact usage of the feature yet but with 11g the calculated column feature might be of use here where only a few values exit.
In conclusion you best decision is going to depend on the data and how it is referenced and updated. In many cases carrying the actual data values in your tables rather than a reference value is probaly the most practical design otherwise to join 4 application base tables could easily end up requiring writing 12 table join due to the need to lookup reference data.
HTH -- Mark D Powell -- Received on Sat Jun 28 2008 - 09:36:58 CDT