Re: Is multi-type data in one field normal?

From: Mike <Star.Point_at_mcsci.net>
Date: 10 Apr 2003 13:54:43 -0700
Message-ID: <a90c0da6.0304101254.7b97219a_at_posting.google.com>


"Bob Badour" <bbadour_at_golden.net> wrote in message news:<osbla.229

Bob, I thank you for your wisdom and time in answering my questions.

$DW2.20351002_at_mantis.golden.net>...
> "Mike" <Star.Point_at_mcsci.net> wrote in message
> news:a90c0da6.0304100109.581b21c4_at_posting.google.com...
> > I have been wrestling with a table design that stores different kinds
> > of statistic data in one field - and the type of statistic in another.
> > For Example:
> >
> > tblLabor
> > LaborID
> > UserID
> > CaseID
> > ActivityID
> > Date
> > Hours
> > *StatisticAmount*
> > *StatisticTypeID*
> > Notes
> >
> >
> > The "StatisticAmount" field is intrinsically linked to a
> > StatisticTypeID field since each number in the amount field is
> > qualified by the type field. If I want to find a sum for each type of
> > statistic I loop through the StatisticsTypes (in a tblStatisticsType
> > table), build a row set of tblLabor and sum the amount field for each
> > type. (There doesn't seem to be a single SQL statement to do the
> > summary.)
>
> Did you try:
>
> SELECT StatisticTypeID, SUM( StatisticAmount ) AS StatisticTotal
> FROM tblLabor
> GROUP BY StatisticTypeID
> ;

Yes, I did try a select just like that. Problem is - statistics that have no entries (i.e. no labor rows) do not get listed in the results.  Statistics with no entries are considered to be zero. I tried various kinds of sub-quires with WHERE EXISTS, IS NULL etc with no success. Looping seems like the only answer.

>
> > This "multi-type data" seems like an easy and functional way to store
> > a large number of different types of statistics. It works for the
> > most part as long as all statistics are of the same data type and/or
> > can fit into a compatible data type (i.e. put ints and floats into a
> > float). Some statistics might be small numbers (.000023) and others
> > large (23,456,222), some with decimal points others without
> >
> > The questions are: What is a more *normalized* way to do this?
>
> That depends. Are you writing an application that allows users to define the
> statistics they gather? Or are you writing an application that gathers a
> couple of unrelated statistics?

Yes, we allow users to add new statistics (and disable statistics) if they want. Adding a row to tblStatisticType adds a statistic name and setting a disabled bit on a row disables it. There are about 50 active statistics right now but this number could be 150 or more in the future. Also, all "enabled" statistics are listed on all case reports (statistics that have no labor entries are shown as zero – I know, statistics without labor entries really should be listed as unknown or N/A, others really make sense to be listed as zero, but I don't make the rules. Just recommendations...)

>
> > Is it
> > normalized to store different types of data in a single field?
>
> One would prefer to have polymorphism and subtypes for this, but the issue
> is independent of normalization, which relates to decomposition by the
> project operation and recomposition by the join operation.
>

The word "polymorphism" gave me some pointers to track on this issue. I found that "foreign key association" might be a good name for this odd method. Statistics are normalized in so much that their existence is dependant on a labor entry (key). I guess the real issue is how the business rules were stated when the system was put together – getting statistics is dependant on labor and labor is the only way to generate a statistic – or it's impossible to read a meter without labor...

>
> > Is
> > there a better way to do this?
>
> Nobody can really truthfully and credibly answer this question in a usenet
> newsgroup. Whether a particular way is better will depend on the full set of
> business rules, which nobody on usenet would have time to properly consider
> even if you tried to list them.
>

I agree, but it seems to me that "foreign key association" is an issue that database design books almost never tackle – or even mention (like the temporal issue) and that maybe there was a recommended construct kicking around the newsgroups that I could not find.

>
> > Should a field be reserved for each
> > different type of statistic?
>
> Perhaps, perhaps not. Another question you need to ask is: Should a table be
> reserved for each different type of statistic?

Yes, creating a table for each statistic looked like the "right" normalization. But building a table for each statistic seems overly complicated compared to hosting just the amount datum from all different types of statistics in one field. On the other hand doing this seems odd. I intrinsically prefer to find the number of people in a People field and number of Alpha Particles in a AlphaParticles field – not both listed in the same field!

Thanks for your time, Mike Received on Thu Apr 10 2003 - 22:54:43 CEST

Original text of this message