Re: Is multi-type data in one field normal?

From: Bob Badour <bbadour_at_golden.net>
Date: Thu, 10 Apr 2003 21:52:21 -0400
Message-ID: <0Vpla.253$Ck4.22670428_at_mantis.golden.net>


"Mike" <Star.Point_at_mcsci.net> wrote in message news:a90c0da6.0304101254.7b97219a_at_posting.google.com...
> "Bob Badour" <bbadour_at_golden.net> wrote in message news:<osbla.229
>
> Bob, I thank you for your wisdom and time in answering my questions.
>
> $DW2.20351002_at_mantis.golden.net>...
> > "Mike" <Star.Point_at_mcsci.net> wrote in message
> > news:a90c0da6.0304100109.581b21c4_at_posting.google.com...
> > > I have been wrestling with a table design that stores different kinds
> > > of statistic data in one field - and the type of statistic in another.
> > > For Example:
> > >
> > > tblLabor
> > > LaborID
> > > UserID
> > > CaseID
> > > ActivityID
> > > Date
> > > Hours
> > > *StatisticAmount*
> > > *StatisticTypeID*
> > > Notes
> > >
> > >
> > > The "StatisticAmount" field is intrinsically linked to a
> > > StatisticTypeID field since each number in the amount field is
> > > qualified by the type field. If I want to find a sum for each type of
> > > statistic I loop through the StatisticsTypes (in a tblStatisticsType
> > > table), build a row set of tblLabor and sum the amount field for each
> > > type. (There doesn't seem to be a single SQL statement to do the
> > > summary.)
> >
> > Did you try:
> >
> > SELECT StatisticTypeID, SUM( StatisticAmount ) AS StatisticTotal
> > FROM tblLabor
> > GROUP BY StatisticTypeID
> > ;
>
> Yes, I did try a select just like that. Problem is - statistics that
> have no entries (i.e. no labor rows) do not get listed in the results.
> Statistics with no entries are considered to be zero. I tried
> various kinds of sub-quires with WHERE EXISTS, IS NULL etc with no
> success. Looping seems like the only answer.

I see the issue. You want all statistics listed regardless of existence in tblLabor. You will need to query tblStatisticsType to get the full list of StatisticTypes.

SELECT StatisticTypeID
, ( SELECT COUNT(*)
    FROM tblLabor lab
    WHERE lab.StatisticTypeID = stat.StatisticTypeID ) As StatisticTotal
FROM tblStatisticType stat
WHERE Enabled
;

If your dbms does not allow the scalar subquery in the select list, you can write an outer join, but then you have the whole NULL mess to deal with.

> > > This "multi-type data" seems like an easy and functional way to store
> > > a large number of different types of statistics. It works for the
> > > most part as long as all statistics are of the same data type and/or
> > > can fit into a compatible data type (i.e. put ints and floats into a
> > > float). Some statistics might be small numbers (.000023) and others
> > > large (23,456,222), some with decimal points others without
> > >
> > > The questions are: What is a more *normalized* way to do this?
> >
> > That depends. Are you writing an application that allows users to define
the
> > statistics they gather? Or are you writing an application that gathers a
> > couple of unrelated statistics?
>
> Yes, we allow users to add new statistics (and disable statistics) if
> they want.

Then it seems to me that you need to model a generic statistics type encompassing all the statistics, and the solution you proposed originally could fit the bill. Again with the caveat that the full design will require full knowledge of the business rules.

> Adding a row to tblStatisticType adds a statistic name and
> setting a disabled bit on a row disables it.
>
> There are about 50
> active statistics right now but this number could be 150 or more in
> the future. Also, all "enabled" statistics are listed on all case
> reports (statistics that have no labor entries are shown as zero - I
> know, statistics without labor entries really should be listed as
> unknown or N/A, others really make sense to be listed as zero, but I
> don't make the rules. Just recommendations...)

Have you considered a design with a tblEnabledStatisticType with a single StatisticsTypeID column identifying the enabled statistic types? It would of course have a foreign key reference to tblStatisticsType. Inserting a statistics type would enable it and deleting a statistics type would disable it. Not that I think it is necessarily better or worse than the design you have, but it is an option to consider.

> > > Is it
> > > normalized to store different types of data in a single field?
> >
> > One would prefer to have polymorphism and subtypes for this, but the
issue
> > is independent of normalization, which relates to decomposition by the
> > project operation and recomposition by the join operation.
> >
>
> The word "polymorphism" gave me some pointers to track on this issue.
> I found that "foreign key association" might be a good name for this
> odd method.

While it may be necessary to learn the new jargon for object oriented programming (ie. 'object pattern names') if you need to communicate with teams of programmers who have jumped on this latest bandwagon, I suggest you first learn to think using sound theory backed concepts and terminology. Foreign keys are a relational concept involved with referential integrity, and as such have some impact on normalisation. Actually, fully normalized designs replace several types of constraint checking with foreign key referential integrity. The result of tacking 'association' onto the end 'foreign key' is meaningless nonsense invented in an ad hoc unprincipled manner.

Polymorphism is the name for exactly what you want to do, which is to create a facility capable of handling multiple types. Many types of polymorphism exist: e.g. overloading and type inheritance. What you are doing is a "poor man's polymorphism" or kludge to overcome SQL's lack of support for polymorphic columns.

> Statistics are normalized in so much that their existence
> is dependant on a labor entry (key). I guess the real issue is how
> the business rules were stated when the system was put together -
> getting statistics is dependant on labor and labor is the only way to
> generate a statistic - or it's impossible to read a meter without
> labor...

Whether the design you have is normalized will not depend on your method of cramming different types of statistics into a single type, but will depend on the full set of functional dependencies. For instance, if ActivityID functionally depends on CaseID, the design won't be fully normalized, but this has nothing to do with StatisticsAmoun or StatisticsTypeID

> > > Is
> > > there a better way to do this?
> >
> > Nobody can really truthfully and credibly answer this question in a
usenet
> > newsgroup. Whether a particular way is better will depend on the full
set of
> > business rules, which nobody on usenet would have time to properly
consider
> > even if you tried to list them.
> >
>
> I agree, but it seems to me that "foreign key association" is an issue
> that database design books almost never tackle

Nor should they. They should tackle referential integrity, candidate keys, foreign keys, domains, etc.

> - or even mention (like
> the temporal issue) and that maybe there was a recommended construct
> kicking around the newsgroups that I could not find.

Which temporal issue? There are many issues, and time is a complex subject. Relational theorists with three decades of experience have only recently started proposing satisfying solutions to the problems.

> > > Should a field be reserved for each
> > > different type of statistic?
> >
> > Perhaps, perhaps not. Another question you need to ask is: Should a
table be
> > reserved for each different type of statistic?
>
> Yes, creating a table for each statistic looked like the "right"
> normalization. But building a table for each statistic seems overly
> complicated compared to hosting just the amount datum from all
> different types of statistics in one field.

From what you say about the requirements for users inventing new statistics and disabling statistics, a separate table for each statistic is clearly wrong and would violate the principle of orthogonal design.

> On the other hand doing
> this seems odd. I intrinsically prefer to find the number of people
> in a People field and number of Alpha Particles in a AlphaParticles
> field - not both listed in the same field!

This is what derived relations like views and snapshots are for--presenting an underlying design in the manner most accessible to a user or an application. Received on Fri Apr 11 2003 - 03:52:21 CEST

Original text of this message