Re: Is multi-type data in one field normal?

From: Mike <Star.Point_at_mcsci.net>
Date: 11 Apr 2003 03:59:36 -0700
Message-ID: <a90c0da6.0304110045.21bbd3b5_at_posting.google.com>


"Bob Badour" <bbadour_at_golden.net> wrote in message news:<0Vpla.253$Ck4.22670428_at_mantis.golden.net>...

Bob, thanks for taking the time to answer.

> "Mike" <Star.Point_at_mcsci.net> wrote in message
> news:a90c0da6.0304101254.7b97219a_at_posting.google.com...
> > "Bob Badour" <bbadour_at_golden.net> wrote in message news:<osbla.229
> >
> > Bob, I thank you for your wisdom and time in answering my questions.
> >
> > $DW2.20351002_at_mantis.golden.net>...
> > > "Mike" <Star.Point_at_mcsci.net> wrote in message
> > > news:a90c0da6.0304100109.581b21c4_at_posting.google.com...
> > > > I have been wrestling with a table design that stores different kinds
> > > > of statistic data in one field - and the type of statistic in another.
> > > > For Example:
> > > >
> > > > tblLabor
> > > > LaborID
> > > > UserID
> > > > CaseID
> > > > ActivityID
> > > > Date
> > > > Hours
> > > > *StatisticAmount*
> > > > *StatisticTypeID*
> > > > Notes
> > > >
> > > >
> > > > The "StatisticAmount" field is intrinsically linked to a
> > > > StatisticTypeID field since each number in the amount field is
> > > > qualified by the type field. If I want to find a sum for each type of
> > > > statistic I loop through the StatisticsTypes (in a tblStatisticsType
> > > > table), build a row set of tblLabor and sum the amount field for each
> > > > type. (There doesn't seem to be a single SQL statement to do the
> > > > summary.)
> > >
> > > Did you try:
> > >
> > > SELECT StatisticTypeID, SUM( StatisticAmount ) AS StatisticTotal
> > > FROM tblLabor
> > > GROUP BY StatisticTypeID
> > > ;
> >
> > Yes, I did try a select just like that. Problem is - statistics that
> > have no entries (i.e. no labor rows) do not get listed in the results.
> > Statistics with no entries are considered to be zero. I tried
> > various kinds of sub-quires with WHERE EXISTS, IS NULL etc with no
> > success. Looping seems like the only answer.
>
> I see the issue. You want all statistics listed regardless of existence in
> tblLabor. You will need to query tblStatisticsType to get the full list of
> StatisticTypes.
>
> SELECT StatisticTypeID
> , ( SELECT COUNT(*)
> FROM tblLabor lab
> WHERE lab.StatisticTypeID = stat.StatisticTypeID
> ) As StatisticTotal
> FROM tblStatisticType stat
> WHERE Enabled
> ;
>
> If your dbms does not allow the scalar subquery in the select list, you can
> write an outer join, but then you have the whole NULL mess to deal with.
>

Your scalar subquery works! I changed COUNT to SUM for my purpose and I get a null value for the stats that don't have labor - exactly as it should. Thank you!

>
> > > > This "multi-type data" seems like an easy and functional way to store
> > > > a large number of different types of statistics. It works for the
> > > > most part as long as all statistics are of the same data type and/or
> > > > can fit into a compatible data type (i.e. put ints and floats into a
> > > > float). Some statistics might be small numbers (.000023) and others
> > > > large (23,456,222), some with decimal points others without
> > > >
> > > > The questions are: What is a more *normalized* way to do this?
> > >
> > > That depends. Are you writing an application that allows users to define
> the
> > > statistics they gather? Or are you writing an application that gathers a
> > > couple of unrelated statistics?
> >
> > Yes, we allow users to add new statistics (and disable statistics) if
> > they want.
>
> Then it seems to me that you need to model a generic statistics type
> encompassing all the statistics, and the solution you proposed originally
> could fit the bill. Again with the caveat that the full design will require
> full knowledge of the business rules.
>
>
> > Adding a row to tblStatisticType adds a statistic name and
> > setting a disabled bit on a row disables it.
> >
> > There are about 50
> > active statistics right now but this number could be 150 or more in
> > the future. Also, all "enabled" statistics are listed on all case
> > reports (statistics that have no labor entries are shown as zero - I
> > know, statistics without labor entries really should be listed as
> > unknown or N/A, others really make sense to be listed as zero, but I
> > don't make the rules. Just recommendations...)
>
> Have you considered a design with a tblEnabledStatisticType with a single
> StatisticsTypeID column identifying the enabled statistic types? It would of
> course have a foreign key reference to tblStatisticsType. Inserting a
> statistics type would enable it and deleting a statistics type would disable
> it. Not that I think it is necessarily better or worse than the design you
> have, but it is an option to consider.
>

No, I did not consider a tblEnabledStatisticType. Putting a Boolean column as a record "enable" bit or having a status column works just fine.

>
> > > > Is it
> > > > normalized to store different types of data in a single field?
> > >
> > > One would prefer to have polymorphism and subtypes for this, but the
> issue
> > > is independent of normalization, which relates to decomposition by the
> > > project operation and recomposition by the join operation.
> > >
> >
> > The word "polymorphism" gave me some pointers to track on this issue.
> > I found that "foreign key association" might be a good name for this
> > odd method.
>
> While it may be necessary to learn the new jargon for object oriented
> programming (ie. 'object pattern names') if you need to communicate with
> teams of programmers who have jumped on this latest bandwagon, I suggest you
> first learn to think using sound theory backed concepts and terminology.
> Foreign keys are a relational concept involved with referential integrity,
> and as such have some impact on normalisation. Actually, fully normalized
> designs replace several types of constraint checking with foreign key
> referential integrity. The result of tacking 'association' onto the end
> 'foreign key' is meaningless nonsense invented in an ad hoc unprincipled
> manner.
>
> Polymorphism is the name for exactly what you want to do, which is to create
> a facility capable of handling multiple types. Many types of polymorphism
> exist: e.g. overloading and type inheritance. What you are doing is a "poor
> man's polymorphism" or kludge to overcome SQL's lack of support for
> polymorphic columns.
>

I found "foreign key association" here: http://www.objectarchitects.de/ObjectArchitects/orpatterns/index.htm And here:
http://www.objectarchitects.de/ObjectArchitects/orpatterns/MappingObjects2Tables/ForeignKeyAssociation/

But, I really don't care what the method is named – "multi-typed" works for me. I also like "union type". Polymorphism is ok to...

>
> > Statistics are normalized in so much that their existence
> > is dependant on a labor entry (key). I guess the real issue is how
> > the business rules were stated when the system was put together -
> > getting statistics is dependant on labor and labor is the only way to
> > generate a statistic - or it's impossible to read a meter without
> > labor...
>
> Whether the design you have is normalized will not depend on your method of
> cramming different types of statistics into a single type, but will depend
> on the full set of functional dependencies. For instance, if ActivityID
> functionally depends on CaseID, the design won't be fully normalized, but
> this has nothing to do with StatisticsAmoun or StatisticsTypeID
>
>
> > > > Is
> > > > there a better way to do this?
> > >
> > > Nobody can really truthfully and credibly answer this question in a
> usenet
> > > newsgroup. Whether a particular way is better will depend on the full
> set of
> > > business rules, which nobody on usenet would have time to properly
> consider
> > > even if you tried to list them.
> > >
> >
> > I agree, but it seems to me that "foreign key association" is an issue
> > that database design books almost never tackle
>
> Nor should they. They should tackle referential integrity, candidate keys,
> foreign keys, domains, etc.
>
>
> > - or even mention (like
> > the temporal issue) and that maybe there was a recommended construct
> > kicking around the newsgroups that I could not find.
>
> Which temporal issue? There are many issues, and time is a complex subject.
> Relational theorists with three decades of experience have only recently
> started proposing satisfying solutions to the problems.
>

I was referring generally to all temporal issues with SQL RDBMS's that invariable get left out of books on these relational databases. Normalization is covered, the ins and outs of building a schema, SQL statements etc. But what about saving the history of customer addressed as they change over time? Many SQL RDBMS books simply have no words on this issue.

>
> > > > Should a field be reserved for each
> > > > different type of statistic?
> > >
> > > Perhaps, perhaps not. Another question you need to ask is: Should a
> table be
> > > reserved for each different type of statistic?
> >
> > Yes, creating a table for each statistic looked like the "right"
> > normalization. But building a table for each statistic seems overly
> > complicated compared to hosting just the amount datum from all
> > different types of statistics in one field.
>
> From what you say about the requirements for users inventing new statistics
> and disabling statistics, a separate table for each statistic is clearly
> wrong and would violate the principle of orthogonal design.
>
>
> > On the other hand doing
> > this seems odd. I intrinsically prefer to find the number of people
> > in a People field and number of Alpha Particles in a AlphaParticles
> > field - not both listed in the same field!
>
> This is what derived relations like views and snapshots are for--presenting
> an underlying design in the manner most accessible to a user or an
> application.

Cool comments, thanks again for crafting that SQL statement, Mike Received on Fri Apr 11 2003 - 12:59:36 CEST

Original text of this message