Re: Calculated value dilemma

From: Brian Selzer <brian_at_selzer-software.com>
Date: Thu, 17 Jul 2008 23:33:30 -0400
Message-ID: <eaUfk.18279$N87.3448_at_nlpi068.nbdc.sbc.com>


"JOG" <jog_at_cs.nott.ac.uk> wrote in message news:81be038f-9b25-4c63-b461-9f04d8607daf_at_e39g2000hsf.googlegroups.com...
> On Jul 16, 3:00 pm, Rainboy <mark.rain..._at_gmail.com> wrote:
>> On Jul 16, 1:13 pm, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
>>
>>
>>
>> > Rainboy wrote:
>> > > Hi all
>>
>> > > I am designing a database for my charity (we are a local special
>> > > needs
>> > > charity) to replace our existing one, and my main goals are to make
>> > > the database accurate, easy to use, and able to expand (the current
>> > > database is not!).
>>
>> > > The database will hold contact and membership details for about 300
>> > > families, staff/volunteers and other contacts, along with event
>> > > booking (we hold about 5 a week) and payment details.
>>
>> > > So, for each family we need to see how much they owe, plus a full
>> > > list
>> > > of all their transactions. I am considering having a transaction
>> > > table, which details every payment anyone has ever made.
>>
>> > > IDEA 1
>> > > To follow normalisation rules I should avoid having calculated values
>> > > in the table, and I should therefore simply calculate the balance
>> > > whenever I need it by querying the table. I guess as the database
>> > > grows I would have to periodically 'consolidate' the balance to
>> > > maintain efficiency.
>>
>> > Where did you learn that was part of a normalization rule? It's just
>> > plain wrong.
>>
>> > > However, I see each updated balance as a snapshot in time - when I
>> > > look back to figure out 'who paid what when', I want to be 100%
>> > > confident that the database will return the same balance it would
>> > > have
>> > > returned at the time. If a rogue entry were to somehow find its way
>> > > into the table, it would mess all the balances up...
>>
>> > Historical data should accurately reflect the history. What if a
>> > calculation changes? What if a calculation depends on the instantaneous
>> > value of some attribute where no history is stored? As a hypothetical
>> > situation, suppose families with 5 or more kids get a special discount.
>> > When baby #5 is born, the family doesn't suddenly get a discount on all
>> > their past transactions.
>>
>> > > IDEA 2
>> > > As keeping historically accurate info is important, 'hard-coding' the
>> > > balance in the table seems to me like it might be the most
>> > > appropriate
>> > > solution, even though it breaks the normalisation rules.
>>
>> > Exactly which normal form do you suppose it violates?
>>
>> > > Any suggestions would be greatly appreciated!
>>
>> > > Mark
>>
>> > > PS I've never designed a database before, so please let me know if
>> > > I'm completely barking up the wrong tree!
>>
>> > I think you have been misinformed. Where are you getting your
>> > information?- Hide quoted text -
>>
>> > - Show quoted text -
>>
>> Hmmmm...
>>
>> Well, I had assumed that the table would be violating 3NF because the
>> calculated value of the balance would be dependent on the transaction
>> amount, which is not the primary key.

The balance, while dependent upon the transaction amount, is not functionally dependent upon it, so the dependency has absolutely nothing to do with 3NF, or any of the classic normal forms for that matter--including 5NF.

>
> 3NF is deprecated really. You should start at BCNF when normalizing.
> Regards, J.
>

BCNF is 3NF or at least what 3NF should have been to begin with.

>>
>> Ah, I think I'm starting to understand... is historical data treated
>> differently because as soon as an entry is 'in the past', the balance
>> (at that time) no longer depends on the transaction amount, and it
>> does only depend on the primary key of transaction ID. So I just have
>> a balance column, and whenever I want someone's current balance, I
>> just query this historical table for the most recent entry? And I
>> won't be breaking any rules? Smooth.
>>
>> Although (assuming I've got this right) something does intuitively
>> bother me about this... can't quite put my finger on it! I feel a bit
>> wary that the database's accuracy will depend on someone always
>> entering in the correct balance to square with the latest transaction
>> amount... what if they don't? I know it can be automated, but I don't
>> feel that comfy knowing that the database structure doesn't take care
>> of it, and it'll have to be some extra logic. Does this make sense?
>>

You are justified in your concern. The problem with storing both an aggregate and the values that combine to form the aggregate is not due to a lack of normalization, but is a design flaw nonetheless. Bob's response was woefully inadequate in answering your question. Storing both the result of an aggregate and the values that combine to yield the aggregate result is generally not a good idea, unless what you're storing is qualified by a date/time stamp or some other temporal device, such as an open or closed interval. While there is still a link between the transaction amounts and the balance, that link is frozen in time--that is, tied to a particular point in time (even if that point is indeterminate, as is the case for an open interval). Another downside of storing an unqualified aggregate result is that in order to verify its value, you need to retain all of the transaction amounts that when combined yield the aggregate result. That may be five or ten years worth of transactions, and that's a problem if you are only retaining the data for the last year.

>> Mark
>>
>> PS Thanks for your reply. I hope you don't mind bearing with me
>> while I get my head around these new concepts!
>
Received on Fri Jul 18 2008 - 05:33:30 CEST

Original text of this message