Re: Client/Server query strategy

From: Bill Meahan <wmeahan_at_ef0424.efhd.ford.com>
Date: Tue, 07 Mar 95 13:21:02 GMT
Message-ID: <3jhmk0$67b_at_ef2007.efhd.ford.com>


nraden_at_aol.com (NRaden) wrote:
>wmeahan_at_ef0424.efhd.ford.com (Bill Meahan) writes:
>
>>If you have a large number of users, and have common
>>queries that might result in a common result set over an
>>extended time period (say the data only changes once a
>>day or once a "shift") you might even want to consider
>>"precalculating" the common result set immediately after
>>the data has been updated and placing THAT where the
>>uses can get at it. Yes, I know you're not supposed to
>>have calculated/derived data stored in the database but under
 many real-world circumstances, you have to do what you
>>have to do to prevent having the system be unusably slow.
>
>Where is the rule that says pre-calculated data in a database in a no-no?
>We do it all of the time in the data warehouses we build. It is not only a
>good idea, it is usually the only practical way to get the performance
>needed.
>

Codd & Date and all derivatives and antecedents STRONGLY argue against keeping derived data in the database. The argument is twofold:

  1. The derived data is redundant. That is, if you have data elements A, B and C, you can always calculate any function that depends on A, B and C when it's needed without incurring the storage necessary to maintain the function result.
  2. If the data is not stable or bounded, there is a high potential for data integrity problems. Say that you calculate the sum of the values in a particular column and store the value but rows are still being added/deleted/updated to the table. The sum you calculate will cease to match the current value of the sum the moment anything changes.

This does NOT mean one should _never_ save derived data, just that one must **thoroughly** understand the ramifications of calculating and storing the derived data. On the other hand, the fact that "foo" stores derived data (whether "foo" is a person, oraganization or product) does not, a priori, make it "right" either. Again, one MUST understand the ramifications.

An illustration of a proper use of stored derived data is "rollups." Data from our production machinery, for example, is sampled every few minutes so that we can note variations in performance (scrap rates, throughput efficiency etc.)over a short time and correlate these variations to various other events. However, this "fine-grained" observation is only useful for a limited time - after a few weeks things have changed enough that the minute variations are no longer effective in performance prediction or troubleshooting. However, at some coarser level (the "shift") there is still value. It is proper to then consolidate all the data for a particular shift into a single record and to delete the individual observations. There is no redundant storage since the original "old" data has been deleted. Similarly, there is no data integrity problem since the data was purely historical and not subject to change. However, the latter is only true after some period of time to make sure that there are no "lost records" discovered or corrections to compensate for collection problems made.

Hope this helps.

--
Bill Meahan,  Senior Developer  |        wmeahan_at_ef0424.efhd.ford.com
Electrical & Fuel Handling Division, Ford Motor Company
Opions expressed herein are those of the author and in no way represent
any official statement or opinion of Ford Motor Company
Received on Tue Mar 07 1995 - 14:21:02 CET

Original text of this message