Re: Mixing OO and DB

From: Brian Selzer <brian_at_selzer-software.com>
Date: Fri, 7 Mar 2008 11:32:10 -0500
Message-ID: <g6eAj.5991$pl4.3659_at_newssvr22.news.prodigy.net>


"Patrick May" <pjm_at_spe.com> wrote in message news:m2y78vfcbv.fsf_at_spe.com...

> "Brian Selzer" <brian_at_selzer-software.com> writes:

>> "Patrick May" <pjm_at_spe.com> wrote in message
>> news:m263w2kwrc.fsf_at_spe.com...
>>>>>     One common denormalization technique is to eliminate a join
>>>>> by adding columns to one table that replicate data in another
>>>>> table that has a one-to-many relationship with the table being
>>>>> denormalized.  The business logic shouldn't have to change just
>>>>> because the schema does.
>>>>
>>>> Why denormalize when you could just use a materialized view?
>>>
>>>     Materialized views are not always available and may not provide
>>> significant performance benefits if the data is being updated
>>> frequently and the clients need the absolute latest values.
>>

>> How is that different from denormalization? Updates still have to
>> be made in a denormalized schema.
>
>     With a materialized view, the updates to the underlying tables
> need to be propagated to the view at some point.  If updates are
> frequent and the clients of the materialized view need the latest
> values, this is an extra step.
>

(Didn't I just say something to that effect? Oh, you snipped it.) In the sense that the number of steps increases, yes there is an extra step; in the sense that more work must be performed, no, not really. The workload is simply divided into that which must be performed to update the base table (which is less and can be significantly less than that for a denormalized schema) and that which must be performed to refresh the view.

>> I guess it could be that due to the frequency of updates the
>> materialized view could be stale more often than not: in that case
>> there wouldn't be any point in materializing it.

>
>     Exactly.
>

>> But I don't see how a denormalized schema would work any better,
>> since much of the time queries would be waiting for update
>> transactions to complete.
>
>     You'd save the time required to update the view.  Depending on
> the granularity of the update, you could get some performance
> improvement when the query is for rows not currently locked under a
> transaction.
>

No, you wouldn't save any time, since the same work must be performed whether you're refreshing the view or updating the denormalized base tables. In fact, since the task of refreshing the view can be separate, you may actually lose time waiting for updates to the denormalized base tables to complete.

In addition, it may not even be necessary to materialize the view. If there are covering indexes on the referenced and referencing tables participating in a foreign key constraint, then a pair of partial index scans may perform better than a single partial scan over a denormalized table just because there are fewer reads required, and even greater performance can be had if the indexes are located on different physical media.

Denormalization causes problems beyond the additional code required to maintain consistency, and the need to duplicate that code every time an update occurs.

The most serious is that denormalization restricts the kinds of information that can be inferred from the content in the database because it plays havoc with the assumptions under which a relational database operates. In particular, it directly affects the Domain Closure assumption. As a result, simple queries such as "Is there an X?" may return incorrect results. For example, in a denormalized database with a single relation {A, B, C} that satisfies the functional dependencies, A --> B and B --> C, it is no longer possible to record values for C and B without also recording a value for A. As a result, a query, "Is there a {B, C} combination such that B = 5 and C = 3?" can no longer be considered deterministic, because there may be such a {B, C} combination that isn't associated with a value for A, but since there's no place to put that information in the database, there is no way to be certain. So at best you're answers to the query can only be either 'yes' or 'maybe' instead of either 'yes' or 'no.'

Another less serious problem is that denormalization introduces query bias--meaning, as you probably already know, that the performance of a few queries improves while at the same time the performance of the rest degrades. Received on Fri Mar 07 2008 - 17:32:10 CET

Original text of this message