Re: Mixing OO and DB

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 10 Mar 2008 02:08:13 GMT
Message-ID: <hK0Bj.22574$R84.5776_at_newssvr25.news.prodigy.net>


"Patrick May" <pjm_at_spe.com> wrote in message news:m28x0riv27.fsf_at_spe.com...

> "Brian Selzer" <brian_at_selzer-software.com> writes:

>> "Patrick May" <pjm_at_spe.com> wrote in message
>> news:m2y78vfcbv.fsf_at_spe.com...
>>>>>>>     One common denormalization technique is to eliminate a join
>>>>>>> by adding columns to one table that replicate data in another
>>>>>>> table that has a one-to-many relationship with the table being
>>>>>>> denormalized.  The business logic shouldn't have to change just
>>>>>>> because the schema does.
>>>>>>
>>>>>> Why denormalize when you could just use a materialized view?
>>>>>
>>>>>     Materialized views are not always available and may not provide
>>>>> significant performance benefits if the data is being updated
>>>>> frequently and the clients need the absolute latest values.
>>>>
>>>> How is that different from denormalization?  Updates still have to
>>>> be made in a denormalized schema.
>>>
>>>     With a materialized view, the updates to the underlying tables
>>> need to be propagated to the view at some point.  If updates are
>>> frequent and the clients of the materialized view need the latest
>>> values, this is an extra step.
>>>
>>

>> (Didn't I just say something to that effect? Oh, you snipped it.)
>
>     Nah, I just replied to it below.  Confusing layout on my part,
> mea culpa.
>

>> In the sense that the number of steps increases, yes there is an
>> extra step; in the sense that more work must be performed, no, not
>> really. The workload is simply divided into that which must be
>> performed to update the base table (which is less and can be
>> significantly less than that for a denormalized schema) and that
>> which must be performed to refresh the view.
>>
>>>> I guess it could be that due to the frequency of updates the
>>>> materialized view could be stale more often than not: in that case
>>>> there wouldn't be any point in materializing it.
>>>
>>>     Exactly.
>>>
>>>> But I don't see how a denormalized schema would work any better,
>>>> since much of the time queries would be waiting for update
>>>> transactions to complete.
>>>
>>>     You'd save the time required to update the view.  Depending on
>>> the granularity of the update, you could get some performance
>>> improvement when the query is for rows not currently locked under a
>>> transaction.
>>

>> No, you wouldn't save any time, since the same work must be
>> performed whether you're refreshing the view or updating the
>> denormalized base tables. In fact, since the task of refreshing the
>> view can be separate, you may actually lose time waiting for updates
>> to the denormalized base tables to complete.
>
>     That implies that the materialized view could be updated before
> the underlying tables.  What happens in that situation if the update
> to the underlying tables fails for some reason?  The best you can hope
> for is to have enough excess CPU capacity to allow the changes to the
> materialized view and to the underlying tables to commit
> simultaneously.
>

I don't think that's implied. The refresh follows the commit against the base tables; the point is that the updating transaction only has to touch the normalized base tables, instead of also having to bring every redundant copy up to date before it can commit, as it would under a denormalized schema.
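
To make the contrast concrete, here is a rough sketch in Oracle-style SQL.
The table and column names are hypothetical (they don't come from this
thread), and materialized-view syntax and refresh options vary by product:

    -- Hypothetical normalized pair:
    --   CUSTOMERS(customer_id PK, customer_name, ...)
    --   ORDERS(order_id PK, customer_id FK, order_date, ...)

    -- Denormalization as described above: copy customer_name into ORDERS
    -- to eliminate the join.  Now every change to CUSTOMERS.customer_name
    -- must also update the matching ORDERS rows in the same transaction.
    ALTER TABLE orders ADD (customer_name VARCHAR2(100));

    -- Materialized-view alternative: the writer's transaction touches only
    -- the normalized base tables; maintaining the join result is a separate
    -- task that runs after the base-table commit, never before it.
    CREATE MATERIALIZED VIEW order_with_customer
      REFRESH ON DEMAND   -- ON COMMIT or a scheduled refresh also possible
    AS
      SELECT o.order_id, o.order_date, c.customer_id, c.customer_name
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id;

    -- The separate refresh step, run after the updating transactions commit:
    BEGIN
      DBMS_MVIEW.REFRESH('ORDER_WITH_CUSTOMER');
    END;
    /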

>     If the materialized view is updated with a batch size greater
> than one transaction against the underlying tables, an application
> accessing the tables can get the updates before one accessing the
> view.
>
>     In any case, we're getting a little far afield from the original
> question.  In enterprise systems, denormalization for performance does
> take place.  This is just one of several reasons for decoupling the
> application logic from the database schema.
>

I don't agree with this. You're equating the database schema with the database implementation. The schema specifies what information is to be recorded and what can be recorded; as such it is an integral part of the application specification and cannot be decoupled from it. That doesn't mean the database implementation cannot be. The schema says nothing about how information is physically recorded, nor about the process by which the recording takes place: that belongs to the implementation, which certainly can be decoupled and probably should be.
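
A minimal sketch of the distinction, again with hypothetical names: the
first part is the logical schema, i.e. part of the specification; the last
statement is purely implementation and can be added, dropped, or changed
without touching the schema or the application logic written against it.

    -- Logical schema (hypothetical): what can be recorded.
    -- Application logic depends on this.
    CREATE TABLE customers (
      customer_id   INTEGER       NOT NULL PRIMARY KEY,
      customer_name VARCHAR2(100) NOT NULL
    );

    CREATE TABLE orders (
      order_id    INTEGER NOT NULL PRIMARY KEY,
      customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
      order_date  DATE    NOT NULL
    );

    -- Physical implementation: how the information is recorded and
    -- retrieved.  Indexes, partitioning, clustering, materialized views
    -- and the like can change for performance reasons without changing
    -- the schema above or the queries written against it.
    CREATE INDEX orders_customer_ix ON orders (customer_id, order_date);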

>> Denormalization causes problems beyond the additional code required
>> to maintain consistency, and the need to duplicate that code every
>> time an update occurs.

>
>     I'm not arguing against the principle that denormalization should
> be avoided if at all possible.
>
> Regards,
>
> Patrick
>
> ------------------------------------------------------------------------
> S P Engineering, Inc.  | Large scale, mission-critical, distributed OO
>                       | systems design and implementation.
>          pjm_at_spe.com  | (C++, Java, Common Lisp, Jini, middleware, SOA) 