Re: Mixing OO and DB

From: Brian Selzer <brian_at_selzer-software.com>
Date: Sat, 22 Mar 2008 00:15:23 -0400
Message-ID: <wJ%Ej.7447$Rq1.17_at_nlpi068.nbdc.sbc.com>


"Patrick May" <pjm_at_spe.com> wrote in message news:m2k5jvohbo.fsf_at_spe.com...

> "Brian Selzer" <brian_at_selzer-software.com> writes:

>> "Patrick May" <pjm_at_spe.com> wrote in message
>> news:m2eja6belp.fsf_at_spe.com...
>>>>>     If you agree with this, the second half of your claim
>>>>> ". . . and it cannot be decoupled" is clearly incorrect because
>>>>> the application implementation deals with the physical schema.
>>>>> Since you agree that multiple different physical schemas are
>>>>> possible, decoupling the application from any particular set of
>>>>> those is both possible and good design.
>>>>
>>>> No, it isn't incorrect.  Whenever a schema evolves, views are
>>>> often used to provide backward compatibility for existing
>>>> applications.  An application need not concern itself with whether
>>>> it is accessing a view or a table; therefore it is not necessarily
>>>> tied to any physical schema.
>>>
>>>     The fact that views can be used actually demonstrates that the
>>> application can be decoupled from the schema.  You are suggesting
>>> using views to do so.  That's one possible mechanism.  OO languages
>>> provide others.
>>

>> You have a very narrow and limited view of what a schema is and what
>> it can provide.
>
>     You have a deft hand with non sequiturs.
>

One of my failings is being able to think ahead several moves. A schema specifies potential information content. By using projections and joins, partitioning restrictions and disjoint unions, that content can be presented as various sets of relations--each having exactly the same potential for information content. For example, a relation schema R{A, B, C} where {A} is the key has the same potential for information content as a set of two relation schemata, S{A, B} and T{A, C} constrained by a circular inclusion dependency S[A] = T[A], where {A} is the key of both S and T. What that means is that if you join S and T you get R, and if you take projections over R[A, B] and R[A, C] you get S and T respectively. So in this example, if you specify relation schema R, relation schemata S and T and S[A] = T[A] can be inferred, or if you specify S and T and S[A] = T[A], R can be inferred.

>>>     Even when views are used, the application should be decoupled
>>> from the schema because the two models are often very different.
>>> Applications can organize information in ways other than the
>>> relational model.
>>

>> I just don't buy this. If the information is the same, but just
>> organized differently, then there must exist a transposition between
>> them. Each is then just a different possible representation of the
>> same information.
>
>     Some representations are more expressive in terms of the problem
> or solution domain.  Tuples are not always the optimal data structure.
>

I'm not sure if we're on the same page as to what constitutes expressiveness.

>> If the transposition is done by the DBMS, then it can retain its
>> responsibility for guaranteeing integrity. If the transposition is
>> done by the application, then that responsibility may need to shift
>> from the DBMS to the application--every application. Now you have
>> to guarantee that the code that is used to access the information is
>> identical in every application that uses the information

>
>     If that is a requirement, it's a good argument for a shared
> mapping layer or other decoupling mechanism.  In fact, though,
> different applications often need different representations of
> different subsets of the data available in a relational database, plus
> data that is only used within the application.  Because the
> application has a different, non-relational model of the data,
> decoupling is good design.
>

And what model is that? Is OO a data model?

>> --AND, you have to prevent ad-hoc access to the data.

>
>     Why?  It's certainly easier to maintain the integrity of the
> database if you can, but many systems support multiple applications
> and ad-hoc interaction with the underlying database.  That's what
> locking and other concurrency techniques are for.
>

If information is held in the memory of some application and is also in the database, and if the copy in memory changes, then the copy in the database is stale, and any query against the database must be considered suspect. This is especially true during the time that information is being committed to the database, since the order in which it is committed may affect the results of other queries. So unless you're solution is to hold locks on every row referenced by every application (not the most palatable choice), you have to prevent ad-hoc access.

>> In addition, you have to manage concurrency in every application.

>
>     Yes.
>

>> Keep it simple, stupid: let the DBMS do what it is designed to do.
>> Why reinvent the wheel? The people who built the DBMS are probably
>> a lot smarter than you, or at least know more about how to persist
>> information than you, since that is their focus.
>
>     A relational database is a very generic technology.  An
> application is much more specific and can therefore take advantage of
> less general types and data structures that improve the performance
> and maintainability of the application code.  Except for CRUD systems,
> the database vendors can't address those problem domains in a generic
> way.
>

Until another application needs to use the data. It's a common problem among those who start out as programmers--to get focused on the details and therefore fail to see the big picture. It's a hard habit to break.

>>>>>     Further, the view that the schema is integral to the
>>>>> application is very data centric.  Different applications may
>>>>> need the data in different forms, not all of which are
>>>>> relational.  There is therefore a need to translate between the
>>>>> data structures, which is another good reason to decouple the
>>>>> application from the specific physical schema being used.
>>>>
>>>> If by data centric you mean that the information that is to be and
>>>> can be recorded must be specified before even considering how that
>>>> information may behave, then I agree: it is a data centric view.
>>>
>>>     It is also possible to define a system in terms of behavior and
>>> only decide on a particular data representation once those
>>> behaviors are designed.  In practice, both approaches are typically
>>> used.
>>

>> How can you possibly design a system in terms of the behavior of
>> objects if you haven't first specified which objects are
>> interesting?
>
>     You focus on the behaviors of interest and partition those
> behaviors into cohesive units of classes and modules.
>

Behaviors of what? Let's examine one behavior: barking. When applied to a dog, I can see in my mind's eye a mailman reaching for his pepper spray, but when applied to a person it brings to mind those nice white men in their nice white coats. So again I ask, how can you possibly design a system in terms of the behavior of objects if you haven't first specified which objects are interesting?

>>>> But again, you're laboring under the delusion that an application
>>>> must be tied to a physical schema.
>>>
>>>     I'm under no such delusion.  An application implementation that
>>> uses data from a relational databas must deal with a particular
>>> schema.  Since the two change for different reasons and since the
>>> models are often different, decoupling them is good design.
>>

>> Whenever there is a change in potential information content, that
>> change may involve potential information that an application can
>> access or manipulate, or potential information that an application
>> doesn't access.
>
>     Fallacy of the excluded middle.  The change may also be in how
> the information is modeled by either the application implementation or
> the specific schema being used by the application.  One option is to
> use views to isolate the two.  Another option is to decouple the two
> components (application implementation and specific schema) so that
> changes in one do not impact the other.
>

And sometimes my dog has fleas.

I don't think we're on the same page as to what constitutes a model, either.

>>>>>>>     Even if an application uses the database system's
>>>>>>> capabilities to implement some application functionality, there
>>>>>>> are still changes to the underlying schema that do not, or
>>>>>>> should not, require change to the applications that use that
>>>>>>> schema.  This is why approaches like dependency inversion are
>>>>>>> useful.  The application depends on an interface and the
>>>>>>> combination of database schema and any logic running in the
>>>>>>> database implements that interface.  Either can change without
>>>>>>> impacting the other.
>>>>>>
>>>>>> The database contains that which can be manipulated.  An
>>>>>> application is that which does the manipulating.  These are
>>>>>> completely different species of functionality.
>>>>>
>>>>>     The database sometimes contains some of that which can be
>>>>> manipulated.
>>>>
>>>> I don't follow you.  Please elaborate.
>>>
>>>     Not all data used by an application needs to be in the database.
>>

>> I thought that we were discussing information that is to be and can
>> be recorded. Such information needs to be in the database.
>
>     That depends on how long it needs to remain available and if it
> needs to be accessed by other clients of the database.  I often work
> on systems where a considerable portion of the information is stored
> in a distributed shared object repository, in memory.  You could
> consider that a form of database, but it doesn't use a relational
> model.
>

Good luck if you have a hardware or power failure!

> Sincerely,
>
> Patrick
>
> ------------------------------------------------------------------------
> S P Engineering, Inc.  | Large scale, mission-critical, distributed OO
>                       | systems design and implementation.
>          pjm_at_spe.com  | (C++, Java, Common Lisp, Jini, middleware, SOA) 
Received on Sat Mar 22 2008 - 05:15:23 CET

Original text of this message