Re: Mixing OO and DB

From: Brian Selzer <brian_at_selzer-software.com>
Date: Tue, 25 Mar 2008 03:23:58 GMT
Message-ID: <if_Fj.30093$R84.5229_at_newssvr25.news.prodigy.net>


"Patrick May" <pjm_at_spe.com> wrote in message news:m2bq534qzb.fsf_at_spe.com...

> "Brian Selzer" <brian_at_selzer-software.com> writes:

>> "Patrick May" <pjm_at_spe.com> wrote in message
>> news:m2k5jvohbo.fsf_at_spe.com...
>>>>>     The fact that views can be used actually demonstrates that
>>>>> the application can be decoupled from the schema.  You are
>>>>> suggesting using views to do so.  That's one possible mechanism.
>>>>> OO languages provide others.
>>>>
>>>> You have a very narrow and limited view of what a schema is and
>>>> what it can provide.
>>>
>>>     You have a deft hand with non sequiturs.
>>>
>>

>> One of my failings is being able to think ahead several moves.
>
>     Yes, yes, and your biggest deficiency as an employee is that you
> care too much and work too hard.  I realize some of the posing on
> c.d.t. might be infectious, but here in comp.object we can have grown
> up conversations (our resident troll excepted, of course).
>

Wow! Where did that come from? I did not intend to offend, though in hindsight I can see how the remark could have been construed that way. People sometimes find me hard to follow because I don't always convey my entire thought process, often jumping ahead and leaving out critical details.

>> A schema specifies potential information content. By using
>> projections and joins, partitioning restrictions and disjoint
>> unions, that content can be presented as various sets of
>> relations--each having exactly the same potential for information
>> content. For example, a relation schema R{A, B, C} where {A} is the
>> key has the same potential for information content as a set of two
>> relation schemata, S{A, B} and T{A, C} constrained by a circular
>> inclusion dependency S[A] = T[A], where {A} is the key of both S and
>> T. What that means is that if you join S and T you get R, and if
>> you take projections over R[A, B] and R[A, C] you get S and T
>> respectively. So in this example, if you specify relation schema R,
>> relation schemata S and T and S[A] = T[A] can be inferred, or if you
>> specify S and T and S[A] = T[A], R can be inferred.
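
The equivalence quoted above can be checked mechanically. What follows is only a minimal sketch in Python, assuming relations are modelled as sets of tuples; the helper names (project, natural_join, row) and the sample values are hypothetical and exist purely for illustration.

```python
# Sketch only: a relation is a set of tuples, and each tuple is a
# frozenset of (attribute, value) pairs. All names and data are made up.

def row(**attrs):
    """Build one tuple of a relation from keyword arguments."""
    return frozenset(attrs.items())

def project(relation, attrs):
    """Projection: keep only the named attributes of every tuple."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(left, right):
    """Natural join: merge tuple pairs that agree on all shared attributes."""
    result = set()
    for l in left:
        for r in right:
            dl, dr = dict(l), dict(r)
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                result.add(frozenset({**dl, **dr}.items()))
    return result

# R{A, B, C} with key {A}: no two tuples share a value for A.
R = {row(A=1, B="x", C=10), row(A=2, B="y", C=20), row(A=3, B="x", C=30)}

# Projections over R[A, B] and R[A, C] give S and T respectively.
S = project(R, {"A", "B"})
T = project(R, {"A", "C"})

# The circular inclusion dependency S[A] = T[A] holds by construction.
assert project(S, {"A"}) == project(T, {"A"})

# Joining S and T gives back exactly R.
assert natural_join(S, T) == R
```

The round trip only works because {A} is the key of both S and T and because S[A] = T[A]; drop either condition and the join can lose or invent tuples.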

>
>     I don't disagree, but it's still a non sequitur.  You claimed
> that it isn't possible to decouple the application implementation from
> the specific schema.  That is clearly incorrect because the same
> internal representation used by the application can be supported by
> more than one specific database schema, as you describe here.  If the
> specific schema is encapsulated such that the application is decoupled
> from it, you can change the specific schema without impact to the
> application.
>

Not exactly. I claimed that it isn't possible to decouple the application from the schema. I believe I said, and I think you agreed, that a schema specifies what is to be and can be recorded, and it is in that sense that it cannot be separated from the application.

>>>>>     Even when views are used, the application should be decoupled
>>>>> from the schema because the two models are often very different.
>>>>> Applications can organize information in ways other than the
>>>>> relational model.
>>>>
>>>> I just don't buy this.  If the information is the same, but just
>>>> organized differently, then there must exist a transposition
>>>> between them.  Each is then just a different possible
>>>> representation of the same information.
>>>
>>>     Some representations are more expressive in terms of the
>>> problem or solution domain.  Tuples are not always the optimal data
>>> structure.
>>

>> I'm not sure if we're on the same page as to what constitutes
>> expressiveness.
>
>     It comes down to the fact that some solutions are more easily
> implemented using structures other than tuples.
>
>>>> If the transposition is done by the DBMS, then it can retain its
>>>> responsibility for guaranteeing integrity.  If the transposition
>>>> is done by the application, then that responsibility may need to
>>>> shift from the DBMS to the application--every application.  Now
>>>> you have to guarantee that the code that is used to access the
>>>> information is identical in every application that uses the
>>>> information
>>>
>>>     If that is a requirement, it's a good argument for a shared
>>> mapping layer or other decoupling mechanism.  In fact, though,
>>> different applications often need different representations of
>>> different subsets of the data available in a relational database,
>>> plus data that is only used within the application.  Because the
>>> application has a different, non-relational model of the data,
>>> decoupling is good design.
>>

>> And what model is that? Is OO a data model?
>
>     It could be anything from a simple stack to a DAG to a full
> object graph.  Internally, the application isn't often using tuples.
>

Data structures are not data models.

>>>> --AND, you have to prevent ad-hoc access to the data.
>>>
>>>     Why?  It's certainly easier to maintain the integrity of the
>>> database if you can, but many systems support multiple applications
>>> and ad-hoc interaction with the underlying database.  That's what
>>> locking and other concurrency techniques are for.
>>

>> If information is held in the memory of some application and is also
>> in the database, and if the copy in memory changes, then the copy in
>> the database is stale, and any query against the database must be
>> considered suspect.
>
>     True.  This is a standard problem in large distributed systems.
> There are many techniques for dealing with it that don't require a
> centralized database.  That's not to say that a centralized database
> is never a good solution, it's just not always the best solution.
>

So what does that have to do with allowing ad-hoc access?

>>>     A relational database is a very generic technology.  An
>>> application is much more specific and can therefore take advantage
>>> of less general types and data structures that improve the
>>> performance and maintainability of the application code.  Except
>>> for CRUD systems, the database vendors can't address those problem
>>> domains in a generic way.
>>

>> Until another application needs to use the data. It's a common
>> problem among those who start out as programmers--to get focused on
>> the details and therefore fail to see the big picture. It's a hard
>> habit to break.
>
>     There's that snide c.d.t. attitude again.  There are more ways
> to build distributed systems than with relational databases, and those
> of us who do it tend to manage the big picture just fine, thank you
> very much.

You missed my point altogether: an application can take advantage of less general types and data structures that improve the performance and maintainability of the code only until another application needs to use the data. Then the performance and maintainability improvements will very likely go right out the window.

>
>     When your snark is removed, I note that you have failed to
> address the point I made.
>

I believe my point clearly addresses the point you made.

>>>>>> If by data centric you mean that the information that is to be
>>>>>> and can be recorded must be specified before even considering
>>>>>> how that information may behave, then I agree: it is a data
>>>>>> centric view.
>>>>>
>>>>>     It is also possible to define a system in terms of behavior
>>>>> and only decide on a particular data representation once those
>>>>> behaviors are designed.  In practice, both approaches are
>>>>> typically used.
>>>>
>>>> How can you possibly design a system in terms of the behavior of
>>>> objects if you haven't first specified which objects are
>>>> interesting?
>>>
>>>     You focus on the behaviors of interest and partition those
>>> behaviors into cohesive units of classes and modules.
>>

>> Behaviors of what? Let's examine one behavior: barking. When
>> applied to a dog, I can see in my mind's eye a mailman reaching for
>> his pepper spray, but when applied to a person it brings to mind
>> those nice white men in their nice white coats. So again I ask, how
>> can you possibly design a system in terms of the behavior of objects
>> if you haven't first specified which objects are interesting?
>
>     One certainly usually starts with some candidate classes, but as
> the interactions are identified it is not uncommon for the behaviors
> to migrate and the names of those candidate classes to change to
> better reflect their nature.
>
>     In any case, the focus is on behavior, not data.
>
>>>> Whenever there is a change in potential information content, that
>>>> change may involve potential information that an application can
>>>> access or manipulate, or potential information that an application
>>>> doesn't access.
>>>
>>>     Fallacy of the excluded middle.  The change may also be in how
>>> the information is modeled by either the application implementation
>>> or the specific schema being used by the application.  One option
>>> is to use views to isolate the two.  Another option is to decouple
>>> the two components (application implementation and specific schema)
>>> so that changes in one do not impact the other.
>>

>> And sometimes my dog has fleas.
>
>     Another non sequitur.
>

As was what immediately preceded it: information can be represented in many different ways yet still be the same information. What is fallacious is trying to argue that a change in structure constitutes a change in potential information content.
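
A rough sketch of the same point at the DBMS level, assuming SQLite through Python's standard sqlite3 module. The table names R, S and T, the sample rows and the helper functions are all hypothetical. The same query returns the same answer whether R is a base table or a view over S and T.

```python
import sqlite3

# Hypothetical sample data for a relation R{A, B, C} with key {A}.
ROWS = [(1, "x", 10), (2, "y", 20), (3, "x", 30)]

# The one query issued against the database; it never changes.
APP_QUERY = "SELECT A, B, C FROM R ORDER BY A"

def single_table():
    """Structure 1: R is a base table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE R (A INTEGER PRIMARY KEY, B TEXT NOT NULL, C INTEGER NOT NULL)"
    )
    db.executemany("INSERT INTO R VALUES (?, ?, ?)", ROWS)
    return db

def split_tables():
    """Structure 2: S and T are base tables and R is a view that joins them.

    Note: this sketch does not enforce the circular inclusion dependency
    S[A] = T[A]; it simply loads data that happens to satisfy it.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE S (A INTEGER PRIMARY KEY, B TEXT NOT NULL)")
    db.execute("CREATE TABLE T (A INTEGER PRIMARY KEY, C INTEGER NOT NULL)")
    db.executemany("INSERT INTO S VALUES (?, ?)", [(a, b) for a, b, _ in ROWS])
    db.executemany("INSERT INTO T VALUES (?, ?)", [(a, c) for a, _, c in ROWS])
    db.execute(
        "CREATE VIEW R AS "
        "SELECT S.A AS A, S.B AS B, T.C AS C FROM S JOIN T ON S.A = T.A"
    )
    return db

result_single = single_table().execute(APP_QUERY).fetchall()
result_split = split_tables().execute(APP_QUERY).fetchall()

# Different structure, same information.
assert result_single == result_split == ROWS
```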

>> I don't think we're on the same page as to what constitutes a model,
>> either.

>
>     Have you never developed an application that used data structures
> other than relations?
>

Indeed I have. But data structures are not data models. Structure is only one component of a data model. An even more important component is a set of constraints that specifies what states and changes of state are possible. Much of the time the constraints actually determine the structure--for example, from a set of functional dependencies, a normalized relational database schema can be inferred.
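
To illustrate how constraints can determine structure, here is a deliberately simplified sketch in Python. It assumes the functional dependencies are already given as a minimal cover and simply groups them by determinant; a full synthesis algorithm would also compute the cover and make sure some relation contains a candidate key. The attribute names (emp, name, dept, manager) and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the determinant is covered, everything it determines follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def synthesize(fds):
    """Group dependencies by determinant; each group yields one relation schema.

    Returns a dict mapping each determinant (the key of its relation) to the
    full attribute set of that relation. Assumes fds is a minimal cover.
    """
    schemas = {}
    for lhs, rhs in fds:
        schemas.setdefault(lhs, set(lhs)).update(rhs)
    return {key: frozenset(attrs) for key, attrs in schemas.items()}

# Hypothetical dependencies: emp -> name, emp -> dept, dept -> manager.
fds = [
    (frozenset({"emp"}), frozenset({"name"})),
    (frozenset({"emp"}), frozenset({"dept"})),
    (frozenset({"dept"}), frozenset({"manager"})),
]

schemas = synthesize(fds)

# Two relation schemata fall out of the dependencies alone.
assert schemas[frozenset({"emp"})] == frozenset({"emp", "name", "dept"})
assert schemas[frozenset({"dept"})] == frozenset({"dept", "manager"})

# {emp} determines every attribute, so it is a key for the whole set.
assert closure({"emp"}, fds) == frozenset({"emp", "name", "dept", "manager"})
```

Nothing in the input says how many relations there should be or which attributes belong together; that structure is derived entirely from the dependencies.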

>>>>>     Not all data used by an application needs to be in the
>>>>>     database.
>>>>
>>>> I thought that we were discussing information that is to be and
>>>> can be recorded.  Such information needs to be in the database.
>>>
>>>     That depends on how long it needs to remain available and if it
>>> needs to be accessed by other clients of the database.  I often
>>> work on systems where a considerable portion of the information is
>>> stored in a distributed shared object repository, in memory.  You
>>> could consider that a form of database, but it doesn't use a
>>> relational model.
>>

>> Good luck if you have a hardware or power failure!
>
>     Each node of the distributed repository is backed up to one or
> more other machines, synchronously.  The whole cluster (actually, the
> minimal set of information required to recreate it) is backed up over
> a WAN to another data center for disaster recovery, typically
> asynchronously.  UPSs supply enough time to guarantee successful
> failover.  And yes, there are occasional writes to a relational
> database, although those are kept out of the critical path of the
> business transactions.
>
>     This architecture can guarantee whatever level of reliability is
> required by the system.
>
> Sincerely,
>
> Patrick
>
> ------------------------------------------------------------------------
> S P Engineering, Inc.  | Large scale, mission-critical, distributed OO
>                       | systems design and implementation.
>          pjm_at_spe.com  | (C++, Java, Common Lisp, Jini, middleware, SOA) 
Received on Tue Mar 25 2008 - 04:23:58 CET
