Re: Mixing OO and DB

From: Patrick May <pjm_at_spe.com>
Date: Mon, 24 Mar 2008 19:55:20 -0400
Message-ID: <m2bq534qzb.fsf_at_spe.com>


"Brian Selzer" <brian_at_selzer-software.com> writes:

> "Patrick May" <pjm_at_spe.com> wrote in message news:m2k5jvohbo.fsf_at_spe.com...

>>>> The fact that views can be used actually demonstrates that
>>>> the application can be decoupled from the schema. You are
>>>> suggesting using views to do so. That's one possible mechanism.
>>>> OO languages provide others.
>>>
>>> You have a very narrow and limited view of what a schema is and
>>> what it can provide.
>>
>> You have a deft hand with non sequiturs.
>>
>
> One of my failings is being able to think ahead several moves.

     Yes, yes, and your biggest deficiency as an employee is that you care too much and work too hard. I realize some of the posing on c.d.t. might be infectious, but here in comp.object we can have grown-up conversations (our resident troll excepted, of course).

> A schema specifies potential information content.  By using
> projections and joins, partitioning restrictions and disjoint
> unions, that content can be presented as various sets of
> relations--each having exactly the same potential for information
> content.  For example, a relation schema R{A, B, C} where {A} is the
> key has the same potential for information content as a set of two
> relation schemata, S{A, B} and T{A, C} constrained by a circular
> inclusion dependency S[A] = T[A], where {A} is the key of both S and
> T.  What that means is that if you join S and T you get R, and if
> you take projections over R[A, B] and R[A, C] you get S and T
> respectively.  So in this example, if you specify relation schema R,
> relation schemata S and T and S[A] = T[A] can be inferred, or if you
> specify S and T and S[A] = T[A], R can be inferred.

     I don't disagree, but it's still a non sequitur.  You claimed
that it isn't possible to decouple the application implementation from the specific schema. That is clearly incorrect because the same internal representation used by the application can be supported by more than one specific database schema, as you describe here. If the specific schema is encapsulated such that the application is decoupled from it, you can change the specific schema without impact to the application.
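
     A minimal sketch of that kind of encapsulation, assuming Python and SQLite purely for illustration. The class names, table names, and the application() function are hypothetical, not taken from any system discussed in this thread. The same application code runs unchanged against either the single relation R or the pair S and T:

```python
import sqlite3
from typing import Optional, Protocol, Tuple


class RStore(Protocol):
    """All the application sees: store and fetch (b, c) by key a."""
    def put(self, a: int, b: str, c: str) -> None: ...
    def get(self, a: int) -> Optional[Tuple[str, str]]: ...


class SingleTableStore:
    """Schema 1: one relation R{A, B, C} with key {A}."""
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE r (a INTEGER PRIMARY KEY, b TEXT, c TEXT)")

    def put(self, a, b, c):
        self.conn.execute("INSERT OR REPLACE INTO r VALUES (?, ?, ?)", (a, b, c))

    def get(self, a):
        return self.conn.execute(
            "SELECT b, c FROM r WHERE a = ?", (a,)
        ).fetchone()


class SplitTableStore:
    """Schema 2: S{A, B} and T{A, C}, written together so S[A] = T[A]."""
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE s (a INTEGER PRIMARY KEY, b TEXT)")
        conn.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, c TEXT)")

    def put(self, a, b, c):
        # One transaction for both rows, so the keys never diverge.
        with self.conn:
            self.conn.execute("INSERT OR REPLACE INTO s VALUES (?, ?)", (a, b))
            self.conn.execute("INSERT OR REPLACE INTO t VALUES (?, ?)", (a, c))

    def get(self, a):
        # Joining S and T on A reconstructs the tuple of R.
        return self.conn.execute(
            "SELECT s.b, t.c FROM s JOIN t ON s.a = t.a WHERE s.a = ?", (a,)
        ).fetchone()


def application(store: RStore):
    """Application logic that knows nothing about the table layout."""
    store.put(1, "x", "y")
    return store.get(1)
```

     Swapping SingleTableStore for SplitTableStore changes the specific schema without touching application(). A view could do the same job inside the DBMS; this is simply the same idea expressed on the application side.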

>>>> Even when views are used, the application should be decoupled
>>>> from the schema because the two models are often very different.
>>>> Applications can organize information in ways other than the
>>>> relational model.
>>>
>>> I just don't buy this. If the information is the same, but just
>>> organized differently, then there must exist a transposition
>>> between them. Each is then just a different possible
>>> representation of the same information.
>>
>> Some representations are more expressive in terms of the
>> problem or solution domain. Tuples are not always the optimal data
>> structure.

>
> I'm not sure if we're on the same page as to what constitutes
> expressiveness.

     It comes down to the fact that some solutions are more easily
implemented using structures other than tuples.

>>> If the transposition is done by the DBMS, then it can retain its
>>> responsibility for guaranteeing integrity. If the transposition
>>> is done by the application, then that responsibility may need to
>>> shift from the DBMS to the application--every application. Now
>>> you have to guarantee that the code that is used to access the
>>> information is identical in every application that uses the
>>> information
>>
>> If that is a requirement, it's a good argument for a shared
>> mapping layer or other decoupling mechanism. In fact, though,
>> different applications often need different representations of
>> different subsets of the data available in a relational database,
>> plus data that is only used within the application. Because the
>> application has a different, non-relational model of the data,
>> decoupling is good design.
>
> And what model is that? Is OO a data model?

     It could be anything from a simple stack to a DAG to a full object graph. Internally, the application isn't often using tuples.
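
     As a small illustration, assuming Python and a made-up set of dependency rows, here is the same information held first as tuples and then as a DAG. The row contents and the build_dag() helper are hypothetical. Once the edges are folded into a graph, an ordering question becomes a single library call rather than a recursive query:

```python
from graphlib import TopologicalSorter

# Hypothetical rows as a query might return them: (task, depends_on).
rows = [
    ("deploy", "test"),
    ("test", "build"),
    ("deploy", "package"),
    ("package", "build"),
]


def build_dag(rows):
    """Fold edge tuples into a mapping of node -> set of prerequisites."""
    dag = {}
    for task, depends_on in rows:
        dag.setdefault(task, set()).add(depends_on)
        dag.setdefault(depends_on, set())  # make sure leaf nodes appear
    return dag


dag = build_dag(rows)

# One valid execution order in which every prerequisite comes first.
order = list(TopologicalSorter(dag).static_order())
```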

>>> --AND, you have to prevent ad-hoc access to the data.
>>
>> Why? It's certainly easier to maintain the integrity of the
>> database if you can, but many systems support multiple applications
>> and ad-hoc interaction with the underlying database. That's what
>> locking and other concurrency techniques are for.

>
> If information is held in the memory of some application and is also
> in the database, and if the copy in memory changes, then the copy in
> the database is stale, and any query against the database must be
> considered suspect.

     True.  This is a standard problem in large distributed systems.
There are many techniques for dealing with it that don't require a centralized database. That's not to say that a centralized database is never a good solution; it's just not always the best solution.

>> A relational database is a very generic technology. An
>> application is much more specific and can therefore take advantage
>> of less general types and data structures that improve the
>> performance and maintainability of the application code. Except
>> for CRUD systems, the database vendors can't address those problem
>> domains in a generic way.

>
> Until another application needs to use the data.  It's a common
> problem among those who start out as programmers--to get focused on
> the details and therefore fail to see the big picture.  It's a hard
> habit to break.

     There's that snide c.d.t. attitude again.  There are more ways
to build distributed systems than with relational databases, and those of us who do it tend to manage the big picture just fine, thank you very much.

     With your snark removed, I note that you have failed to address the point I made.

>>>>> If by data centric you mean that the information that is to be
>>>>> and can be recorded must be specified before even considering
>>>>> how that information may behave, then I agree: it is a data
>>>>> centric view.
>>>>
>>>> It is also possible to define a system in terms of behavior
>>>> and only decide on a particular data representation once those
>>>> behaviors are designed. In practice, both approaches are
>>>> typically used.
>>>
>>> How can you possibly design a system in terms of the behavior of
>>> objects if you haven't first specified which objects are
>>> interesting?
>>
>> You focus on the behaviors of interest and partition those
>> behaviors into cohesive units of classes and modules.

>
> Behaviors of what?  Let's examine one behavior: barking.  When
> applied to a dog, I can see in my mind's eye a mailman reaching for
> his pepper spray, but when applied to a person it brings to mind
> those nice white men in their nice white coats.  So again I ask, how
> can you possibly design a system in terms of the behavior of objects
> if you haven't first specified which objects are interesting?

     One usually starts with some candidate classes, but as
the interactions are identified it is not uncommon for the behaviors to migrate and the names of those candidate classes to change to better reflect their nature.

     In any case, the focus is on behavior, not data.

>>> Whenever there is a change in potential information content, that
>>> change may involve potential information that an application can
>>> access or manipulate, or potential information that an application
>>> doesn't access.
>>
>> Fallacy of the excluded middle. The change may also be in how
>> the information is modeled by either the application implementation
>> or the specific schema being used by the application. One option
>> is to use views to isolate the two. Another option is to decouple
>> the two components (application implementation and specific schema)
>> so that changes in one do not impact the other.
>
> And sometimes my dog has fleas.

     Another non sequitur.

> I don't think we're on the same page as to what constitutes a model,
> either.

     Have you never developed an application that used data structures other than relations?

>>>> Not all data used by an application needs to be in the
>>>> database.
>>>
>>> I thought that we were discussing information that is to be and
>>> can be recorded. Such information needs to be in the database.
>>
>> That depends on how long it needs to remain available and if it
>> needs to be accessed by other clients of the database. I often
>> work on systems where a considerable portion of the information is
>> stored in a distributed shared object repository, in memory. You
>> could consider that a form of database, but it doesn't use a
>> relational model.
>
> Good luck if you have a hardware or power failure!

     Each node of the distributed repository is backed up to one or more other machines, synchronously. The whole cluster (actually, the minimal set of information required to recreate it) is backed up over a WAN to another data center for disaster recovery, typically asynchronously. UPSs supply enough time to guarantee successful failover. And yes, there are occasional writes to a relational database, although those are kept out of the critical path of the business transactions.

     This architecture can guarantee whatever level of reliability is required by the system.
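
     A toy sketch of the synchronous backup step, assuming Python and in-process objects standing in for machines. The Node class and read_with_failover() are hypothetical names; in a real cluster each backup write would be a network call, and this leaves out the WAN copy and the relational writes entirely:

```python
class Node:
    """One in-memory node of a hypothetical distributed repository."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.backups = []  # other Node objects that mirror this node
        self.alive = True

    def put(self, key, value):
        # Synchronous backup: every mirror is updated before the write
        # is acknowledged to the caller.
        for backup in self.backups:
            backup.data[key] = value
        self.data[key] = value
        return True  # acknowledgement


def read_with_failover(primary, key):
    """Read from the primary, falling back to its backups if it is down."""
    for node in [primary] + primary.backups:
        if node.alive:
            return node.data[key]
    raise RuntimeError("no live replica for key %r" % (key,))


primary, mirror = Node("a"), Node("b")
primary.backups.append(mirror)
primary.put("trade-1", {"qty": 100})

primary.alive = False  # simulate a hardware or power failure
recovered = read_with_failover(primary, "trade-1")
```

     Because the mirror is written before put() returns, losing the primary after an acknowledged write does not lose the data.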

Sincerely,

Patrick



S P Engineering, Inc. | Large scale, mission-critical, distributed OO
                       | systems design and implementation.
          pjm_at_spe.com  | (C++, Java, Common Lisp, Jini, middleware, SOA)
Received on Tue Mar 25 2008 - 00:55:20 CET
