Re: Mixing OO and DB
Date: Tue, 25 Mar 2008 03:23:58 GMT
Message-ID: <if_Fj.30093$R84.5229_at_newssvr25.news.prodigy.net>
"Patrick May" <pjm_at_spe.com> wrote in message news:m2bq534qzb.fsf_at_spe.com...
> "Brian Selzer" <brian_at_selzer-software.com> writes:
>> "Patrick May" <pjm_at_spe.com> wrote in message
>> news:m2k5jvohbo.fsf_at_spe.com...
>>>>> The fact that views can be used actually demonstrates that
>>>>> the application can be decoupled from the schema. You are
>>>>> suggesting using views to do so. That's one possible mechanism.
>>>>> OO languages provide others.
>>>>
>>>> You have a very narrow and limited view of what a schema is and
>>>> what it can provide.
>>>
>>> You have a deft hand with non sequiturs.
>>>
>>
>> One of my failings is being able to think ahead several moves.
>
> Yes, yes, and your biggest deficiency as an employee is that you
> care too much and work too hard. I realize some of the posing on
> c.d.t. might be infectious, but here in comp.object we can have grown
> up conversations (our resident troll excepted, of course).
>
Wow! Where did that come from? I did not intend to offend. In hindsight, I guess it could have been construed that way. People sometimes find me hard to follow because I don't always convey my entire thought process, often jumping ahead and leaving out critical details.
>> A schema specifies potential information content. By using
>> projections and joins, partitioning restrictions and disjoint
>> unions, that content can be presented as various sets of
>> relations--each having exactly the same potential for information
>> content. For example, a relation schema R{A, B, C} where {A} is the
>> key has the same potential for information content as a set of two
>> relation schemata, S{A, B} and T{A, C} constrained by a circular
>> inclusion dependency S[A] = T[A], where {A} is the key of both S and
>> T. What that means is that if you join S and T you get R, and if
>> you take projections over R[A, B] and R[A, C] you get S and T
>> respectively. So in this example, if you specify relation schema R,
>> relation schemata S and T and S[A] = T[A] can be inferred, or if you
>> specify S and T and S[A] = T[A], R can be inferred.
>
> I don't disagree, but it's still a non sequitur. You claimed
> that it isn't possible to decouple the application implementation from
> the specific schema. That is clearly incorrect because the same
> internal representation used by the application can be supported by
> more than one specific database schema, as you describe here. If the
> specific schema is encapsulated such that the application is decoupled
> from it, you can change the specific schema without impact to the
> application.
>
Not exactly. I claimed that it isn't possible to decouple the application from the schema. I believe I said, and I think you agreed, that a schema specifies what is to be and can be recorded, and it is in that sense that it cannot be separated from the application.
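
To make the R, S and T example quoted above concrete, here is a rough Python sketch. The helper names (relation, project, natural_join) and the sample rows are my own invention for illustration and are not taken from any DBMS; relations are modeled simply as sets of tuples. It only shows that, when {A} is the key and S[A] = T[A] holds, joining S and T gives back R, and projecting R gives back S and T.

```python
# Illustrative sketch only: relations modeled as frozensets of rows,
# each row being a frozenset of (attribute, value) pairs.

def relation(rows):
    """Build a relation from an iterable of dicts (hypothetical helper)."""
    return frozenset(frozenset(r.items()) for r in rows)

def project(rel, attrs):
    """Projection of rel over the given attribute names."""
    return frozenset(
        frozenset((a, v) for a, v in row if a in attrs) for row in rel
    )

def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            common = ld.keys() & rd.keys()
            if all(ld[a] == rd[a] for a in common):
                out.add(frozenset({**ld, **rd}.items()))
    return frozenset(out)

# R{A, B, C} with key {A}; the sample values are made up.
R = relation([
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "y", "C": 20},
    {"A": 3, "B": "x", "C": 30},
])

S = project(R, {"A", "B"})   # S{A, B}
T = project(R, {"A", "C"})   # T{A, C}

# The circular inclusion dependency S[A] = T[A] holds by construction.
assert project(S, {"A"}) == project(T, {"A"})

# Joining S and T gives back R: same potential information content.
assert natural_join(S, T) == R
```

The round trip is only guaranteed because {A} is a key of both S and T and the inclusion dependency holds in both directions; drop either condition and the join may lose or invent rows.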
>>>>> Even when views are used, the application should be decoupled
>>>>> from the schema because the two models are often very different.
>>>>> Applications can organize information in ways other than the
>>>>> relational model.
>>>>
>>>> I just don't buy this. If the information is the same, but just
>>>> organized differently, then there must exist a transposition
>>>> between them. Each is then just a different possible
>>>> representation of the same information.
>>>
>>> Some representations are more expressive in terms of the
>>> problem or solution domain. Tuples are not always the optimal data
>>> structure.
>>
>> I'm not sure if we're on the same page as to what constitutes
>> expressiveness.
>
> It comes down to the fact that some solutions are more easily
> implemented using structures other than tuples.
>
>>>> If the transposition is done by the DBMS, then it can retain its
>>>> responsibility for guaranteeing integrity. If the transposition
>>>> is done by the application, then that responsibility may need to
>>>> shift from the DBMS to the application--every application. Now
>>>> you have to guarantee that the code that is used to access the
>>>> information is identical in every application that uses the
>>>> information
>>>
>>> If that is a requirement, it's a good argument for a shared
>>> mapping layer or other decoupling mechanism. In fact, though,
>>> different applications often need different representations of
>>> different subsets of the data available in a relational database,
>>> plus data that is only used within the application. Because the
>>> application has a different, non-relational model of the data,
>>> decoupling is good design.
>>
>> And what model is that? Is OO a data model?
>
> It could be anything from a simple stack to a DAG to a full
> object graph. Internally, the application isn't often using tuples.
>
Data structures are not data models.
>>>> --AND, you have to prevent ad-hoc access to the data.
>>>
>>> Why? It's certainly easier to maintain the integrity of the
>>> database if you can, but many systems support multiple applications
>>> and ad-hoc interaction with the underlying database. That's what
>>> locking and other concurrency techniques are for.
>>
>> If information is held in the memory of some application and is also
>> in the database, and if the copy in memory changes, then the copy in
>> the database is stale, and any query against the database must be
>> considered suspect.
>
> True. This is a standard problem in large distributed systems.
> There are many techniques for dealing with it that don't require a
> centralized database. That's not to say that a centralized database
> is never a good solution, it's just not always the best solution.
>
So what does that have to do with allowing ad-hoc access?
>>> A relational database is a very generic technology. An
>>> application is much more specific and can therefore take advantage
>>> of less general types and data structures that improve the
>>> performance and maintainability of the application code. Except
>>> for CRUD systems, the database vendors can't address those problem
>>> domains in a generic way.
>>
>> Until another application needs to use the data. It's a common
>> problem among those who start out as programmers--to get focused on
>> the details and therefore fail to see the big picture. It's a hard
>> habit to break.
>
> There's that snide c.d.t. attitude again. There are more ways
> to build distributed systems than with relational databases, and those
> of us who do it tend to manage the big picture just fine, thank you
> very much.
>
> When your snark is removed, I note that you have failed to
> address the point I made.
>
I believe my point clearly addresses the point you made.
>>>>>> If by data centric you mean that the information that is to be
>>>>>> and can be recorded must be specified before even considering
>>>>>> how that information may behave, then I agree: it is a data
>>>>>> centric view.
>>>>>
>>>>> It is also possible to define a system in terms of behavior
>>>>> and only decide on a particular data representation once those
>>>>> behaviors are designed. In practice, both approaches are
>>>>> typically used.
>>>>
>>>> How can you possibly design a system in terms of the behavior of
>>>> objects if you haven't first specified which objects are
>>>> interesting?
>>>
>>> You focus on the behaviors of interest and partition those
>>> behaviors into cohesive units of classes and modules.
>>
>> Behaviors of what? Let's examine one behavior: barking. When
>> applied to a dog, I can see in my mind's eye a mailman reaching for
>> his pepper spray, but when applied to a person it brings to mind
>> those nice white men in their nice white coats. So again I ask, how
>> can you possibly design a system in terms of the behavior of objects
>> if you haven't first specified which objects are interesting?
>
> One certainly usually starts with some candidate classes, but as
> the interactions are identified it is not uncommon for the behaviors
> to migrate and the names of those candidate classes to change to
> better reflect their nature.
>
> In any case, the focus is on behavior, not data.
>
>>>> Whenever there is a change in potential information content, that
>>>> change may involve potential information that an application can
>>>> access or manipulate, or potential information that an application
>>>> doesn't access.
>>>
>>> Fallacy of the excluded middle. The change may also be in how
>>> the information is modeled by either the application implementation
>>> or the specific schema being used by the application. One option
>>> is to use views to isolate the two. Another option is to decouple
>>> the two components (application implementation and specific schema)
>>> so that changes in one do not impact the other.
>>
>> And sometimes my dog has fleas.
>
> Another non sequitur.
>
>> I don't think we're on the same page as to what constitutes a model,
>> either.
>
> Have you never developed an application that used data structures
> other than relations?
>
Indeed I have. But data structures are not data models. Structure is only one component of a data model. An even more important component is a set of constraints that specifies what states and changes of state are possible. Much of the time the constraints actually determine the structure--for example, from a set of functional dependencies, a normalized relational database schema can be inferred.
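
As a rough illustration of constraints determining structure, the sketch below derives a set of relation schemata from a set of functional dependencies using a simplified form of the textbook synthesis approach. Everything in it is an assumption made for the example: the attribute names are made up, the function names (closure, candidate_key, synthesize) are mine, and the dependencies are assumed to already form a minimal cover, which the code does not check.

```python
# Illustrative sketch only. A functional dependency is a pair
# (lhs, rhs) of frozensets of attribute names.

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_key(all_attrs, fds):
    """Find one candidate key by dropping redundant attributes."""
    key = set(all_attrs)
    for a in sorted(all_attrs):
        if closure(key - {a}, fds) >= all_attrs:
            key.discard(a)
    return frozenset(key)

def synthesize(all_attrs, fds):
    """Simplified synthesis; ASSUMES fds is already a minimal cover."""
    # One schema per distinct left-hand side.
    groups = {}
    for lhs, rhs in fds:
        groups.setdefault(lhs, set(lhs)).update(rhs)
    schemas = {frozenset(s) for s in groups.values()}
    # Make sure some schema contains a key of the whole attribute set.
    if not any(closure(s, fds) >= all_attrs for s in schemas):
        schemas.add(candidate_key(all_attrs, fds))
    # Drop schemata that are proper subsets of another schema.
    return {s for s in schemas if not any(s < other for other in schemas)}

# Hypothetical attributes and dependencies, made up for this example.
attrs = frozenset({"order_id", "order_date", "customer_id", "customer_name"})
fds = [
    (frozenset({"order_id"}), frozenset({"customer_id"})),
    (frozenset({"order_id"}), frozenset({"order_date"})),
    (frozenset({"customer_id"}), frozenset({"customer_name"})),
]

schemas = synthesize(attrs, fds)
```

With these dependencies, the result should be two schemata, one over {order_id, customer_id, order_date} and one over {customer_id, customer_name}. Nobody chose that structure; it fell out of the constraints.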
>>>>> Not all data used by an application needs to be in the
>>>>> database.
>>>>
>>>> I thought that we were discussing information that is to be and
>>>> can be recorded. Such information needs to be in the database.
>>>
>>> That depends on how long it needs to remain available and if it
>>> needs to be accessed by other clients of the database. I often
>>> work on systems where a considerable portion of the information is
>>> stored in a distributed shared object repository, in memory. You
>>> could consider that a form of database, but it doesn't use a
>>> relational model.
>>
>> Good luck if you have a hardware or power failure!
>
> Each node of the distributed repository is backed up to one or
> more other machines, synchronously. The whole cluster (actually, the
> minimal set of information required to recreate it) is backed up over
> a WAN to another data center for disaster recovery, typically
> asynchronously. UPSs supply enough time to guarantee successful
> failover. And yes, there are occasional writes to a relational
> database, although those are kept out of the critical path of the
> business transactions.
>
> This architecture can guarantee whatever level of reliability is
> required by the system.
>
> Sincerely,
>
> Patrick
>
> ------------------------------------------------------------------------
> S P Engineering, Inc. | Large scale, mission-critical, distributed OO
>                       | systems design and implementation.
> pjm_at_spe.com         | (C++, Java, Common Lisp, Jini, middleware, SOA)

Received on Tue Mar 25 2008 - 04:23:58 CET