Re: OO versus RDB

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Wed, 28 Jun 2006 22:51:20 +0200
Message-ID: <44a2eb4c$0$31652$e4fe514c_at_news.xs4all.nl>


H. S. Lahman wrote:
> Responding to mAsterdam...
>

>>>> You snipped:
>>>>
>>>>>> ISTM persistence is no issue. A few weeks
>>>>>> ago I asked cdt & co for a relevant definition.
>>>>>> Maybe you have one. 

[snip acronym misunderstanding]

> I'll go with the one from the Computer Desktop Encyclopedia: persistent
> data is data that exists from session to session. (Where 'session' is
> the period between starting and stopping an application.) That's a bit
> simplistic for 24x7 applications, but the basic notion that the data has
> relatively permanent existence outside the application applies.

That is, persistence is something outside the realm where objects behave. No behaviour, no objects. By this definition there is persistent data, but there are no persistent objects.

>>>>> ... No, it is about solving problems (OO) vs. 
>>>>> persisting data (data management).
>>>>
>>>> What do you mean by "solving problems (OO)" -
>>>> are they synonyms to you?
>>>
>>> No.  OO is one form of problem solving just as an RDB-based DBMS is 
>>> one form of data management.
>>
>> "form of"? "problem solving" (in general instead of
>> some category of problems) - it all sounds huge, but
>> I fail to see what you mean by it.
>> I'll try something I could understand using similar words:
>>
>> OO is one set of solutions for one problem,
>> how to organize code: 'Sesame, open!' instead of 'Open Sesame!'.

>
> I prefer: OO is a paradigm for solving problems on a computer.

Big words ('paradigm', 'solving problems') only confuse the issues.

What problem does one attempt to solve with OO but the organization of code?
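The 'Sesame, open!' versus 'Open Sesame!' contrast fits in a few lines. A minimal sketch in Python; Door, open and open_door are purely illustrative names. Both calls have the same effect; what differs is where the code that does it lives.

```python
class Door:
    """Illustrative object: state plus the behaviour that acts on it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.is_open = False

    def open(self) -> None:
        # 'Sesame, open!': the object is addressed and opens itself.
        self.is_open = True


def open_door(door: Door) -> None:
    # 'Open Sesame!': a procedure acts on a passive piece of data.
    door.is_open = True


sesame = Door("Sesame")
sesame.open()
other = Door("Sesame")
open_door(other)
print(sesame.is_open, other.is_open)  # True True
```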

> However,
> my objection above was that it is not synonymous with problem solving
> because problems can be solved on a computer with different paradigms
> than OO.

>
>> Data management is done by people.
>> A DBMS is part of the toolkit.

>
> Fine. Just as problems are solved by people while editors, compilers,
> code generators, etc. are tools.
>
> The point is that data management and problem solving are quite
> different concerns and activities.

?? Data management does not solve problems? Are you sure?

>>>>>>>>> So one should hide the SQL as well.
>>>>>>>>
>>>>>>>> As soon as we have a better embeddable
>>>>>>>> abstraction, yes. Please provide an url.
>>
>> Did you find one?

>
> I didn't look because it is irrelevant to the discussion. The issue is
> not about finding an alternative to SQL for accessing an RDB. (Though
> clearly if one is using flat files it is not the best choice.)
> It is about hiding /whatever/

... behind something else. To be of any beneficial value this something else must provide a better abstraction than whatever it is one is hiding.

> access mechanism one uses.

Using SQL can hide storage access from your application; a DBMS hides (and solves) a lot of the data-sharing problems.
You propose to hide the data itself, overshooting
the purpose of abstraction.
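To make the point concrete: a query names the facts it needs, not the way they are laid out. A minimal sketch using Python's sqlite3 module; the table and view names are hypothetical. The application function queries a view, the underlying tables are then reorganized, and the function keeps returning the same answer without being touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_v1 (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer_v1 VALUES (1, 'Bill', 'Amsterdam');
    -- The application addresses this view: the facts, not the layout.
    CREATE VIEW customer AS SELECT id, name, city FROM customer_v1;
""")


def customers_in(conn, city):
    # Application code mentions only the relevant part of the schema.
    return conn.execute(
        "SELECT name FROM customer WHERE city = ? ORDER BY name", (city,)
    ).fetchall()


before = customers_in(conn, "Amsterdam")

# Reorganize the tables (city moved to its own table), redefine the view.
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE customer_v2 (
        id INTEGER PRIMARY KEY,
        name TEXT,
        city_id INTEGER REFERENCES city(id)
    );
    INSERT INTO city (name) SELECT DISTINCT city FROM customer_v1;
    INSERT INTO customer_v2 (id, name, city_id)
        SELECT c.id, c.name, t.id
        FROM customer_v1 AS c JOIN city AS t ON t.name = c.city;
    DROP VIEW customer;
    DROP TABLE customer_v1;
    CREATE VIEW customer AS
        SELECT c.id, c.name, t.name AS city
        FROM customer_v2 AS c JOIN city AS t ON t.id = c.city_id;
""")

after = customers_in(conn, "Amsterdam")
print(before, after)  # [('Bill',)] [('Bill',)]
```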

>>>>>>> ...Or you change RDB vendors?
>>>>>>
>>>>>> Then you have to change the vendor specifics
>>>>>> in the code and migrate the data.
>>>>>
>>>>> The point is that the application problem solution does not care 
>>>>> how the data is stored. Nor should it be affected by any changes to 
>>>>> the data storage that do not affect its semantics. 
>>>>
>>>> We are in agreement here.
>>>>
>>>> Where we differ is:
>>>> You (appear to) assume that dbms and SQL are about storage.
>>>> I don't.
>>>
>>> I believe they are about managing stored data.
>>
>> and, off-hand,
>>     - the sharing of those data,
>>     - providing suitable abstractions for
>>       programs to manipulate those data.

>
> Sharing certainly; providing problem-independent access
> to a data store is the primary purpose of a DBMS.

Sure, the DBMS is fact-agnostic.
The schema, however, is a model of the facts relevant to the business (the business including its problems).

While it may be worthwhile to invest in hiding DBMS-specifics, hiding the (relevant part of the) schema reduces your application to a generic CRUD/USER "solution" (I looked up USER at your site).

> However, outside CRUD/USER processing the solution to a particular
> problem will almost always have a different representation of the data
> than the DBMS so that it can manipulate the data optimally for the
> solution in hand.
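What "a different representation of the data" can look like: a minimal sketch in Python with sqlite3, assuming a hypothetical link(src, dst) table. One function reads rows and renders them into an adjacency map; a separate function does a breadth-first reachability search on that in-memory structure.

```python
import sqlite3
from collections import defaultdict, deque


def load_graph(conn):
    """Access side: read rows and render them into an adjacency map."""
    graph = defaultdict(list)
    for src, dst in conn.execute("SELECT src, dst FROM link"):
        graph[src].append(dst)
    return graph


def reachable(graph, start):
    """Solution side: breadth-first search over the in-memory structure."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (src TEXT, dst TEXT);
    INSERT INTO link VALUES ('A', 'B'), ('B', 'C'), ('D', 'A');
""")
print(sorted(reachable(load_graph(conn), "A")))  # ['A', 'B', 'C']
```

SQLite also supports recursive common table expressions (WITH RECURSIVE), so the same reachability question could be put to the DBMS directly; the sketch only shows the separate-representation approach.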
>

>>>>>>> As such, it should be hidden from the rest of the application  
>>>>>>> that doesn't care what flavor of persistence is used.  
>>>>>>
>>>>>> In the case of using SQL as interface to users data:
>>>>>> make sure that only that part of our code which
>>>>>> processes some specific data mentions only that
>>>>>> part of the schema which is relevant to it.
>>>>>
>>>>> Better yet, encapsulate it so that none of the
>>>>> problem solution code mentions it.
>>>>
>>>> Hiding irrelevant stuff seems sensible.
>>>> Not mentioning the relevant stuff seems a bit weird.
>>>>
>>>> Metaphorically:
>>>> I can imagine some transport layer being
>>>> ignorant of the content of the luggage.
>>>
>>> Conversely, some baggage content layer could be
>>> ignorant of transport mechanisms.
>>
>> Sure. But how is this relevant to hiding
>> the *relevant* stuff?

>
> The way persistent data is managed and accessed does not matter to the
> logic of a problem solution that manipulates the data after it is
> accessed. The persistence mechanisms should be completely transparent
> to the problem solution in the same sense that a particular problem
> solution should not matter to the way data is managed on the enterprise
> level.

Do you or don't you try to hide the relevant part of the schema from your application?
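A minimal sketch of the layered approach, as far as it can be read from the above, in Python; OrderSource, SqliteOrderSource, InMemoryOrderSource, average_order and the orders table are all hypothetical. The problem-solution function depends only on an interface, and two access mechanisms implement it.

```python
import sqlite3
from typing import Protocol


class OrderSource(Protocol):
    """Interface the problem solution depends on (hypothetical)."""

    def order_totals(self, customer_id: int) -> list[float]: ...


class SqliteOrderSource:
    """One access mechanism: SQL against a hypothetical 'orders' table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def order_totals(self, customer_id: int) -> list[float]:
        rows = self.conn.execute(
            "SELECT total FROM orders WHERE customer_id = ? ORDER BY id",
            (customer_id,),
        )
        return [total for (total,) in rows]


class InMemoryOrderSource:
    """Another access mechanism: plain Python data, e.g. for testing."""

    def __init__(self, data: dict[int, list[float]]) -> None:
        self.data = data

    def order_totals(self, customer_id: int) -> list[float]:
        return list(self.data.get(customer_id, []))


def average_order(source: OrderSource, customer_id: int) -> float:
    """Problem-solution code: knows the interface, not the mechanism."""
    totals = source.order_totals(customer_id)
    return sum(totals) / len(totals) if totals else 0.0


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO orders (customer_id, total) VALUES (7, 10.0), (7, 30.0);
""")
print(average_order(SqliteOrderSource(conn), 7))                 # 20.0
print(average_order(InMemoryOrderSource({7: [10.0, 30.0]}), 7))  # 20.0
```

Note that the interface method still names a fact from the schema (order totals per customer). What gets hidden is the access mechanism, not the relevant facts.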

>>>>>> I don't see how an extra layer dealing
>>>>>> with (especially when specific data) could help
>>>>>> - ISTM it only blurs the separation of concerns.
>>>>>
>>>>> Encapsulation and isolation. 
>>>>> When the schemas and/or access 

There is an important difference between schema and access. Do you agree?

>>>>> paradigms change one does not have to touch the problem solution in 
>>>>> any way.  So one can be confident that the problem solution still 
>>>>> works.  All one needs to validate is that the layer or subsystem 
>>>>> interface still provides the same data in response to requests from 
>>>>> the problem solution.
>>>>
>>>> A schema change breaks a query or it doesn't.
>>>> If it doesn't all is well.
>>>> If it does you'll have to investigate the code
>>>> dependent on the query.
>>>
>>> It breaks the query, but not the problem solution that needs the data. 
>>
>> Is your "problem solution" synonymous to "the code"?
>> If not what do you mean.

>
> In a non-CRUD/USER context I have a complex problem to solve.

That is, in any serious business context. Agreed.

> To do
> that I have to manipulate data in data structures tailored to my problem
> solution. Those data structures are <usually> initialized by data
> acquired from a persistent data store.

So, you do some rendering, sometimes quite sophisticated rendering.

> But the access of the data to do
> that initialization is quite independent of the problem solution.

You keep saying that as if I don't understand. Now, back to the question:
>> Is your "problem solution" synonymous to "the code"?
>> If not what do you mean.

> IOW, both the problem solution and the data access are "the code". They
> are separated by logical modularization and decoupling to make the
> application easier to implement and maintain.
>
> I really don't know why this notion of
> separation of concerns is such a novelty.

Your assumption is wrong.

> Modularization has been a basic part of large scale software
> development since the '60s and there is nothing particularly OO about it.

Yep. Now, back to the question:
>> Is your "problem solution" synonymous to "the code"?
>> If not what do you mean.

>>> The change is isolated to modifying the query.  If that query 
>>> construction is isolated from the problem solution then the problem 
>>> solution is unaffected.  If the query construction is embedded in the 
>>> problem solution, then there is always some chance the solution will 
>>> be broken.
>>
>> If a "problem solution" (scare quotes indicating I am not
>> sure what you mean by that) references more of the schema
>> than strictly what it needs, more possible breakage has to
>> be investigated during impact analysis of a schema-change.
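An illustration of that impact-analysis point: a minimal sketch in Python with sqlite3, using a hypothetical employee table. Code that depends on the whole row shape (positional unpacking of SELECT *) breaks on an unrelated schema change, while code that names only the columns it needs keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Ann', 5000.0);
""")


def payroll_wide(conn):
    # References more schema than needed: unpacks every column of SELECT *.
    return [(name, salary)
            for (_id, name, salary) in conn.execute("SELECT * FROM employee")]


def payroll_narrow(conn):
    # References only the two columns it actually uses.
    return conn.execute("SELECT name, salary FROM employee").fetchall()


# An unrelated schema change: a column payroll does not care about.
conn.execute("ALTER TABLE employee ADD COLUMN phone TEXT")

print(payroll_narrow(conn))  # [('Ann', 5000.0)]
try:
    payroll_wide(conn)
except ValueError as exc:
    print("broken:", exc)  # four columns no longer fit three names
```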

>
> That's not the issue. Any time one makes /any/ change to an application
> there is a potential to insert a defect. One reason one separates
> concerns is so that the insertion of defects is isolated and limited in
> what can be broken.

As far as possible, but no more.

> If you don't touch the problem solution code nor the interface it uses
> to access the data it needs, then you can be confident that you didn't
> break the solution logic. Then all you have to demonstrate is that the
> persistence access subsystem still provides the same data values it did
> before the change to it.
>
> Again, this sort of modularization, decoupling, test management, and
> defect prevention is really basic software development stuff once one is
> outside the realm of CRUD/USER pipeline applications.

And even inside CRUD "applications" modularization is done. This is not the issue.

In other words: It is not about whether to cut, but about where and how to cut, how to repair the damaging effect of the cut, and to what level cutting is beneficial.

>>>>>>>> ...SQL DBMS mostly use files in file systems for storage.
>>>>>>>> Why place SQL DBMS in between if you are just looking for storage?
>>>>>>>
>>>>>>> Because outside the realm of CRUD/USER the problem solution 
>>>>>>> should not depend upon the persistence mechanisms.
>>>>>>
>>>>>> No - I'll rephrase: why not use storage systems for storage?
>>>>>> Why use SQL as a go-between at all?
>>>>>
>>>>> I don't care what storage paradigm is used or what access 
>>>>> mechanisms one uses to access the data store.  The point is that, 
>>>>> whatever they are, they should be isolated from the problem 
>>>>> solution so that they are completely transparent to the problem 
>>>>> solution.
>>>>
>>>> I don't use paradigms for storage.
>>>> I don't use a dbms for storage.
>>>
>>> Codd's relational data model as implemented in RDBs is not a data 
>>> storage paradigm?
>>
>> Indeed it is not.

>
> Wow. I give up. This disagreement is so profound
> I don't even know how to begin to respond.

What is there to disagree on?

Assume Bill wants storage.
Say he goes to the shop and buys a DBMS. Now Bill still needs to buy storage!?

What did he buy the DBMS for?

-- 
"The person who says it cannot be done
should not interrupt the person doing it."
Chinese Proverb.
Received on Wed Jun 28 2006 - 22:51:20 CEST
