Re: OO versus RDB

From: H. S. Lahman <h.lahman_at_verizon.net>
Date: Wed, 28 Jun 2006 16:07:51 GMT
Message-ID: <rPxog.6621$D03.3031_at_trndny03>


Responding to mAsterdam...

>>> You snipped:
>>>
>>>>> ISTM persistence is no issue. A few weeks
>>>>> ago I asked cdt & co for a relevant definition.
>>>>> Maybe you have one. 
>>>
>>>
>>> Why?
>>
>>
>> I didn't know what the first sentence meant and I didn't know who "cdt 
>> & co" was.

>
>
> I guess you know by now:
> ISTM: It seems to me (YCLIU)
> cdt & co: the newsgroups comp.databases.theory and comp.object

Fascinating. I've been participating in online forums for three decades and never saw that acronym (that I recall), nor YCLIU or CDT.

>
> But surely you /did/ understand I asked you a relevant
> definition for 'persistence', a term at the basis of your
> argumentation.

I'll go with the one from the Computer Desktop Encyclopedia: persistent data is data that exists from session to session. (Where 'session' is the period between starting and stopping an application.) That's a bit simplistic for 24x7 applications, but the basic notion that the data has relatively permanent existence outside the application applies.

>

>>> [snip]
>>>
>>>> No, it is about solving problems (OO) vs. persisting data  (data 
>>>> management).
>>>
>>>
>>> What do you mean with "solving problems (OO)" -
>>> are they synonyms to you?
>>
>>
>> No.  OO is one form of problem solving just as a  RDB-based DBMS is 
>> one form of data management.

>
>
> "form of"? "problem solving" (in general instead of
> some category of problems) - It all sounds huge, but
> I fail to see what you mean by it.
> I'll try something I could understand using similar words:
>
> OO is one set of solutions for one problem,
> how to organize code: 'Sesame, open!' instead of 'Open Sesame!'.

I prefer: OO is a paradigm for solving problems on a computer. However, my objection above was that it is not synonymous with problem solving because problems can be solved on a computer with different paradigms than OO.

>
> Data management is done by people.
> A DBMS is part of the toolkit.

Fine. Just as problems are solved by people while editors, compilers, code generators, etc. are tools.

The point is that data management and problem solving are quite different concerns and activities.

>

>>>>>>>> So one should hide the SQL as well.
>>>>>>>
>>>>>>>
>>>>>>> As soon as we have a better embeddable
>>>>>>> abstraction, yes. Please provide an url.

>
>
> Did you find one?

I didn't look because it is irrelevant to the discussion. The issue is not about finding an alternative to SQL for accessing an RDB. (Though clearly if one is using flat files it is not the best choice.) It is about hiding /whatever/ access mechanism one uses.
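
A minimal sketch of that kind of hiding, in Python. Everything named here (CustomerStore, SqlCustomerStore, FlatFileCustomerStore, can_order, the customer table and its columns) is hypothetical and only illustrates the idea; it is not taken from any particular product or from the thread.

```python
import sqlite3
from typing import Protocol


class CustomerStore(Protocol):
    """What the problem solution sees: a request for data, nothing else."""

    def credit_limit(self, customer_id: int) -> float: ...


class SqlCustomerStore:
    """One hidden mechanism: SQL against a relational store (assumed schema)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def credit_limit(self, customer_id: int) -> float:
        row = self._conn.execute(
            "SELECT credit_limit FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            raise KeyError(customer_id)
        return float(row[0])


class FlatFileCustomerStore:
    """Another hidden mechanism: 'id,limit' lines as read from a flat file."""

    def __init__(self, lines: list[str]) -> None:
        self._limits: dict[int, float] = {}
        for line in lines:
            ident, limit = line.strip().split(",")
            self._limits[int(ident)] = float(limit)

    def credit_limit(self, customer_id: int) -> float:
        return self._limits[customer_id]


def can_order(store: CustomerStore, customer_id: int, amount: float) -> bool:
    """Problem solution code: no SQL, no file formats, no schema names."""
    return amount <= store.credit_limit(customer_id)
```

Here can_order works unchanged with either store, so swapping SQL for a flat file (or one RDB vendor for another) touches only the class behind the interface.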

>>> [snip OODB & flatfile]
>>>
>>>>>> Or you change RDB vendors?
>>>>>
>>>>>
>>>>> Then you have to change the vendor specifics
>>>>> in the code and migrate the data.
>>>>
>>>>
>>>> The point is that the application problem solution does not care how 
>>>> the data is stored. Nor should it be affected by any changes to the 
>>>> data storage that do not affect its semantics. 
>>>
>>>
>>> We are in agreement here.
>>>
>>> Where we differ is:
>>> You (appear to) assume that dbms and SQL are about storage.
>>> I don't.
>>
>>
>> I believe they are about managing stored data.

>
>
> and, off-hand,
> - the sharing of those data,
> - providing suitable abstractions for
> programs to manipulate those data.

Sharing certainly; providing problem-independent access to a data store is the primary purpose of a DBMS.

However, outside CRUD/USER processing the solution to a particular problem will almost always have a different representation of the data than the DBMS so that it can manipulate the data optimally for the solution in hand.
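
To illustrate the difference in representation, here is a rough Python sketch. The route rows, build_graph, and fewest_hops are made-up names for this example; the assumption is a store that hands back flat (from, to) rows while the solution wants a graph it can search.

```python
from collections import defaultdict, deque

# Rows as a relational store might return them: (from_city, to_city).
ROUTE_ROWS = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]


def build_graph(rows):
    """Re-shape flat rows into an adjacency map suited to path finding."""
    graph = defaultdict(list)
    for src, dst in rows:
        graph[src].append(dst)
    return graph


def fewest_hops(graph, start, goal):
    """Breadth-first search over the in-memory structure.

    Returns the hop count, or None if goal is unreachable from start.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        # .get() avoids creating empty entries for nodes with no outgoing routes
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None
```

The table of rows suits problem-independent storage and sharing; the adjacency map suits this one search. Neither shape needs to know about the other beyond the conversion in build_graph.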

>

>>>>>> As such, it should be hidden from the rest of the application  
>>>>>> that doesn't care what flavor of persistence is used.  
>>>>>
>>>>>
>>>>> In the case of using SQL as interface to users data:
>>>>> make sure that only that part of our code which
>>>>> processes some specific data mentions only that
>>>>> part of the schema which is relevant to it.
>>>>
>>>>
>>>> Better yet, encapsulate it so that none of the
>>>> problem solution code mentions it.
>>>
>>>
>>> Hiding irrelevant stuff seems sensible.
>>> Not mentioning the relevant stuff seems a bit weird.
>>>
>>> Metaphorically:
>>> I can imagine some transport layer being
>>> ignorant of the content of the luggage.
>>
>>
>> Conversely, some baggage content layer could be
>> ignorant of transport mechanisms.

>
>
> Sure. But how is this relevant to hiding
> the *relevant* stuff?

The way persistent data is managed and accessed does not matter to the logic of a problem solution that manipulates the data after it is accessed. The persistence mechanisms should be completely transparent to the problem solution in the same sense that a particular problem solution should not matter to the way data is managed on the enterprise level.

>

>>>>> I don't see how an extra layer dealing
>>>>> with (especially when specific data) could help
>>>>> - ISTM it only blurs the separation of concerns.
>>>>
>>>>
>>>> Encapsulation and isolation.  When the schemas and/or access 
>>>> paradigms change one does not have to touch the problem solution in 
>>>> any way.  So one can be confident that the problem solution still 
>>>> works.  All one needs to validate is that the layer or subsystem 
>>>> interface still provides the same data in response to requests from 
>>>> the problem solution.
>>>
>>>
>>> A schema change breaks a query or it doesn't.
>>> If it doesn't all is well.
>>> If it does you'll have to investigate the code
>>> dependent on the query.
>>
>>
>> It breaks the query, but not the problem solution that needs the data. 

>
>
> Is your "problem solution" synonymous to "the code"?
> If not what do you mean.

In a non-CRUD/USER context I have a complex problem to solve. To do that I have to manipulate data in data structures tailored to my problem solution. Those data structures are <usually> initialized by data acquired from a persistent data store. But the access of the data to do that initialization is quite independent of the problem solution.

IOW, both the problem solution and the data access are "the code". They are separated by logical modularization and decoupling to make the application easier to implement and maintain.

I really don't know why this notion of separation of concerns is such a novelty. Modularization has been a basic part of large scale software development since the '60s and there is nothing particularly OO about it.

>

>> The change is isolated to modifying the query.  If that query 
>> construction is isolated from the problem solution then the problem 
>> solution is unaffected.  If the query construction is embedded in the 
>> problem solution, then there is always some chance the solution will 
>> be broken.

>
>
> If a "problem solution" (scare quotes indicating I am not
> sure what you mean by that) references more of the schema
> than strictly what it needs, more possible breakage has to
> be investigated during impact analysis of a schema-change.

That's not the issue. Any time one makes /any/ change to an application there is a potential to insert a defect. One reason one separates concerns is so that the insertion of defects is isolated and limited in what can be broken.

If you touch neither the problem solution code nor the interface it uses to access the data it needs, then you can be confident that you didn't break the solution logic. Then all you have to demonstrate is that the persistence access subsystem still provides the same data values it did before the change to it.
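
As a rough illustration of that last check, the Python sketch below assumes a schema change that moves a city column out of a customer table into a separate address table. The tables, the sample rows, and the function names are all invented for the example.

```python
import sqlite3


def load_old(conn):
    """Access code written against the original single-table schema."""
    return conn.execute(
        "SELECT id, name, city FROM customer ORDER BY id"
    ).fetchall()


def load_new(conn):
    """Access code rewritten after city was moved to its own table."""
    return conn.execute(
        "SELECT c.id, c.name, a.city FROM customer c "
        "JOIN address a ON a.customer_id = c.id ORDER BY c.id"
    ).fetchall()


def make_old_db():
    """In-memory database with the schema before the change."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Ann", "Oslo"), (2, "Bob", "Rome")],
    )
    return conn


def make_new_db():
    """In-memory database with the schema after the change, same data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE address (customer_id INTEGER, city TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")]
    )
    conn.executemany(
        "INSERT INTO address VALUES (?, ?)", [(1, "Oslo"), (2, "Rome")]
    )
    return conn


def same_data_before_and_after():
    """The only thing to demonstrate: the interface returns the same values."""
    return load_old(make_old_db()) == load_new(make_new_db())
```

The query changed, but the check is confined to the access subsystem; code that consumes the (id, name, city) tuples is never opened.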

Again, this sort of modularization, decoupling, test management, and defect prevention is really basic software development stuff once one is outside the realm of CRUD/USER pipeline applications.

>

>>> [snip]
>>>
>>>>>>> SQL DBMS mostly use files in file systems for storage.
>>>>>>> Why place SQL DBMS in between if you are just looking for storage?
>>>>>>
>>>>>>
>>>>>> Because outside the realm of CRUD/USER the problem solution should 
>>>>>> not depend upon the persistence mechanisms.
>>>>>
>>>>>
>>>>> No - I'll rephrase: why not use storage systems for storage?
>>>>> Why use SQL as a go-between at all?
>>>>
>>>>
>>>> I don't care what storage paradigm is used or what access mechanisms 
>>>> one uses to access the data store.  The point is that, whatever they 
>>>> are, they should be isolated from the problem solution so that they 
>>>> are completely transparent to the problem solution.
>>>
>>>
>>> I don't use paradigms for storage.
>>> I don't use a dbms for storage.
>>
>>
>> Codd's relational data model as implemented in RDBs is not a data 
>> storage paradigm?

>
>
> Indeed it is not.

Wow. I give up. This disagreement is so profound I don't even know how to begin to respond.



There is nothing wrong with me that could not be cured by a capful of Drano.

H. S. Lahman
hsl_at_pathfindermda.com
Pathfinder Solutions -- Put MDA to Work http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH

Received on Wed Jun 28 2006 - 18:07:51 CEST
