Re: OO versus RDB

From: H. S. Lahman <h.lahman_at_verizon.net>
Date: Thu, 29 Jun 2006 17:35:06 GMT
Message-ID: <ebUog.12687$Tk.8048_at_trnddc08>


Responding to mAsterdam...

>> I'll go with the one from the Computer Desktop Encyclopedia: 
>> persistent data is data that exists from session to session.  (Where 
>> 'session' is the period between starting and stopping an 
>> application.)  That's a bit simplistic for 24x7 applications, but the 
>> basic notion that the data has relatively permanent existence outside 
>> the application applies.

>
>
> That is, persistence is something outside the
> realm where objects behave. No behaviour, no objects.
> In this definition there is persistent data, there
> are no persistent objects.

Yes.

>>>>>> ... No, it is about solving problems (OO) vs. persisting data 
>>>>>> (data management).
>>>>>
>>>>>
>>>>> What do you mean with "solving problems (OO)" -
>>>>> are they synonyms to you?
>>>>
>>>>
>>>> No.  OO is one form of problem solving just as an RDB-based DBMS is 
>>>> one form of data management.
>>>
>>>
>>> "form of"? "problem solving" (in general instead of
>>> some category of problems) - It all sounds huge, but
>>> I fail to see what you mean by it.
>>> I'll try something I could understand using similar words:
>>>
>>> OO is one set of solutions for one problem,
>>> how to organize code: 'Sesame, open!' instead of 'Open Sesame!'.
>>
>>
>> I prefer: OO is a paradigm for solving problems on a computer.  

>
>
> Big words ('paradigm', 'solving problems') only
> confuse the issues.

They have well-established definitions (e.g., a paradigm is a conceptual model), so they enhance communication.

>
> What problem does one attempt to solve with
> OO but the organization of code?

That's true, but so abstract that it is irrelevant in this context. General Ledger. Inventory control. Printer device driver. Ad infinitum. Every computer application solves a specific problem for some customer.

>> However, my objection above was that it is not synonymous with problem 
>> solving because problems can be solved on a computer with different 
>> paradigms than OO.
>>
>>> Data management is done by people.
>>> A DBMS is part of the toolkit.
>>
>>
>> Fine.  Just as problems are solved by people while editors, compilers, 
>> code generators, etc. are tools.
>>
>> The point is that data management and problem solving are quite 
>> different concerns and activities.

>
>
> ?? Data management does not solve problems? Are you sure?

It solves problems that are restricted to that realm, such as ensuring data integrity. In a broad sense it can include notions like data warehousing but in this context I see it as being about the services that a DBMS provides. It generally does not involve solving specific business problems.
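To make "ensuring data integrity" concrete, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical two-table ledger schema (table and column names are illustrative only). The DBMS rejects rows that violate declared constraints without knowing anything about the business problem the application is solving.

```python
import sqlite3

# An in-memory SQLite database stands in for "the DBMS" in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE entry ("
    " id INTEGER PRIMARY KEY,"
    " account_id INTEGER NOT NULL REFERENCES account(id),"
    " amount INTEGER NOT NULL CHECK (amount <> 0))"
)
conn.execute("INSERT INTO account (id, name) VALUES (1, 'Cash')")
conn.execute("INSERT INTO entry (account_id, amount) VALUES (1, 500)")


def try_insert(account_id, amount):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    try:
        conn.execute(
            "INSERT INTO entry (account_id, amount) VALUES (?, ?)",
            (account_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, 250))   # True: valid row
print(try_insert(99, 250))  # False: no such account (foreign key violation)
print(try_insert(1, 0))     # False: zero amount (CHECK violation)
```

The integrity rules live in the schema, so every application sharing the data gets the same protection.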

>

>>>>>>>>>> So one should hide the SQL as well.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As soon as we have a better embeddable
>>>>>>>>> abstraction, yes. Please provide an url.
>>>
>>>
>>> Did you find one?
>>
>>
>> I didn't look because it is irrelevant to the discussion.  The issue 
>> is not about finding an alternative to SQL for accessing an RDB.  
>> (Though clearly if one is using flat files it is not the best 
>> choice.)  It is about hiding /whatever/ 

>
>
> ... behind something else. To be of any beneficial
> value this something else must provide a better
> abstraction than whatever it is one is hiding.
>
>> access mechanism one uses.

>
>
> Usage of SQL can hide the storage access from your
> application; a DBMS hides (and solves) a lot of
> the data-sharing problems.
> You propose to hide the data itself, overshooting
> the purpose of abstraction.

Only if the data store is an RDB. SQL has no value for non-RDB data stores. The issue here is that the application shouldn't care what the data store mechanism is.
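A minimal sketch of that isolation in Python, assuming a hypothetical pricing problem: the problem solution (order_total) depends only on a small interface, while one access subsystem uses SQL against an RDB and another reads a flat file. All class, table, and column names are invented for illustration.

```python
import csv
import io
import sqlite3
from typing import Protocol


class PriceSource(Protocol):
    """What the problem solution needs: unit prices by part number."""

    def unit_price(self, part: str) -> float: ...


class SqlPriceSource:
    """Access subsystem backed by an RDB; SQL appears only here."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def unit_price(self, part: str) -> float:
        row = self._conn.execute(
            "SELECT price FROM parts WHERE part_no = ?", (part,)
        ).fetchone()
        return float(row[0])


class FlatFilePriceSource:
    """Access subsystem backed by flat CSV text; no SQL at all."""

    def __init__(self, text: str) -> None:
        reader = csv.DictReader(io.StringIO(text))
        self._prices = {r["part_no"]: float(r["price"]) for r in reader}

    def unit_price(self, part: str) -> float:
        return self._prices[part]


def order_total(order: dict[str, int], prices: PriceSource) -> float:
    """Problem solution: knows nothing about how prices are stored."""
    return sum(qty * prices.unit_price(part) for part, qty in order.items())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO parts VALUES (?, ?)", [("A1", 2.5), ("B2", 10.0)])
flat = "part_no,price\nA1,2.5\nB2,10.0\n"

order = {"A1": 4, "B2": 1}
print(order_total(order, SqlPriceSource(conn)))       # 20.0
print(order_total(order, FlatFilePriceSource(flat)))  # 20.0
```

Changing the store means passing a different object; order_total itself is never edited.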

>>>>>>>> ...Or you change RDB vendors?
>>>>>>>
>>>>>>>
>>>>>>> Then you have to change the vendor specifics
>>>>>>> in the code and migrate the data.
>>>>>>
>>>>>>
>>>>>> The point is that the application problem solution does not care 
>>>>>> how the data is stored. Nor should it be affected by any changes 
>>>>>> to the data storage that do not affect its semantics. 
>>>>>
>>>>>
>>>>> We are in agreement here.
>>>>>
>>>>> Where we differ is:
>>>>> You (appear to) assume that dbms and SQL are about storage.
>>>>> I don't.
>>>>
>>>>
>>>> I believe they are about managing stored data.
>>>
>>>
>>> and, off-hand,
>>>     - the sharing of those data,
>>>     - providing suitable abstractions for
>>>       programs to manipulate those data.
>>
>>
>> Sharing certainly; providing problem-independent access to a data 
>> store is the primary purpose of a DBMS.

>
>
> Sure, the DBMS is fact-agnostic.
> The schema however is a model of the facts, relevant
> to the business - (the business as including the problems).
>
> While it may be worthwhile to invest in hiding
> DBMS-specifics, hiding the (relevant part of the)
> schema reduces your application to a generic
> CRUD/USER "solution" (I looked up USER at your site).

Sure, the schema provides a structural model. But it is a static model. Applications require dynamic models.

The CRUD and USER acronyms describe the nature of the processing. CRUD/USER applications are essentially pipeline applications that convert between the DBMS and UI views. IOW, the main problem being solved in software is presenting existing data in convenient ways that allow the software user to solve some other problem by analyzing the data.

The kinds of applications I am talking about are ones that solve some problem by manipulating data and then present the /results/ to the software user.
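A rough illustration of the distinction, using hypothetical inventory records: the first function is a CRUD/USER-style pipeline that only reshapes stored data for display, while the second manipulates the data to produce a result. The reorder-point rule (daily use times lead time plus a safety margin) is an assumed example, not a prescribed formula.

```python
# Hypothetical inventory records, as they might arrive from the data store.
stock = [
    {"part": "A1", "on_hand": 12, "daily_use": 5, "lead_days": 3},
    {"part": "B2", "on_hand": 40, "daily_use": 2, "lead_days": 7},
]


def pipeline_view(rows):
    """CRUD/USER style: reshape stored data for display; the user does the analysis."""
    return [f"{r['part']}: {r['on_hand']} on hand" for r in rows]


def reorder_list(rows, safety_days=2):
    """Problem-solving style: compute a result the user did not already have."""
    result = []
    for r in rows:
        reorder_point = r["daily_use"] * (r["lead_days"] + safety_days)
        if r["on_hand"] <= reorder_point:
            result.append(r["part"])
    return result


print(pipeline_view(stock))  # ['A1: 12 on hand', 'B2: 40 on hand']
print(reorder_list(stock))   # ['A1']
```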

>>>>>>>> As such, it should be hidden from the rest of the application  
>>>>>>>> that doesn't care what flavor of persistence is used.  
>>>>>>>
>>>>>>>
>>>>>>> In the case of using SQL as interface to users data:
>>>>>>> make sure that only that part of our code which
>>>>>>> processes some specific data mentions only that
>>>>>>> part of the schema which is relevant to it.
>>>>>>
>>>>>>
>>>>>> Better yet, encapsulate it so that none of the
>>>>>> problem solution code mentions it.
>>>>>
>>>>>
>>>>> Hiding irrelevant stuff seems sensible.
>>>>> Not mentioning the relevant stuff seems a bit weird.
>>>>>
>>>>> Metaphorically:
>>>>> I can imagine some transport layer being
>>>>> ignorant of the content of the luggage.
>>>>
>>>>
>>>> Conversely, some baggage content layer could be
>>>> ignorant of transport mechanisms.
>>>
>>>
>>> Sure. But how is this relevant to hiding
>>> the *relevant* stuff?
>>
>>
>> The way persistent data is managed and accessed does to matter to the 
>> logic of a problem solution that manipulates the data after it is 
>> accessed.  The persistence mechanisms should be completely transparent 
>> to the problem solution in the same sense that a particular problem 
>> solution should not matter to the way data is managed on the 
>> enterprise level.

>
>
> Do you or don't you try to hide the relevant
> part of the schema from your application?

Sorry. Typo. "... does not matter to ..." in the first sentence.

>>>>>>> I don't see how an extra layer dealing
>>>>>>> with (especially when specific data) could help
>>>>>>> - ISTM it only blurs the separation of concerns.
>>>>>>
>>>>>>
>>>>>> Encapsulation and isolation. When the schemas and/or access 

>
>
> There is an important difference between
> schema and access. Do you agree?

Yes.

>>>>>> paradigms change one does not have to touch the problem solution 
>>>>>> in any way.  So one can be confident that the problem solution 
>>>>>> still works.  All one needs to validate is that the layer or 
>>>>>> subsystem interface still provides the same data in response to 
>>>>>> requests from the problem solution.
>>>>>
>>>>>
>>>>> A schema change breaks a query or it doesn't.
>>>>> If it doesn't all is well.
>>>>> If it does you'll have to investigate the code
>>>>> dependent on the query.
>>>>
>>>>
>>>> It breaks the query, but not the problem solution that needs the data. 
>>>
>>>
>>> Is your "problem solution" synonymous to "the code"?
>>> If not what do you mean.
>>
>>
>> In a non-CRUD/USER context I have a complex problem to solve. 

>
>
> That is, in any serious business context. Agreed.
>
>> To do that I have to manipulate data in data structures tailored to my 
>> problem solution.  Those data structures are <usually> initialized by 
>> data acquired from a persistent data store.

>
>
> So, you do some rendering, sometimes quite sophisticated rendering.
>
>> But the access of the data to do that initialization is quite 
>> independent of the problem solution.

>
>
> You keep saying that as if I don't understand.
> Now, back to the question:
> >> Is your "problem solution" synonymous to "the code"?
> >> If not what do you mean.
>
>> IOW, both the problem solution and the data access are "the code".  
>> They are separated by logical modularization and decoupling to make 
>> the application easier to implement and maintain.
>>
>> I really don't know why this notion of separation of concerns is such 
>> a novelty.  

>
>
> Your assumption is wrong.

What assumption?

>

>> Modularization has been a basic part of large scale software 
>> development since the '60s and there is nothing particularly OO about it.

>
>
> Yep. Now, back to the question:
> >> Is your "problem solution" synonymous to "the code"?
> >> If not what do you mean.

You keep pushing this and the answer is still the same. Both the solution and access mechanisms are encoded in application code. But the application code for each is isolated, encapsulated, and logically decoupled from the other. That's what subsystems are for.
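A minimal sketch of that decoupling, assuming a hypothetical credit-check problem and Python's sqlite3 module: the schema changes between two versions, the query changes with it inside the access subsystem, and the problem solution (can_ship) is the same code in both cases. Names and schemas are illustrative only.

```python
import sqlite3


class CustomerAccessV1:
    """Access subsystem, schema v1: one table holds names and credit limits."""

    def __init__(self, conn):
        self._conn = conn

    def credit_limit(self, customer_id):
        (limit,) = self._conn.execute(
            "SELECT credit_limit FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return limit


class CustomerAccessV2:
    """Same interface after a schema change: credit limits moved to their own table."""

    def __init__(self, conn):
        self._conn = conn

    def credit_limit(self, customer_id):
        (limit,) = self._conn.execute(
            "SELECT amount FROM credit WHERE customer_id = ?", (customer_id,)
        ).fetchone()
        return limit


def can_ship(order_value, customer_id, access):
    """Problem solution: unaffected by the schema change because it never sees a query."""
    return order_value <= access.credit_limit(customer_id)


v1 = sqlite3.connect(":memory:")
v1.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, credit_limit INTEGER)")
v1.execute("INSERT INTO customer VALUES (7, 'Acme', 1000)")

v2 = sqlite3.connect(":memory:")
v2.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
v2.execute("CREATE TABLE credit (customer_id INTEGER REFERENCES customer(id), amount INTEGER)")
v2.execute("INSERT INTO customer VALUES (7, 'Acme')")
v2.execute("INSERT INTO credit VALUES (7, 1000)")

print(can_ship(800, 7, CustomerAccessV1(v1)))  # True
print(can_ship(800, 7, CustomerAccessV2(v2)))  # True
```

The schema change broke the v1 query, so the fix was a new query; can_ship did not need to be touched or re-examined.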

>

>>>> The change is isolated to modifying the query.  If that query 
>>>> construction is isolated from the problem solution then the problem 
>>>> solution is unaffected.  If the query construction is embedded in 
>>>> the problem solution, then there is always some chance the solution 
>>>> will be broken.
>>>
>>>
>>> If a "problem solution" (scare quotes indicating I am not
>>> sure what you mean by that) references more of the schema
>>> than strictly what it needs, more possible breakage has to
>>> be investigated during impact analysis of a schema-change.
>>
>>
>> That's not the issue.  Any time one makes /any/ change to an 
>> application there is a potential to insert a defect.  One reason one 
>> separates concerns is so that the insertion of defects is isolated and 
>> limited in what can be broken.

>
>
> As far as possible, but no more.

I'm not sure what this means. There are a variety of reasons to separate concerns, of which defect prevention is one.

>

>> If you don't touch the problem solution code nor the interface it uses 
>> to access the data it needs, then you can be confident that you didn't 
>> break the solution logic.  Then all you have to demonstrate is that 
>> the persistence access subsystem still provides the same data values 
>> it did before the change to it.
>>
>> Again, this sort of modularization, decoupling, test management, and 
>> defect prevention is really basic software development stuff once one 
>> is outside the realm of CRUD/USER pipeline applications.

>
>
> And even inside CRUD "applications" modularization is done.
> This is not the issue.
>
> In other words: It is not about whether to cut, but about where
> and how to cut, how to repair the damaging effect of the cut,
> and to what level cutting is beneficial.

Yes.
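The check mentioned earlier in the thread ("demonstrate that the persistence access subsystem still provides the same data values it did before the change") can be as simple as comparing the old and new access paths over a set of keys. This is only a sketch; the dictionaries stand in for hypothetical before/after access subsystems.

```python
from typing import Callable, Iterable


def same_data(old: Callable[[str], object], new: Callable[[str], object],
              keys: Iterable[str]) -> list[str]:
    """Return the keys for which the new access path disagrees with the old one."""
    return [k for k in keys if old(k) != new(k)]


# Hypothetical stand-ins for the access subsystem before and after a change.
before = {"A1": 2.5, "B2": 10.0}.get
after_ok = {"B2": 10.0, "A1": 2.5}.get
after_bad = {"A1": 2.5, "B2": 12.0}.get

print(same_data(before, after_ok, ["A1", "B2"]))   # []
print(same_data(before, after_bad, ["A1", "B2"]))  # ['B2']
```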

>>>>>>>>> ...SQL DBMS mostly use files in file systems for storage.
>>>>>>>>> Why place SQL DBMS in between if you are just looking for storage?
>>>>>>>>
>>>>>>>>
>>>>>>>> Because outside the realm of CRUD/USER the problem solution 
>>>>>>>> should not depend upon the persistence mechanisms.
>>>>>>>
>>>>>>>
>>>>>>> No - I'll rephrase: why not use storage systems for storage?
>>>>>>> Why use SQL as a go-between at all?
>>>>>>
>>>>>>
>>>>>> I don't care what storage paradigm is used or what access 
>>>>>> mechanisms one uses to access the data store.  The point is that, 
>>>>>> whatever they are, they should be isolated from the problem 
>>>>>> solution so that they are completely transparent to the problem 
>>>>>> solution.
>>>>>
>>>>>
>>>>> I don't use paradigms for storage.
>>>>> I don't use a dbms for storage.
>>>>
>>>>
>>>> Codd's relational data model as implemented in RDBs is not a data 
>>>> storage paradigm?
>>>
>>>
>>> Indeed it is not.
>>
>>
>> Wow.  I give up.  This disagreement is so profound I don't even know 
>> how to begin to respond.

>
>
> What is there to disagree on?
>
> Assume Bill wants storage.
> Say he goes to the shop and buys a DBMS.
> Now Bill still needs to buy storage!?
> What did he buy the DBMS for?

I don't understand this. You seem to be equating data storage with hardware. The hardware platform is not relevant.

The RDM as applied in RDBs is a conceptual model for storing and accessing data that doesn't depend on the physical platform. That conceptual model is clearly different than the models used for OODBs, CODASYL DBs, etc. even though the physical platforms may be the same. IOW, it is a unique paradigm for data storage.
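As a small illustration of the model being independent of the physical platform, assuming Python's sqlite3 module: the same relational statements run unchanged against an in-memory database and an on-disk file, and return the same result. The table and data are hypothetical.

```python
import os
import sqlite3
import tempfile


def load_and_rank(conn):
    """Relational statements only; nothing here says where or how rows are physically kept."""
    conn.execute("CREATE TABLE sale (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sale VALUES (?, ?)",
        [("Acme", 300), ("Bolt", 100), ("Acme", 200)],
    )
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sale GROUP BY customer ORDER BY 2 DESC"
    ).fetchall()


in_memory = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as d:
    on_disk = sqlite3.connect(os.path.join(d, "sales.db"))
    print(load_and_rank(in_memory))  # [('Acme', 500), ('Bolt', 100)]
    print(load_and_rank(on_disk))    # [('Acme', 500), ('Bolt', 100)]
    on_disk.close()  # close before the temporary directory is removed
```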



There is nothing wrong with me that could not be cured by a capful of Drano.

H. S. Lahman
hsl_at_pathfindermda.com
Pathfinder Solutions -- Put MDA to Work http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php. (888)OOA-PATH Received on Thu Jun 29 2006 - 19:35:06 CEST
