Re: The wisdom of the object mentors (Was: Searching OO Associations with RDBMS Persistence Models)

From: Bruno Desthuilliers <bdesth.quelquechose_at_free.quelquepart.fr>
Date: Sun, 04 Jun 2006 23:29:16 +0200
Message-ID: <44832111$0$10991$626a54ce_at_news.free.fr>


Marshall a écrit :
> Bruno Desthuilliers wrote:
>

>>erk a écrit :
>>
>>>Robert Martin wrote:
>>>
>>>
>>>>It's called decoupling.  (snip)   The
>>>>idea is that you write the application program in such a way that it
>>>>can manipulate the data in the data model without coupling it directly
>>>>to the DBMS, or the details of the schema.
>>>
>>>This makes no sense to me - how do you manipulate data without knowing
>>>the details of the schema?
>>
>>It's obvious that at least one part of the application must know that
>>details ('details' as in 'detailed', not as in 'non important'). The
>>idea is to encapsulate this knowledge in one specific part of the
>>application instead of letting it creeps throughout the whole
>>application. And the rational is that it should isolate the application
>>from changes in the RDBMS (schema, vendor, whatever).

>
>
> This doesn't answer the question, though:

Yes it does.

> how do you manipulate
> data without knowing the details of the schema?

It was in the very first words of my post: "
 >>It's obvious that at least one part of the application must know that  >>details ('details' as in 'detailed', not as in 'non important'). "

> For an application
> to contribute usefully to the manipulation of some data, it has
> to know how that data is structured.

Of course. But this doesn't mean *all* the application code needs to know how the data is structured *in the RDBMS* schema. *Some part(s)* of the application has to know about the DB schema. Some other parts are happier with an application (and language) specific representation of this data structure. The representation of data as exposed by the RDBMS schema (or XML or LDAP or whatever) is rarely (in my experience) the best representation for the rest of the application - unless of course the application's duty is to directly work on these schemas (SQLAdmin GUI's, XML editors etc).

Also, having all the detailed knowledge about the RDBMS schema in one place (or a very few places) is very handy when one have to accomodate minor changes in this schema - it avoids having to browse and modify all the application's code. It's of course not enough to protect you from major changes, but then it's a totally different problem IMHO - major changes in the schema usually means major changes in specs, business rules and whatnots, so there's some redesign/rewrite to do anyway. And even then, the cleanest the design (read : hi cohesion, low coupling), the easiest the rewrite. It seems obvious (to me at least) that the less code you have to change, the less risks you have to introduce bugs !-)

> As a simple example, if you
> have a data structure for customers, and a data structure for
> polygons,

 From the application's code POV, the data structure representing a customer may be somewhat different from the RDBMS schema definitions relating to the customer - same data, different representation. As a matter of fact, a good part of application programming is about "translating" one data structure into another one, and it's sometimes easiest (or interesting for any other reasons) to use intermediate representations. I've rarely seen the case where a tuple as returned from a SQL query happened to be the exact desired structure for all the parts of the application using it. And even in this case, I wouldn't want to have to retype this exact SQL query in each part of the code where I need to get this representation of the data. I prefer to have the code in charge of issuing this query and returning the result in the desired representation (whatever translation this implies, including no translation at all) in only one place.

> you cannot have code that doesn't know which one
> it is working on.

I fail to see any real-world use-case where this (handling polygons and customers in the same part of application code) would be needed, but FWIW, it mostly depends on how you represent customers and polygons, and what you want to do with them. Now if customer objects and polygon objects both understand a same message, then yes, you can send this message to either polygon objects and customer objects without even having to care what they are.

Of course, the code which will be invoked in reaction to this message have very few chances to be the same for both kind of objects - but this is what encapsulation and polymorphism are about. Encapsulate what's different, expose what's common, so client code doesn't need to have too much specific knowledge. Having a common interface for different objects let you handle these objects in a uniform way.

> The code cannot be abstracted away from
> the question of whether it is an order entry system or a
> geometry engine.

True, for most of the application's own code[*]. Did I ever claimed otherwise ? Now I still fail to see why would imply that all the application code should have detailed knowledge of the RDBMS schema.

[*] Note, FWIW, that there are huge parts of the code contributing to the application that can be - and actually is - abstracted from this kind of question. The code of the RDBMS just don't care if you use it for one or the other - he only deals schemas, tables, fields, indices, constraints etc. The code of the GUI Toolkit (if it's a GUI application) only deals with windows, widgets etc.

>>Someone here claimed that (RDBMS-based) applications should be nothing
>>more than user-friendly substitutes to the RDBMS console. This is of
>>course mostly false [1]. But even in this rare cases of a DB-UI
>>pipeline, there's this "user-friendly" requirement... 

>
> What, you can't build a user-friendly view if the model is
> an RDBMS, but you can if it's objects?

The model is not "an RDBMS". The RDBMS schema is one representation of the model. Which is totally different.

And I didn't claimed I needed objects to build a user-friendly view of the model - I've done it with procedural languages too, it just happens that I have a preference for OO (and somehow FP) over procedural.

What I say is that just pipelining from RDBMS to UI and back is not enough - for the final users at least. Strange as it migh be, final users of my apps usually don't give a damn about the RDBMS schema - nor about how I designed and coded my app FWIW. They just want a working application. What do you think would happen if I was to deliver my customers a fresh PgAdmin install as being the application ? Yes, they would just laugh, ask me to give the money back, and go and find someone else to do the job. And they would be right (unless of course all they asked me was a web application to manage a PostgreSQL db).

>>>I understand the value of a mapping layer when one is forced to
>>>interface with applications that have different expectations of data
>>>format. But what you write above seems to imply that an order entry
>>>system (e.g.) should have no knowledge of, for example, the fact that
>>>an order has line items.
>>
>>I definitiveley don't understand it that way. But the detailed knowledge
>>of how this data model lives in the RDBMS - and even the existence of
>>the RDBMS - should certainly not be known to the GUI (or web or
>>whatever) forms. As you say, an order entry system surely has to know
>>that an order has line items - but knowing that there's an 'orders'
>>table and 'lineitem' table with a foreign key on the 'orders' table
>>should only concern a very restricted part of the application.

>
>
> Let's rephrase that slightly:
>
> "an order entry system surely has to know that an order has line items"
>
> An order entry system has to know that there exists a set of orders,

Yes. Of course.

> and that each order has a set of line items ....

May help too !-)

> "but knowing that there's an 'orders' table and 'lineitem' table with
> a
> foreign key on the 'orders' table should only concern a very restricted
> part of the application."
>
> but knowing that there's a set of orders, and that each order has a set
> of line items should only concern a very restricted part of the
> application.

I deliberately choosed the words "table" and "foreign key". I'm not debating mathematical theory here, but talking about very practical application design and implementation concerns. RDBMS are an *implementation* of the relational theory (and AFAIK not really a perfect one). When working on a RDBMS-based application, I'm dealing with a RDBMS, not doing mathematics. And AFAIK, 'table' and 'foreign key' are part of the SQL standard.

> I cannot make sense of this. The app must know that there exists
> a set of orders, but knowledge of the orders table, which is nothing
> more than a set of orders,

An 'orders' table in a RDBMS is not 'a set of orders' - it's how the set of orders is represented by the RDBMS. Which is totally different (from an implementation POV). How this set of orders is represented by the RDBMS is definitively not something the UI code has to know IMHO (which itself comes from experience). The UI code doesn't have to know anything about the RDBMS - even the fact that there's a RDBMS is none of it's concerns. The UI code's responsability is to display the user a representation of the data (including what's needed to take action on these data) that matches the user's view of the data for a given job. Anything else is irrelevant *in this part of the code*.

> must be restricted. Why?

The answer is in the post you're answering to. Please re-read it !-)

And while you're at it, notice that I don't make any claim about knowing how to solve all the world's hunger problems and cure all diseases. I just encountered some problems with applications not applying this kind of restriction, told myself "there must be a better way", told myself "I'm certainly not the first one thinking there's a better way", and tried to see how other peoples solved this. FWIW, I haven't yet found any working miraculous solution allowing one to have it's cake and eat it too (while I found a lot of writings about solutions claiming so,...).

But I've found - from both readings and experience - that partitionning an application's code into modules with high cohesion and low-coupling led to definitively more robust and usually somewhat more versatile code (of course, going from an order entry system to a geometry engine would be hard to support without a huge rewrite !). Nothing new here as you can see - but then again, I never claimed having discovered anything new.

BTW, no need to try and convince me of the benefits of the relational theory - I'm already sold[*] !-) But there's a gap between the relational theory, SQL databases, and application programming languages. Now I'm still dreaming of a *sane* bridge between these parts - but I certainly don't have the required knowledge to solve this.

[*] which doesn't means I think all and every possible application needs a RDBMS !-)

My 2 (and a half) cents...

NB : Ho, yes, one last point : as you can guess from my name (and various syntaxic, grammatical or spelling errors), I'm not a native english speaker. So please keep in mind that what I write here may not always accurately express what I'm really trying to say. Received on Sun Jun 04 2006 - 23:29:16 CEST

Original text of this message