Re: OO versus RDB

From: erk
Date: 28 Jun 2006 11:11:41 -0700

H. S. Lahman wrote:
> The point is that data management and problem solving are quite
> different concerns and activities.

I don't think either one of these phrases is useful. "Problem solving" is part of it, but a rather small part, in my opinion. Troubleshooting existing systems might fall into this category. Analyzing requirements is a big part of our work, involving logic and notations to help identify gaps and inconsistencies. System design also involves logic and notations, of different types; specifications tie it to the requirements. And in coding, I'm doing more of the same: creating formal structures which ultimately meet the requirements. Some derivation, some creation, but problem solving? Not really, at least not for me or any developers I've worked with.

"Data management" requires data, which have specifications (expressible as types and constraints). Those specifications are relied on by multiple systems sharing the database. But other than that, the term "data management" is extremely fuzzy.

Perhaps part of my problem is the absolute vacuousness of the word "management."

> I didn't look because it is irrelevant to the discussion. The issue is
> not about finding an alternative to SQL for accessing an RDB. (Though
> clearly if one is using flat files it is not the best choice.) It is
> about hiding /whatever/ access mechanism one uses.

How do you hide what type of "material" that mechanism is delivering to, and accepting from, its clients?

> However, outside CRUD/USER processing the solution to a particular
> problem will almost always have a different representation of the data
> than the DBMS so that it can manipulate the data optimally for the
> solution in hand.

I've never heard anyone discuss "problems" so much. Do you mean applications? Typically you don't have a huge number of disconnected "problems" accessing the same data. You have applications, enhanced over time as business needs change, accessing the data.

Whether these "problems" have different "representations of the data" is irrelevant to whether they should have, or need, different representations; but even more critical is that the canonical form be reliably specified and offer flexible operators. Given these specifications and operators, why should we water them down within applications?
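To make the point concrete, here's a minimal sketch (hypothetical schema, Python's stdlib sqlite3 standing in for the DBMS) of letting the specified canonical form and its operators do the work, rather than re-deriving the answer in application code:

```python
import sqlite3

# Hypothetical schema: the canonical form is specified once, in the database,
# as types and constraints that every client can rely on.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL,
                         total REAL NOT NULL CHECK (total >= 0));
    INSERT INTO orders VALUES (1, 'acme', 120.0), (2, 'acme', 30.0),
                              (3, 'zenith', 75.0);
""")

# The flexible operators (restriction, aggregation, grouping) are already
# there; the application states WHAT it wants, not how to walk a structure.
rows = db.execute("""
    SELECT customer, SUM(total) FROM orders
    GROUP BY customer HAVING SUM(total) > 100
""").fetchall()
print(rows)  # [('acme', 150.0)]
```

Watering this down would mean fetching all the rows and re-implementing the grouping and filtering in a hand-rolled loop, against a private data structure that only one application understands.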

> The persistence mechanisms should be completely transparent
> to the problem solution in the same sense that a particular problem
> solution should not matter to the way data is managed on the enterprise
> level.

Like a duplicate tuple, seeing this statement again and again doesn't make it any more true.

> In a non-CRUD/USER context I have a complex problem to solve. To do
> that I have to manipulate data in data structures tailored to my problem
> solution.

Wrong. Why do the data structures have to be tailored? And by "tailored," does that necessarily mean custom-crafted, with a "mapping layer" between them and the database?

> Those data structures are <usually> initialized by data
> acquired from a persistent data store. But the access of the data to do
> that initialization is quite independent of the problem solution.

Access may be irrelevant, but WHAT you get back is very relevant.

> IOW, both the problem solution and the data access are "the code". They
> are separated by logical modularization and decoupling to make the
> application easier to implement and maintain.
> I really don't know why this notion of separation of concerns is such a
> novelty.

It's not. You're ignoring, though, the preconditions to said separation, and reasons for it. I don't care whether the rest of the application "knows" whether or not the data comes from a network or local DBMS, or from local memory; what's critical is WHAT it is getting, and operating on. Most applications (perhaps all) can be better written against relations, because of what the RM gives them.
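A sketch of what the RM gives an application that programs against relations: generic operators that work on any relation, instead of per-structure traversal code. (Plain Python sets of tuples stand in for relations here; the names and layouts are made up.)

```python
# Relations as sets of tuples; the operators below are the generic
# relational ones (restrict, project, join), written once for all relations.
emps  = {("alice", "dev"), ("bob", "dev"), ("carol", "qa")}  # (name, dept)
depts = {("dev", 3), ("qa", 7)}                              # (dept, floor)

def restrict(rel, pred):
    return {t for t in rel if pred(t)}

def project(rel, *cols):
    return {tuple(t[c] for c in cols) for t in rel}

def join(r, s):  # join on r's last column == s's first column
    return {a + b[1:] for a in r for b in s if a[-1] == b[0]}

# Which floor does each dev sit on?  Composed from the same three operators
# any other query would use -- no bespoke structure, no mapping layer.
result = project(restrict(join(emps, depts), lambda t: t[1] == "dev"), 0, 2)
print(result)  # {('alice', 3), ('bob', 3)}
```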

> Modularization has been a basic part of large scale software
> development since the '60s and there is nothing particularly OO about it.


> That's not the issue. Any time one makes /any/ change to an application
> there is a potential to insert a defect.

A meaningless statement.

> One reason one separates
> concerns is so that the insertion of defects is isolated and limited in
> what can be broken.

If you separate a module, and both parts depend on the same thing, like a data structure, then a change in that data structure will break both parts. I could probably function with my stomach outside my body, with proper surgery, so that a bullet to my gut wouldn't damage my stomach. That doesn't make it a good idea, for obvious reasons.
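The stomach analogy can be put in code. A sketch (Python, invented names) of two "separated" modules that both break when the shared data structure changes, because the coupling is to the representation, not to the call graph:

```python
# Two modules, "separated" into different functions (imagine different
# files), but both coupled to the same positional record layout.
# Layout v1: (name, balance)
def render(rec):
    return f"{rec[0]}: {rec[1]:.2f}"

def is_overdrawn(rec):
    return rec[1] < 0

# A change to the shared structure: layout v2 inserts a credit limit,
# (name, credit_limit, balance).  Neither module was touched, yet both
# are now silently wrong -- the module boundary did not protect them.
alice = ("alice", 500.0, -25.0)
print(render(alice))        # "alice: 500.00" -- the limit, not the balance
print(is_overdrawn(alice))  # False -- it tested the limit, not the balance
```

Separation bought nothing here; what both modules needed was a specified, stable logical form to depend on.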

> If you don't touch the problem solution code nor the interface it uses
> to access the data it needs, then you can be confident that you didn't
> break the solution logic.

Wrong. It depends on WHAT data it's getting.

> Then all you have to demonstrate is that the
> persistence access subsystem still provides the same data values it did
> before the change to it.

Values in what form?

> Again, this sort of modularization, decoupling, test management, and
> defect prevention is really basic software development stuff once one is
> outside the realm of CRUD/USER pipeline applications.

"Basic" does not imply easy. Decoupling and modularization are words that are easy to toss about without understanding their costs relative to benefits, and the contexts in which they apply.

Each line of code is separate; isn't that enough decoupling? I can replace "x = x + 1" with "x++" if I like, so isn't that line of code decoupled from those surrounding it?

> >> Codd's relational data model as implemented in RDBs is not a data
> >> storage paradigm?
> >
> > Indeed it is not.
> Wow. I give up. This disagreement is so profound I don't even know how
> to begin to respond.

Think about it for a while; it is true, and has interesting implications.

erk
Received on Wed Jun 28 2006 - 20:11:41 CEST
