Re: OO versus RDB
Date: 28 Jun 2006 11:11:41 -0700
H. S. Lahman wrote:
> The point is that data management and problem solving are quite
> different concerns and activities.
I don't think either one of these phrases is useful. "Problem solving"
is part of it, but a rather small part, in my opinion. Troubleshooting
existing systems might fall into this category. Analyzing requirements
is a big part of our work, involving logic and notations to help
identify gaps and inconsistencies. System design also involves logic
and notations, of different types; specifications tie it to the
requirements. And in coding, I'm doing more of the same: creating
formal structures which ultimately meet the requirements. Some
derivation, some creation, but problem solving? Not really, at least
not for me or any developer I've ever worked with.
"Data management" requires data, which have specifications (expressible
as types and constraints). Those specifications are relied on by
multiple systems sharing the database. But other than that, the term
"data management" is extremely fuzzy.
> I didn't look because it is irrelevant to the discussion. The issue is
> not about finding an alternative to SQL for accessing an RDB. (Though
> clearly if one is using flat files it is not the best choice.) It is
> about hiding /whatever/ access mechanism one uses.
> However, outside CRUD/USER processing the solution to a particular
> problem will almost always have a different representation of the data
> than the DBMS so that it can manipulate the data optimally for the
> solution in hand.
I've never heard anyone discuss "problems" so much. Do you mean applications? Typically you don't have a huge number of disconnected "problems" accessing the same data. You have applications which have been enhanced based on business needs accessing the data.
Whether these "problems" currently have different "representations of the data" says nothing about whether they should have, or need, different representations. Even more critical is that the canonical form be reliably specified and offer flexible operators. Given these specifications and operators, why should we water them down within applications?
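To make "reliably specified, with flexible operators" concrete, here is a minimal sketch using Python's standard sqlite3 module. The customer/invoice schema and the try_insert helper are invented for illustration; nothing in this thread specifies them. The constraints are declared once, in the database, and the query is expressed with relational operators rather than hand-rolled loops.

```python
import sqlite3

# Hypothetical schema, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE invoice (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount      INTEGER NOT NULL CHECK (amount > 0)
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 250)")
conn.execute("INSERT INTO invoice VALUES (11, 1, 100)")


def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# The specification is enforced no matter which application issues the insert.
negative_ok = try_insert("INSERT INTO invoice VALUES (12, 1, -5)")  # violates CHECK
orphan_ok = try_insert("INSERT INTO invoice VALUES (13, 99, 40)")   # violates foreign key

# A join plus an aggregate, stated declaratively against the canonical form.
totals = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM customer c JOIN invoice i ON i.customer_id = c.id
    GROUP BY c.name
""").fetchall()
```

An application that copies these rows into its own private structures has to re-implement both the constraint checks and the join by hand, which is the "watering down" I mean.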
> The persistence mechanisms should be completely transparent
> to the problem solution in the same sense that a particular problem
> solution should not matter to the way data is managed on the enterprise
> In a non-CRUD/USER context I have a complex problem to solve. To do
> that I have to manipulate data in data structures tailored to my problem
> Those data structures are <usually> initialized by data
> acquired from a persistent data store. But the access of the data to do
> that initialization is quite independent of the problem solution.
Access may be irrelevant, but WHAT you get back is very relevant.
> are separated by logical modularization and decoupling to make the
> application easier to implement and maintain.
> I really don't know why this notion of separation of concerns is such a
It's not. You're ignoring, though, the preconditions to said separation, and reasons for it. I don't care whether the rest of the application "knows" whether or not the data comes from a network or local DBMS, or from local memory; what's critical is WHAT it is getting, and operating on. Most applications (perhaps all) can be better written against relations, because of what the RM gives them.
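A rough Python sketch of that point, with a made-up Employee heading and a made-up payroll_by_dept function (neither comes from this thread): the application code is written against a relation, here modeled as a set of typed tuples, and gives the same answer whether that relation comes from local memory or from a DBMS.

```python
import sqlite3
from typing import Dict, Iterable, NamedTuple


class Employee(NamedTuple):  # hypothetical heading: (name, dept, salary)
    name: str
    dept: str
    salary: int


def payroll_by_dept(employees: Iterable[Employee]) -> Dict[str, int]:
    """Total salary per department; depends only on the heading, not the source."""
    totals: Dict[str, int] = {}
    for e in employees:
        totals[e.dept] = totals.get(e.dept, 0) + e.salary
    return totals


# Source 1: tuples held in local memory.
in_memory = {
    Employee("Ann", "Ops", 50),
    Employee("Bob", "Ops", 40),
    Employee("Cy", "R&D", 70),
}

# Source 2: the same tuples fetched from a DBMS (an in-memory SQLite database here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "name TEXT PRIMARY KEY, dept TEXT NOT NULL, salary INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", in_memory)
from_dbms = {
    Employee(*row)
    for row in conn.execute("SELECT name, dept, salary FROM employee")
}

# Same relation value either way, so the application code cannot tell the difference.
same_relation = in_memory == from_dbms
```

Where the tuples came from is invisible to payroll_by_dept; what it relies on is the heading and the guarantee that it receives a set of tuples conforming to it.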
> Modularization has been a basic part of large scale software
> development since the '60s and there is nothing particularly OO about it.
> That's not the issue. Any time one makes /any/ change to an application
> there is a potential to insert a defect.
A meaningless statement.
> One reason one separates
> concerns is so that the insertion of defects is isolated and limited in
> what can be broken.
If you separate a module, and both parts depend on the same thing, like a data structure, then a change in that data structure will break both parts. I could probably function with my stomach outside my body, with proper surgery, so that a bullet to my gut wouldn't damage my stomach. That doesn't make it a good idea, for obvious reasons.
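A small Python sketch of the shared-dependency point. All names here (OrderV1, OrderV2, solution_logic, persistence_access) are hypothetical. The two functions live in "separate" modules, yet both read the same field of the same structure, so renaming that field breaks both.

```python
from dataclasses import dataclass


@dataclass
class OrderV1:  # the shared data structure (hypothetical)
    order_id: int
    total: float


@dataclass
class OrderV2:  # the same structure after one field is renamed
    order_id: int
    total_cents: int


def solution_logic(order):
    """The 'problem solution' module: reads the shared field."""
    return order.total * 2


def persistence_access(order):
    """The 'persistence access' module: reads the same shared field."""
    return {"id": order.order_id, "total": order.total}


def broken_modules(order):
    """Names of the modules that fail when handed this order."""
    broken = []
    for module in (solution_logic, persistence_access):
        try:
            module(order)
        except AttributeError:
            broken.append(module.__name__)
    return broken
```

With these definitions, broken_modules(OrderV1(1, 9.5)) comes back empty, while broken_modules(OrderV2(1, 950)) names both functions. Drawing a module boundary between them did nothing to isolate the change.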
> If you don't touch the problem solution code nor the interface it uses
> to access the data it needs, then you can be confident that you didn't
> break the solution logic.
> Then all you have to demonstrate is that the
> persistence access subsystem still provides the same data values it did
> before the change to it.
Values in what form?
> Again, this sort of modularization, decoupling, test management, and
> defect prevention is really basic software development stuff once one is
> outside the realm of CRUD/USER pipeline applications.
"Basic" does not imply easy. Decoupling and modularization are words that are easy to toss about without understanding their costs relative to benefits, and the contexts in which they apply.
Each line of code is separate; isn't that enough decoupling? I can replace "x = x + 1" with "x++" if I like, so isn't that line of code decoupled from those surrounding it?
> >> Codd's relational data model as implemented in RDBs is not a data
> >> storage paradigm?
> > Indeed it is not.
> Wow. I give up. This disagreement is so profound I don't even know how
> to begin to respond.