Re: OOP - a question about database access

From: H. S. Lahman <h.lahman_at_verizon.net>
Date: Tue, 04 Nov 2003 20:33:08 GMT
Message-ID: <3FA80D09.8070500_at_verizon.net>


Responding to Novoa...

>>>You probalby meant system instead of application. 
>>
>>No, I meant /application/.

>
>
> Then you don't use the concepts properly. A GUI application is an
> application, a DBMS is another application, a data management
> specialized application. When they work together they form a system.

A GUI application is an application whose UI is a GUI. A DBMS application is an application that handles persistence through a DBMS. An application (from the Computer Desktop Encyclopedia) is a specific use of a computer, such as billing or payroll. Thus the same application can employ a GUI, a web interface, a command line interface, or heliographs for the UI. Similarly, the same application can deal with persistence through flat files, RDB, OODB, or stone tablets. The GUI and DBMS are just specific implementation mechanisms that are services to the application solution.

>
>

>>>Persistence mechanisms are encapsulated by the DBMS, they are
>>>transparent to application developers.
>>
>>And how does the application solution access the DBMS?  It has to pick a 
>>mechanism like SQL, ODBC, an Oracle engine API, or whatever.

>
>
> You are calling mechanism to everything. SQL is not a mechanism, it is
> a language.

So? Why can't a language be a mechanism? ATLAS is a language that provides a mechanism for unambiguously conveying electronic test requirements. An OOPL is a mechanism for implementing an OOA/D model. Assembly language is a mechanism for implementing 3GL specifications. SQL is a specific mechanism for implementing a persistence access service.

>>Both that 
>>access and the DB schema needs to be decoupled from the application 
>>solution for anything beyond RAD/USER pipeline applications.

>
>
> What is beyond RAD applications?

RAD, as it is currently applied, is only used for pipeline applications between DB and UI. It is just a special case of translation where full automatic code generation was quickly achieved because both UIs and RDBs are very narrowly defined and highly standardized. So the applications that are beyond the current generation of RAD are all those where UI and persistence are secondary considerations.

[General purpose translation is only now coming of age with initiatives like OMG's MDA to provide standardization of the semantic meta models. However, full automatic code generation is now available for all OO software applications, not just USER pipelines.]

>
>

>>>>That is, the problem solution does not care 
>>>>if data is stored in an RDB, an OODB, flat files, or clay tablets.
>>>
>>>
>>>The problem solution does not care, but it is essential for the
>>>developers to know if the application is communicating with an RDBMS,
>>>a file processor, the OS file system etc.
>>
>>Only the developer providing the persistence access subsystem or layer 
>>within the application needs to understand that stuff.

>
>
> The persistence subsystem is a part of the DBMS. Tables are persistent
> by default, but you can create temporary tables if you want.

No, this has nothing to do with temporary tables. The persistence access subsystem or layer is part of the application. It provides a customized mapping between the view of the application solution and the persistence view.

When the application problem is simply to display views of the data, then the mapping is essentially 1:1 and one has a pipeline application.   In those situations one can provide automation through the infrastructures that you have been talking about so there is effectively no need for a subsystem to provide the mapping in the application. (More precisely, the RAD IDE provides it behind the scenes.) But when the mapping is not 1:1, it must be provided as a persistence access subsystem in the application.

>>In particular, 
>>whatever optimization is required to alleviate the persistence 
>>bottleneck should be encapsulated away from the application solution. 

>
>
> Yes, in the DBMS. The DBMS manages the physical persistence mechanisms
> like the hard disks.

Your second statement is true, but that's not what I am talking about.

The application determines the data needs and, consequently, the load on the DBMS resources. Since the DBMS resources are typically shared, it is important for each application to access those resources in an efficient manner. The DBMS cannot be prescient about the data needs of a specific application but the application can understand <general> performance constraints on DB access to provide thins like anticipatory caching geared to the application's /actual/ needs rather than the DBMS' guesses.

This sort of optimization is not about disk reads and writes. The DBMS understands the hardware technology so it is responsible for that part.   The issue here is the optimization of the overall DBMS resources.

>
>

>>Whoever is developing the solution to the business problem should focus 
>>on that solution and be completely indifferent to data schemas, access 
>>mechanisms, and I/O optimization.

>
>
> How can a data management solution be indiferent to the data
> representation manipulation model used?

It can't be. But we are talking about /applications/ that solve customer problems. Middleware and DBMS applications are very much concerned with data management, but that is a very specialized arena. The vast majority of applications exist to solve customer problems where the nature of persistence is, at best, secondary.

>>>>One may use OO to implement the persistence access subsystem or layer, 
>>>>but that view will be quite different than the business view.
>>>
>>>One may use OO to implement DBMSes. They must be general purpose
>>>DBMSes independent to any business problem.
>>>
>>>A good DBMS is expensive to build so in most cases you should use an
>>>existing one.
>>
>>Of course.  But that doesn't mean the application needs to know which 
>>one is being used or be affected if one changes one's mind later and 
>>swaps in another.

>
>

> True. Standard data management languages and communication protocols
> are for that, but they are not really standard at practice. Although
> entry level SQL-92 is widely supported.
>
> With a careful design an SQL DBMS could be replaced by another
> whithout a high cost. But it is not very frequent. I worked for a
> company which changed the application development language 6 or 7
> times, but the DBMS was always the same (Oracle).

It can only be replaced by another RDB and then, as you agree, it can be painful to do so because every vendor has proprietary extensions to SQL.   But what if the application has to move to accessing flat files for performance reasons? Or one needs to shift to a memory mapped OODB? As soon as one even thinks about SQL one has already considerably narrowed the persistence options. The application solution won't be affected if there aren't any ubiquitous SQL calls littering the solution if they are encapsulated in a persistence access subsystem.

>
> Applications are replaced a lot more frequently than DBMSes.

True. But one reason is that too much legacy code would become broken if the DBMS were changed. The IT industry is inundated with legacy systems that were too painful to convert when the DB technology changed so wrappers were put around them so that they could access the data the Old Way. There are a lot of databases out there that have special programs to "replicate" the new DB format into a local DB that legacy applications can use. Aside from the wasted resources in the conversion copying a bunch of data the application may not need on a given execution, they have the added fragility of synchronization.

>
> If you have all your business logic in the centralized database, among
> many other advantages the applications are easier to replace.

The problem isn't replacing applications. It lies in keeping applications running when changing the business rules and making exceptions to them. Centralizing business rules in the DB is fine so long as one can guarantee they will never change and will always apply to all problem contexts. As soon as things start to change existing applications tend to break.

Certain rules, notably intrinsic data integrity rules, /need/ to be centralized. There are also some business rules where a change would be tantamount to changing the business the company was in and those /should/ be centralized. But most rules and policies that apply to specific business problem spaces are not set in concrete and have exceptions. In that case one needs to decentralize those rules and policies into applications dedicated to the relevant subject matter.

A payroll system needs to understand tax laws to compute deductions. A general ledger needs to understand tax laws to compute allocations. But the particular tax laws each captures are unique to their subject matter. If they aren't, then someone has screwed up in defining the applications' responsibilities within the financial system.

But if one tries to centralize both sets of laws in the DB as a common suite of business rules, the payroll and general ledger are going to start tripping over each other eventually.

>
>

>>>>As a result one can abstract persistence mechanisms in terms 
>>>>of paradigm invariants so that the classes are generic things like 
>>>>Table, Schema, Row, and Field
>>>
>>>But it is not an issue for the information system developer because
>>>persistence is solved by the DBMS behind the scenes. In many cases it
>>>is undocumented.
>>
>>You seem to be including some sort of data warehousing or data mining 
>>capabilities in your definition of DBMS that would allow abstract 
>>queries.

>
>
> Data warehouses are only a special kind of databases. Data warehouses
> share several characteristics, and specialized DBMSes can take
> advantage of the knowledge of such characteristics in order to achieve
> better performance. Nothing else.
>
> DBMSes should be programmable and extendable. Data mining is also
> querying.
>
>
>>For any DBMS I know of the schema had better be documented for 
>>it to be directly accessible via SQL or whatever.

>
>
> I don't find sense here.

If you are going to access data stored in a table, you need to know the table and field names, etc.

You can place a mapping function in between that will allow data to be accessed more generically based upon the client's needs. Data warehousing and data mining provide exactly those sorts of services (among others) at the middleware level. It can also be done with the sort of persistence access subsystem that I am talking about at the application level. But whatever one inserts between the application solution and the DBMS needs to have the DBMS properly documented or it can't do its job.

>
>

>>>>That application of invariants leads to things like JDBC/ODBC/SQL that 
>>>>largely hide the mundane details of the mechanisms through automation.
>>>
>>>JDBC/ODBC are bridges between the application and the DBMS.
>>>
>>>SQL is the communication language between the application and the
>>>DBMS, and it is also the database design language. SQL has three
>>>parts: structural, integrity and manipulative.
>>
>>But they are all infrastructures provided to decouple the application 
>>from direct communication with the DB engine.

>
>
> No, they are provided to communicate the presentation application with
> the DBMS.

Sure. But only if all the application does is pipeline between DB and UI. In IT that is, indeed, as fair sized market so those infrastructures are justified. However, I am talking about applications that actually solve unique problems for the customer by applying complex business rules as opposed to simply data management.

>
>

>>Using them restricts the 
>>application to very particular and often proprietary modes of 
>>persistence.

>
>
> DBMSes free application from the persistence issues. You simply must
> know that non temporary tables are persistent, and nothing else to
> say.

The DBMS itself represents a very specific choice about the persistence mechanism. Where is it written that one needs to persist data using the the relational model? Lot's of data before and since has been persisted without it.

>
> But who cares about the physical format the DBMS use?

Not the application, which is the point. The application also doesn't care whether a DBMS is used.

>
> If you talk about the communication layers like ODBC, ADO.NET, etc. It
> is rather easy to change one for the other. You only need to touch few
> code lines if any.

Right. And if you really believe that, I have a very nice bridge in Brooklyn that I can let you have real cheap.

However, the real point is /which/ code lines one touches. If they are encapsulated in a persistence subsystem, then one can be absolutely confident that the problem solution still works because that code was not not touched. But if the code lines are in the application solution, one cannot be sure it has not been broken without exhaustive testing.

[BTW, one of the nice things about disciplined application partitioning is that each subsystem can be fully tested functionally in complete isolation from other subsystems. So if it tested correctly once and the code was not touched, one doesn't have to test it again when another subsystem is modified.]

>
>

>>My issue is about decoupling at a higher level than the DB engine. The 
>>application solution needs to be independent of those assumptions.

>
>
> Independent to the data model?
>
> This is an absurd.

Why? So long as there is an unambiguous mapping between the data models used by the application solution and the persistence mechanisms, it doesn't matter what data model is used for persistence so long as it is properly encapsulated. The whole point here is that for complex applications the data model in the solution does never maps 1:1 to the data model in the DB.

>
>

>>One 
>>does that by encapsulating those persistence mechanisms within a 
>>subsystem or layer whose interface is independent of those mechanisms 
>>(i.e., it manifests the problem solution's view of the data).

>
>
> Again, the DBMS does that. DBMS exposes only logical data structures.
> The physical persistence implementation and the physical data
> structures are hidden to the users.

No it doesn't, as your second sentence admits. The DBMS exposes /its/ logical data structures. Those should be hidden from the application solution that has its own logical data structures.

>
>

>>One then 
>>takes advantage of the automation providing such infrastructures by 
>>using those sorts of paradigms /within/ the subsystem or layer.

>
>
> One then takes advantage of the automation and abstraction provided by
> the DBMS.

Exactly. Separation of concerns.

>
>

>>>>Bottom line: starting out with an RDB as a central part of the 
>>>>application
>>>
>>>System, not application.
>>
>>No, /application/.  In my world a system is usually comprised of 
>>multiple applications.

>
>
> Two applications are multiple applications, and you have at least two.
> The DBMS and the presentation application.

Not in my world. Both presentation and persistence are low level, peripheral services to the application solution. As a data point, the last application I worked on prior to retiring had 30 subsystems for 250 KNCLOC of code. Of those two(2) dealt with the UI and persistence. That left 28 subsystems that were developed completely independently of the UI and persistence.

Your model is the data pipeline sort of application. That is quite common in IT but rarely exists outside IT. Even within IT it is becoming less important as customers want to have data interpreted and processed in more complex ways rather than just displayed from a variety of perspectives with subtotals and graphs.

>
>

>> A corporate accounting group owns a Financial 
>>system but that system is composed of distinct applications like General 
>>Ledger, Accounts Payable, Payroll, etc.  Each of those applications will 
>>typically have several subsystems.

>
>
> Subsystems are systems, and the DBMS is part of the General Ledger
> system, the Payroll system, etc.

Not IME in the two Fortune 500s where I did time. The DBMS was an entirely separate application with an entirely different subject matter running on a different set of machines from the applications. The only place I see integrated DBMSes is in RAD IDEs like Delphi and Access that service the pipeline market.

>
>

>>>>is not going to be very helpful in understanding OO because 
>>>>a lot of the OO stuff will be hidden in the infrastructures like 
>>>>JDBC/ODBC and the tools the JDeveloper IDE provides.
>>>
>>>Agreed, it would be better to try to develop a non data centric
>>>system, like a game, a text processor, etc.
>>>
>>>You typically need very little from OO in order to develop a business
>>>information system: An SQL DBMS, a database design script and a client
>>>app developed with a RAD, draging and dropping components in forms.
>>
>>That's only true for a RAD/USER pipeline application.

>
>
> You can develop any business information system with this approach.

Probably so. I've also worked on a direct access DB engine implemented BLISS (i.e., not even built-in support for OS file system support), but I wouldn't want to do it again. If the RAD/USER model solves a central problem beyond USER operations, then I wouldn't want to be stuck with maintaining it.

[The BLISS engine was developed without the relational model and, AFAIK, it is still in use after 25+ years. The applications that accessed it were intimately tied to its view of persistence. They and it are still virtually unmaintainable.]



There is nothing wrong with me that could not be cured by a capful of Drano.

H. S. Lahman
hsl_at_pathfindermda.com
Pathfinder Solutions -- Put MDA to Work http://www.pathfindersol.com
(888)-OOA-PATH Received on Tue Nov 04 2003 - 21:33:08 CET

Original text of this message