Re: TRM - Morbidity has set in, or not?

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Fri, 19 May 2006 17:09:45 GMT
Message-ID: <tZmbg.9852$A26.244608_at_ursa-nb00s0.nbnet.nb.ca>


Keith H Duggar wrote:

> Bob Badour wrote:
>

>>Actually, the point you are asking about above is more
>>about expression bias than access paths. Network and
>>hierarchic data models limit both by combining separate
>>concerns.

>
> [snip much more]
>
> Fascinating fascinating. Your response was enlightening,
> it really helped to clarify the issues for me. And this
> vocabulary of "expression bias" is also clarifying.

JOG and Jon Heggland recently used the term "query bias" here, which is essentially synonymous. I chose to use "expression bias" because some people tend to have limiting preconceptions regarding the term "query". (Myself included.)

I believe the absence of bias is known as symmetry.
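
To make the symmetry point concrete, here is a rough Python sketch. The supplier/part names are invented purely for illustration; the point is only that a relation held as a set of tuples answers either direction of the question with equal ease, while a hierarchic structure makes one direction much easier than the other:

    # A relation as a plain set of (supplier, part) tuples.
    supplies = {("S1", "P1"), ("S1", "P2"), ("S2", "P1")}

    # Both directions of the question are equally easy to express:
    parts_from_s1 = {p for (s, p) in supplies if s == "S1"}
    suppliers_of_p1 = {s for (s, p) in supplies if p == "P1"}

    # A hierarchic structure biases one direction over the other:
    by_supplier = {"S1": ["P1", "P2"], "S2": ["P1"]}
    parts_from_s1_h = by_supplier["S1"]            # trivial
    suppliers_of_p1_h = [s for s, parts in by_supplier.items()
                         if "P1" in parts]         # awkward inversion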

What's far more clarifying for me is the discipline of separating concerns.

Consider the list of concerns you might have for your simulations: correctness, performance, security, failure recovery, resource use, concurrency, and scalability.

Some concerns are not entirely independent. Performance, concurrency, scalability and resource use generally involve design tradeoffs.

However, correctness, performance and security are all essentially independent concerns. One can consider each of them in complete isolation, and good designs will separate them as much as our tools allow.

Aspect-oriented programming and literate programming both attempt to isolate separate concerns in source code. I suggest it is even more helpful to separate them linguistically/notationally/formally as well.
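
As a tiny, hedged example of what that separation can look like in code (ordinary Python, not aspect-oriented programming proper, and the function is invented for illustration): the definition below expresses only the correctness concern, and the performance concern is stated afterwards in a separate line, without editing the definition.

    from functools import lru_cache

    # Correctness concern: the definition says what the answer is,
    # with no thought given to how often it gets recomputed.
    def ways_to_make_change(amount, coins=(1, 5, 10, 25)):
        if amount == 0:
            return 1
        if amount < 0 or not coins:
            return 0
        return (ways_to_make_change(amount - coins[0], coins)
                + ways_to_make_change(amount, coins[1:]))

    # Performance concern: stated separately. The recursive calls go
    # through the module-level name, so they pick up the cache too.
    ways_to_make_change = lru_cache(maxsize=None)(ways_to_make_change)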

> I realize I have a lot of reading on fundamentals to do and
> I'm working on getting Date's AITDS; but, aside from that
> can you recommend any material that specifically addresses
> issues concerning expression bias and access paths?

I recommend Fabian Pascal's _Practical Issues in Database Management_: http://www.dbdebunk.com/books.html

>>The problem I perceived that you expressed above amounts
>>to: How the hell am I going to write that analytic
>>program in the first place?!?
>>
>>Because the concern for performance is mixed with the
>>concern for correctness in network data models, one often
>>finds that--after the performance needs of one requirement
>>are met--it is extremely difficult to express anything
>>else one wants. Even after creating such a program, one
>>will encounter the same performance issues. If you have to
>>change the access paths for the other analysis, you are
>>pretty much screwed.

>
>
> Yes that's exactly the problem I was trying to communicate.
> I thought perhaps a relational perspective would help me to
> better design such programs. In the past such problems have
> irked me greatly. Are these design issues inherently hard?

I will have to give the consultant's answer: That depends. Design is more art than science and what one designer will find extremely difficult another will find rather obvious.

If one uses a tool that effectively separates concerns right out of the box, then separating concerns is relatively easy.

In university, I twice wrote the circuit simulation I gave you using lists. One of those times, the instructor required that we use the Unix m4 macro processor to add a type-safe parameterized list type to a language that doesn't support parameterized types.

As a programmer, the single most effective learning exercise one can do is learn a lot of radically different languages. Knowing Pascal, Modula-2, C, C++, and Java won't get you nearly as much bang for your buck as knowing Pascal, C++, Prolog, the lambda calculus, Lisp, and APL.

During design, use the highest level languages and abstractions available to you and deal with each concern separately. Combine features from different languages if that makes things easier to express.

Consider making a prototype in the highest level language available to you. It might never handle 100 million bovines, but if it can handle a few hundred, you might be able to use it to verify the scalable simulation you write later.
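
As a hedged sketch of what such a prototype might look like (the model, numbers, and names here are all invented; the point is only the shape of a throwaway, obviously-correct program small enough to check by hand):

    import random

    random.seed(42)

    # A deliberately naive prototype: a few hundred hypothetical animals,
    # a made-up "infected" state, and one update step over random contacts.
    animals = list(range(300))
    infected = set(random.sample(animals, 5))
    contacts = {(a, b) for a in animals for b in random.sample(animals, 3)}

    def step(infected, contacts, p=0.2):
        newly = {b for (a, b) in contacts
                 if a in infected and random.random() < p}
        return infected | newly

    for _ in range(10):
        infected = step(infected, contacts)

    print(len(infected), "of", len(animals), "infected after 10 steps")

If the scalable version agrees with something this small on the cases the prototype can handle, you have at least some evidence the translation preserved correctness.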

Write the simulation and the analysis programs for the design.

After you have that done, you will have identified the higher-level abstractions you need to deal with and will have a head-start at decomposing your design. Then deal with the physical concerns like how you want to cluster the data and what access paths will improve performance.
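
As a small, hedged illustration of keeping that separation (again using an invented supplier/part relation, in Python): the logical question stays exactly as first written, and the index is introduced afterwards as nothing more than an access path.

    from collections import defaultdict

    # Logical layer: the bare relation, with no access path assumed.
    supplies = {("S1", "P1"), ("S1", "P2"), ("S2", "P1"), ("S3", "P2")}

    def suppliers_of(part):
        return {s for (s, p) in supplies if p == part}

    # Physical layer, decided afterwards: an index by part, added purely
    # to speed up the same logical question.
    index_by_part = defaultdict(set)
    for s, p in supplies:
        index_by_part[p].add(s)

    def suppliers_of_fast(part):
        return set(index_by_part[part])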

You may find that the translation from the higher level language to the lower level language is fairly easy for the simulation, and would be easy for the analysis too if only your lower level language had one or two features of the higher-level language.

Consider adding those features either through clever use of the lower level language or by automating the code development using a macro language or scripting language.
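
As a toy, hedged example of the second approach (the "missing feature" here is pretend generics, the target is C-ish text, and everything is invented for illustration; m4 or a proper template engine would do the same job more robustly):

    # A tiny Python generator that stamps out a typed list for each
    # element type, since the target language has no parameterized types.
    TEMPLATE = """\
    typedef struct {{ {ctype} *items; int count; }} {name}_list;

    {ctype} {name}_list_get(const {name}_list *xs, int i) {{
        return xs->items[i];
    }}
    """

    def emit_list_type(name, ctype):
        return TEMPLATE.format(name=name, ctype=ctype)

    if __name__ == "__main__":
        for name, ctype in [("node", "struct node"), ("weight", "double")]:
            print(emit_list_type(name, ctype))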

> Bob Badour wrote:
>

>>Keith H Duggar wrote:
>>
>>>What about APL, Joy, K, and Prolog for example? What
>>>are their good and bad points from a relational support
>>>perspective?
>>
>>They are all first and foremost programming languages,
>>which makes them orthogonal to the RM ... In a sense,
>>"RM programming language" is an oxymoron.

>
> I admit I'm having trouble grasping this concept. I can see
> that various languages can be paired with various data models.
> However, doesn't a programming language need some glue, some
> constructs to support a particular data model? For example in
> your code example you had keywords such as FROM WHERE JOIN
> that seem semantically tied to the relational model and yet
> they are part of the language proper.

In a sense, the programming language is itself the glue. Different computational models have different strengths and weaknesses. For instance, it is very easy to create very complex state machines with object oriented languages. In fact, one of the distinguishing characteristics of a bad OO programmer is a proliferation of useless state in the form of variables with excess scope or persistence.
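
A deliberately exaggerated Python caricature of that last point, with invented names: the running total only needs to exist for the duration of one call, but promoting it to an attribute gives it excess scope and persistence that every other method now has to respect.

    class ReportBad:
        def __init__(self, rows):
            self.rows = rows
            self.total = 0              # useless persistent state

        def sum_column(self, col):
            self.total = 0              # must remember to reset it
            for row in self.rows:
                self.total += row[col]
            return self.total

    class Report:
        def __init__(self, rows):
            self.rows = rows

        def sum_column(self, col):
            # The running total lives only as long as this call.
            return sum(row[col] for row in self.rows)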

You will often hear people speak of "impedance mismatch". Many people think the way to address that problem is to lower the data language to the level of existing programming languages. Date and Darwen created Tutorial D to demonstrate that the exact opposite is the correct way to deal with the problem. The solution is to raise the level of discourse of the programming language to match that of the data language.

Thus, in a valid D, relations are inherent to the programming language and to its computational model. Restrict and join are operations in that computational model.
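
This is emphatically not Tutorial D, but as a rough sketch of what "relations as values, restrict and join as operations" can look like when faked inside a general-purpose language (all names invented for illustration):

    def relation(*rows):
        # A relation as a frozenset of attribute/value mappings.
        return frozenset(frozenset(r.items()) for r in rows)

    def restrict(rel, pred):
        return frozenset(row for row in rel if pred(dict(row)))

    def join(r1, r2):
        out = set()
        for a in r1:
            for b in r2:
                da, db = dict(a), dict(b)
                if all(da[k] == db[k] for k in set(da) & set(db)):
                    out.add(frozenset({**da, **db}.items()))
        return frozenset(out)

    emp  = relation({"emp": "e1", "dept": "d1"},
                    {"emp": "e2", "dept": "d2"})
    dept = relation({"dept": "d1", "name": "Sales"},
                    {"dept": "d2", "name": "Ops"})

    sales_emps = restrict(join(emp, dept), lambda t: t["name"] == "Sales")

In a valid D, the language would give you those operations directly, along with the type system to go with them, instead of having them bolted on like this.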
