Little design mistakes that can be easily avoided (3): Qualifying as relation *any* predicate

From: Cimode <cimode_at_hotmail.com>
Date: 28 May 2007 07:21:03 -0700
Message-ID: <1180362063.288024.100250_at_o5g2000hsb.googlegroups.com>


I could have named this new theme *On the impossibility of intuitively defining relations without a refining process that meets all the requirements of 2VL* (but that title would have been quite hard to digest). It seems to me that one of the recurring design errors originates from the lack of a methodology to safely differentiate what can and what cannot qualify as a relation.

In most cases, and as far as the developer community is concerned, the process of relation formulation is inherently arbitrary and subjective. We may notice that relations are expressed *de facto* and are not the outcome of some *refining* process that starts with the formulation of a predicate and ends with the acceptance of a relation. In other words, not all predicates may serve as a basis for a relation, and several traps should be avoided before getting to the point where one can safely assume that one is actually working with a relation.

Such a process is crucial because it determines whether the designer truly has the opportunity to apply 2VL, and that in turn establishes whether the designer is within the boundaries of RM or not. In the same line of thought, I have also observed that people often define relations by their constraints and not sufficiently by their predicates, but that's another topic.

So I will skip the theme of constraints for now and focus on the theme of relation formulation.

It is well known that the initial requirement for formulating a relation is to express a predicate for which all existing tuples are propositions validated by TRUE and all nonexistent tuples are necessarily propositions validated by FALSE. Therefore, one possible and not too hazardous methodology for formulating a relation could be as follows (on a one relation/one segment of reality basis):

--> 0) Name the segment of reality we want to capture. Also name the possible relation representing that segment of reality. It is safe to assume that something we can't even name can hardly be considered a segment of reality.
--> 1) Formulate the predicate that can reasonably * capture the basic meaning of 0).
--> 2) Formulate possible propositions using the predicate 1). One could see these possible propositions as *tuple candidates*.
--> 3) Over 2), isolate the propositions that are facts (propositions validated by TRUE) beyond doubt.
--> 4) Verify 3) against stakeholders **: set up an interview to present these propositions to the people concerned by the segment of reality one wants to capture, and observe their response. Double check that the propositions are *immediately* understood and do not trigger subjective anthropomorphic responses. Usually people's first comment when confronted with simple TRUE propositions is 'SO?' (yes, TRUE propositions tend to be boring). When people start reacting in an anthropomorphic manner to these propositions, that's one clue that they are either complex or poorly expressed. A simple proposition should trigger a *neutral* response. If one fails in 4), then getting back to 1) is safer.
--> 5) Over the set of *tuple candidates*, isolate the ones that are reasonably non-facts (propositions validated by FALSE).
--> 6) See if the propositions in 5) may become possible propositions in the future. If 6) fails, then getting back to 1) or 2) is safer.
--> 7) Over the set of tuple candidates, isolate the ones that are neither in 3) nor in 5). Answers such as *MAYBE* are the clue that a predicate does not truly respond to 2VL. If 7) is not empty, go back to 0).
--> 8) Make sure that 4) possibly exists outside the stakeholders' perception. If one has legacy data to migrate, see whether all existing data actually meet 3). If not, getting back to 1) is safer.

*: reasonably, meaning that it can be expressed and understood by people of common intelligence speaking the language in which the predicate ought to be formulated.
**: the users who are supposed to know best the segment of reality one attempts to capture.
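
The go-back rules of steps 4), 6), 7) and 8) can be read as a small decision procedure. The following is only a sketch of that reading, not part of the method itself: the function name `next_step`, the `Verdict` enum and the three boolean inputs are hypothetical names introduced for illustration, and the human judgments (interview outcome, plausibility of future propositions, fit of legacy data) are simply passed in as flags.

```python
from enum import Enum


class Verdict(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    MAYBE = "MAYBE"  # anything that cannot be settled as TRUE or FALSE


def next_step(verdicts, stakeholders_neutral, non_facts_possible_later, legacy_data_fits):
    """Return the step to go back to, or None if the predicate survives all checks.

    All four arguments are assumptions of this sketch: they stand for
    judgments that the method leaves to the designer and the stakeholders.
    """
    # Step 4: the facts of step 3 must draw a neutral 'SO?' from stakeholders.
    if not stakeholders_neutral:
        return 1
    # Step 6: the non-facts of step 5 must remain possible in the future
    # (the method allows going back to 1 or 2; this sketch picks 1).
    if not non_facts_possible_later:
        return 1
    # Step 7: a candidate that is neither TRUE nor FALSE breaks 2VL.
    if any(v is Verdict.MAYBE for v in verdicts):
        return 0
    # Step 8: existing (legacy) data must fit the facts as well.
    if not legacy_data_fits:
        return 1
    return None


if __name__ == "__main__":
    all_settled = [Verdict.TRUE, Verdict.TRUE, Verdict.FALSE]
    print(next_step(all_settled, True, True, True))
```

The order of the checks follows the numbering of the steps, so an interview failure is reported before a 2VL failure even when both are present.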

In summary, in this framework, no fewer than 4 steps can cause one to realize that a relation is poorly expressed! That says a lot about the rest of the process: if one starts to actually implement constraints on something that cannot possibly qualify as a relation, then chances are that one is trying to manipulate something outside RM and 2VL.

So let's apply the method above.

Suppose we are to formulate an EMPLOYEE relation to capture the segment of reality consisting of keeping track of the LAST NAME, DOB and SALARY for a specific EMPLOYEE. We are in a company that, according to what people are saying, BILL GATES has decided to buy in order to control his H/R costs. In the last memos received by the employees, it has been mentioned that the company ought to build a database which will keep all this information in a centralized manner to reduce costs.

0) We name the segment of reality we want to capture: a structure to keep track of the NAME, DOB and SALARY for a specific EMPLOYEE. The possible relation is named EMPLOYEE.

1) Possible predicate for 0): (#NAME) was born on (#DOB) earns (#SALARY) (As one can see, #NAME, #DOB and #SALARY are placeholders for values extracted from domains. Once these 3 placeholders are filled with specific values, the predicate becomes a proposition.)

2) Possible propositions for predicate 1)
EMPLOYEE: (JOHN FORD)  was born on (1/1/1965) earns (60 K$)
EMPLOYEE: (MATTHEW CONLEY) was born on (1/1/1962) earns (80 K$)
EMPLOYEE: (BILL GATES) was born on (1/1/1961) earns (1500 K$)
EMPLOYEE: (GEORGES SINCLAIR) was born on (1/1/1961) earns (XXX K$)

As we can see, the first three tuples seem at first sight to be validated by TRUE. OTOH, the fourth tuple is more troublesome.
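
To make the predicate/proposition distinction concrete, here is a minimal sketch in Python. The names `EmployeeCandidate` and `proposition` are hypothetical and only serve the illustration; values are kept as plain strings exactly as written above, including the unknown `XXX K$`.

```python
from typing import NamedTuple


class EmployeeCandidate(NamedTuple):
    """One tuple candidate: a value for each placeholder of the predicate."""
    name: str
    dob: str
    salary: str


def proposition(candidate: EmployeeCandidate) -> str:
    """Fill the placeholders of '(#NAME) was born on (#DOB) earns (#SALARY)'."""
    return f"({candidate.name}) was born on ({candidate.dob}) earns ({candidate.salary})"


candidates = [
    EmployeeCandidate("JOHN FORD", "1/1/1965", "60 K$"),
    EmployeeCandidate("MATTHEW CONLEY", "1/1/1962", "80 K$"),
    EmployeeCandidate("BILL GATES", "1/1/1961", "1500 K$"),
    EmployeeCandidate("GEORGES SINCLAIR", "1/1/1961", "XXX K$"),
]

if __name__ == "__main__":
    for candidate in candidates:
        print("EMPLOYEE:", proposition(candidate))
```

The predicate itself is neither TRUE nor FALSE; only the strings produced once all three placeholders are filled can be submitted for validation.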

3) Isolating the facts --> As you can see, I cannot know for SURE that BILL GATES, who is already employed by MICROSOFT (or about to take his retirement), may become an existing part of the segment of reality I am trying to describe. So I consider that he is *not* a part of the propositions validated by TRUE. (Which leaves 2 propositions validated by TRUE beyond reasonable doubt.)

EMPLOYEE: (JOHN FORD) was born on (1/1/1965) earns (60 K$) --> TRUE
EMPLOYEE: (MATTHEW CONLEY) was born on (1/1/1962) earns (80 K$) --> TRUE

4) Checking the facts --> I go to the H/R dept and present the propositions above to Ms. HOPKINS or, even better, let her make them. If Ms. HOPKINS's response was *SO?*, I would be happy to move forward. But her response is *I don't know, not all employees tell us their birth date because of privacy law*. Even though a designer may be tempted to ignore that comment, a designer ought to consider the possibility that at this point the relation he chose is not a relation! (I know that's frustrating.) So I apply the method and I gently (yeah right) get back to 1)!

1) As I know my predicate did not work, I now consider issuing a safer predicate *not* based on DOB. My new predicate is

(#NAME) earns (#SALARY)

2) Possible propositions

EMPLOYEE: (JOHN FORD) earns (60 K$)
EMPLOYEE: (MATTHEW CONLEY) earns (80 K$)
EMPLOYEE: (BILL GATES) earns (1500 K$)
EMPLOYEE: (GEORGES SINCLAIR) earns (XXX K$)

3) Isolating the facts

EMPLOYEE: (JOHN FORD) earns (60 K$) --> TRUE
EMPLOYEE: (MATTHEW CONLEY) earns (80 K$) --> TRUE

4) I get back to Ms. Hopkins and present the new propositions to her. She now tells me *Isn't that what we talked about before? SO?*.

Happy I move forward.

5) I identify the proposition that is currently validated by FALSE (BILL GATES is not YET a member of the unified company but may be someday)

EMPLOYEE: (BILL GATES) earns (1500 K$) --> FALSE

6) From the description, I may assume that BILL GATES may appear as part of the future system. Therefore I may move on to the next stage.

7) Over the propositions I identified, one remains

EMPLOYEE: (GEORGES SINCLAIR) earns (XXX K$)

We can see that I can validate it neither by TRUE nor by FALSE. Therefore that proposition does not allow me to state that EMPLOYEE as defined belongs to 2VL!!! All these efforts for this. I need to get back to the initial step and basically formulate a new predicate. (Grumbling....) So I shall emit a new predicate after all...
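
The check performed in step 7) can be sketched as a three-way classification. Everything named here is an assumption of the sketch: `payroll` stands for whatever H/R can currently confirm, salaries are written as integers in K$, and `None` stands for the unknown `XXX K$`.

```python
# Hypothetical view of what H/R can confirm today (salary in K$, None = unknown).
payroll = {
    "JOHN FORD": 60,
    "MATTHEW CONLEY": 80,
    "GEORGES SINCLAIR": None,
}

# Tuple candidates for the predicate (#NAME) earns (#SALARY).
candidates = [
    ("JOHN FORD", 60),
    ("MATTHEW CONLEY", 80),
    ("BILL GATES", 1500),
    ("GEORGES SINCLAIR", None),
]


def verdict(name, salary):
    """Classify one candidate as 'TRUE', 'FALSE' or 'MAYBE'."""
    if salary is None:
        # An unknown value can be validated neither by TRUE nor by FALSE.
        return "MAYBE"
    if payroll.get(name) == salary:
        return "TRUE"
    # A fully specified proposition that is not currently a fact.
    return "FALSE"


undecided = [c for c in candidates if verdict(*c) == "MAYBE"]

if __name__ == "__main__":
    for name, salary in candidates:
        print(name, "-->", verdict(name, salary))
```

A non-empty `undecided` list is the signal that the predicate as formulated does not stay within 2VL, which is exactly what sends the designer back to step 0).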

0) New relation: Instead of formulating EMPLOYEE, I use a little shortcut and make a difference between RECORDED_SALARY_EMPLOYEE (the employees that have their SALARY recorded for sure) and UNRECORDED_SALARY_EMPLOYEE (the ones that do not).

(#NAME) earns (#SALARY) constitutes a RECORDED_SALARY_EMPLOYEE (yeah, I used a trick: I consider that all employees whose SALARY is known are in fact RECORDED_SALARY_EMPLOYEES)

1) Because I defined a new relation, I reuse the previous predicate

(#NAME) earns (#SALARY)

2) Possible propositions: I no longer consider GEORGES SINCLAIR as part of the tuple candidates

EMPLOYEE: (JOHN FORD) earns (60 K$)
EMPLOYEE: (MATTHEW CONLEY) earns (80 K$)
EMPLOYEE: (BILL GATES) earns (1500 K$)

3) Isolating the facts

EMPLOYEE: (JOHN FORD) earns (60 K$) --> TRUE
EMPLOYEE: (MATTHEW CONLEY) earns (80 K$) --> TRUE

4) I get back to Ms. Hopkins and present the new propositions to her. She almost kicks me out of her office: *SO?*. (Hope I am not wrong this time)

5) I identify the proposition that is currently validated by FALSE (BILL GATES is not YET a member of the unified company but may be someday)

6) From the description, I may assume that BILL GATES may appear as part of the future system. Therefore I may move on to the next stage.

7) Over the propositions, I find that ALL of them are either TRUE or FALSE. There is NO MAYBE type of proposition over RECORDED_SALARY_EMPLOYEE. I can move on to something else (I should validate that with Ms. HOPKINS but I don't dare go to her area anymore, so I do the next best thing: I get my hands on some of the data I want to pour into the RECORDED_SALARY_EMPLOYEE structure)

8) I find that the data is spread out among several databases. I estimate that a simple migration script can actually migrate all the data into that relation and that none of the tuples would disqualify the current relation.

OOOUF --> I now end up with one possible relation named RECORDED_SALARY_EMPLOYEE. Of course I will have to deal with UNRECORDED_SALARY_EMPLOYEE. But I may rely on this one for sure. A lot of tedious effort for small, ungratifying results (I get banned for 1 month from the H/R dept as disruptive). I may now move forward to the next stage, which is identity and integrity constraints.
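
As a rough illustration of the outcome, the split between the two relations and the legacy-data check of step 8) could look like the sketch below. The row layout (`name`, `salary_k`), the sample rows and the function name are hypothetical, and the predicate assumed here for UNRECORDED_SALARY_EMPLOYEE, "(#NAME) is an employee whose salary is not recorded", is my own guess since it is not formulated above.

```python
# Hypothetical legacy rows gathered from several source databases
# (salary in K$, None = salary not recorded).
legacy_rows = [
    {"name": "JOHN FORD", "salary_k": 60},
    {"name": "MATTHEW CONLEY", "salary_k": 80},
    {"name": "GEORGES SINCLAIR", "salary_k": None},
]


def split_by_recorded_salary(rows):
    """Route each legacy row to the relation whose predicate it can satisfy."""
    recorded = set()    # tuples for: (#NAME) earns (#SALARY)
    unrecorded = set()  # tuples for the assumed predicate on unrecorded salaries
    for row in rows:
        if row["salary_k"] is None:
            unrecorded.add((row["name"],))
        else:
            recorded.add((row["name"], row["salary_k"]))
    return recorded, unrecorded


recorded_salary_employee, unrecorded_salary_employee = split_by_recorded_salary(legacy_rows)

# Step 8 check: no tuple poured into RECORDED_SALARY_EMPLOYEE carries an unknown value.
no_disqualifying_tuple = all(salary is not None for _, salary in recorded_salary_employee)

if __name__ == "__main__":
    print(sorted(recorded_salary_employee))
    print(sorted(unrecorded_salary_employee))
    print(no_disqualifying_tuple)
```

Using sets of tuples keeps the sketch close to the idea of a relation: no duplicates, no ordering, and every tuple of `recorded_salary_employee` is a proposition that can be validated by TRUE.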

Looking at this method, one may scream: it's HELL (and it is), but the real point here is to demonstrate that a relation does not come to mind as naturally as one may think. Respecting 2VL when defining a relation is the result of a lengthy refining process that comes before even getting to guaranteeing identification or integrity constraints.

The above method is of course purely arbitrary and perfectly subjective, but it gives an idea of the level of complexity involved in defining a relation. And most of all, the two failed attempts at defining the relation demonstrate that not every predicate leads to a relation...

For boring them to death, I apologize to the people on this board, as well as the H/R dept which I used as an example...

Regards...

Any thoughts welcome

Received on Mon May 28 2007 - 16:21:03 CEST
