Re: some information about anchor modeling

From: vldm10 <vldm10_at_yahoo.com>
Date: Mon, 1 Apr 2013 10:55:09 -0700 (PDT)
Message-ID: <1b1aa3af-14fc-4858-9f9a-70c86d172986_at_googlegroups.com>


On Monday, 11 February 2013 08:41:14 UTC+1, Derek Asirvadem wrote:

> I dare say, that *if* Anchor Modelling databases do not suffer from the consequences of data and referential integrity loss, then they have a good handle on it as well.

Hi Derek,  

The surrogate key, that is, "Anchor Modeling", cannot even solve some major areas related to the "history" of a database. In other words, “Anchor Modeling” cannot solve many of the problems it was intended to solve.

1.
On November 29, I posted the following example in my thread “The original version” on this user group:

The following example shows that this model cannot solve the most important of the cases for which it is intended. This is an example of wrongly entered data (erroneous or wrong data). The question arises: why hasn't this been fixed in the new version? Instead, it has been explained unclearly. The answer is that Anchor Modeling cannot solve the most important part.
Let me explain this with the following example. Let a “Historized Attribute” be given (for the entity Car):
Car ( id, color, start-date-for-color)

          22     Blue             16 August 2002
          22     Red              16 August 2002
In the first row the data entry person made a mistake and entered the wrong color “blue”; the manager then fixed the mistake and entered “red”, which is the actual color of the car. This cannot be done in AM because we get a duplicate key, key = (id, date). For this reason wrong data in AM must be deleted, and then, of course, there is no history. This implies many other things. For instance, AM cannot be used to design online applications or to solve other serious problems.
This example clearly indicates that the authors of AM are incompetent and fail to understand the nature of databases and the problems of changes and history.
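
The duplicate-key situation in the Car example can be reproduced with a minimal sketch in Python, using the standard sqlite3 module. The table name car_color, the column names and the ISO date strings are assumptions made for illustration. This is not the schema that Anchor Modeling generates; it is only a table keyed on (id, date), as in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table for Car(id, color, start-date-for-color),
# with key = (id, date) as described in the example.
conn.execute(
    "CREATE TABLE car_color ("
    " id INTEGER NOT NULL,"
    " color TEXT NOT NULL,"
    " start_date TEXT NOT NULL,"
    " PRIMARY KEY (id, start_date))"
)

# The data entry person records the wrong color.
conn.execute("INSERT INTO car_color VALUES (22, 'Blue', '2002-08-16')")

# The manager tries to record the correct color for the same id and date.
try:
    conn.execute("INSERT INTO car_color VALUES (22, 'Red', '2002-08-16')")
    correction_accepted = True
except sqlite3.IntegrityError:
    # Same (id, start_date) as the first row: duplicate key.
    correction_accepted = False

rows = conn.execute("SELECT id, color, start_date FROM car_color").fetchall()
```

With this key the second row is rejected, so the only way to store “Red” for that date is to delete or overwrite the “Blue” row.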

I submitted complaints to DKE and Springer. Among other examples, I sent this example as part of my complaint. At the beginning of this thread I posted the answers that I received from P. Chen (DKE journal) and the response from the Springer journal.

Now I am going to analyze this example in more detail. Suppose that we have built a business application that allows teachers to enter grades for students interactively, using a database that is available on-line, that is, one that can be accessed via the Internet. The company that sells the software promises the following:

(a) It is impossible to commit a crime using this database, because our solution preserves the history of events;
(b) This is the first solution that provides 100% on-line operation. If a person, a procedure or an external user makes a mistake, this database will show exactly who made the mistake and who needs to fix it. Wrong data are not the responsibility of designers or theorists. Designers and theory should produce a db design that shows who is responsible for the faulty data.
(c) Our solution solves much more than a "Data Warehouse" does, because our database keeps everything that has been done in the database. In addition, this database has on-line support, so you can use the current data on the Internet or on a network. So we do not need a “Data Warehouse”, and we do not need to transfer data to a “Data Warehouse”.

Suppose that the schools purchased Anchor Modeling software because it maintains history. Suppose that the teacher Smith gave the grade A to the student John. Using a data entry screen, Smith entered the grade A in the corresponding database. He deliberately made a "mistake" when entering the grade A; John's actual knowledge corresponds to the grade D. After two months, John was admitted to a good college thanks to the grade A. Smith then declared that the grade A was a mistake and deleted it, since the authors of Anchor Modeling allow the deletion of wrong data. So everybody is OK. John did not make a mistake; John’s college does not even know what happened with his diploma. Smith made a mistake and fixed it.

The authors of Anchor Modeling are responsible in this case, because they claim that their solution maintains history, although this is not correct. Moreover, they obviously do not understand what history is, and they do not understand items (a), (b) and (c) above. If they understood history, they would not allow data to be deleted. We note that history works well with the insert operation. However, if one is allowed to insert and delete, then the update operation is in effect also allowed.

Now imagine that, in the school example above, some manager introduces the following rule: if a professor enters the wrong grade in the database, the data can be repaired only by the System Administrator, at the written request of the teacher. Now we have a new problem: who maintains the written documentation? More importantly, the benefits (a), (b) and (c) no longer hold in the database. Of course, organized crime involving insiders and outsiders then becomes possible.
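
A history that accepts only inserts can be pictured with the sketch below, again in Python with sqlite3. The grade_log table, its columns, the two triggers and the timestamps are all hypothetical. It illustrates the insert-only idea from the school example and nothing more; it is not taken from Anchor Modeling or from any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical insert-only log: every entry records who made it and when.
    CREATE TABLE grade_log (
        student    TEXT NOT NULL,
        grade      TEXT NOT NULL,
        entered_by TEXT NOT NULL,
        entered_at TEXT NOT NULL,
        PRIMARY KEY (student, entered_at)
    );
    -- Deleting or updating a history row is refused.
    CREATE TRIGGER grade_log_no_delete BEFORE DELETE ON grade_log
    BEGIN SELECT RAISE(ABORT, 'history rows cannot be deleted'); END;
    CREATE TRIGGER grade_log_no_update BEFORE UPDATE ON grade_log
    BEGIN SELECT RAISE(ABORT, 'history rows cannot be updated'); END;
    """
)

# Smith enters the grade A, and two months later enters the correction D.
conn.execute("INSERT INTO grade_log VALUES ('John', 'A', 'Smith', '2013-01-10T09:00')")
conn.execute("INSERT INTO grade_log VALUES ('John', 'D', 'Smith', '2013-03-10T09:00')")

# An attempt to make the first entry disappear fails.
try:
    conn.execute("DELETE FROM grade_log WHERE grade = 'A'")
    delete_allowed = True
except sqlite3.DatabaseError:
    delete_allowed = False

history = conn.execute(
    "SELECT grade, entered_by, entered_at FROM grade_log"
    " WHERE student = 'John' ORDER BY entered_at"
).fetchall()
current_grade = history[-1][0]
```

In this sketch the correction is a new row, so the log still shows that Smith entered the A and that Smith later replaced it with the D.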

It may happen that the system administrator makes a mistake while entering data. It is possible that two professors deliberately enter erroneous grades for one student. It is clear that many bad combinations are now possible with this database design.

Thus, deleting erroneous data, as the authors of Anchor Modeling do, shows that they do not know exactly what history is. I am not sure that the editors of the paper "Anchor Modeling", which was published in the Springer and DKE journals, have a complete understanding of these issues. In my opinion, this can be a bad experience for those who use the Anchor Modeling paper in their scientific work, a PhD, scientific presentations and the like. Note that Springer and DKE are very well-known scientific journals. Of course Springer and DKE should be held responsible. Applying this software to "history" can have dangerous consequences for complex enterprise applications such as banks, airlines, military applications, etc. If you (or anyone else) think that I am wrong here, then please explain it in this user group.

2.
Example 2 (December 22, 2010, thread “The original version”)

Anchor Modeling handles relationships with the following explanation: “When a transaction has some property, like PEDAT-PerformanceDate in Fig. 4, it should be modeled as an anchor. It can be modeled as a tie only if the transaction has no properties.” (See Anchor Modeling, section 4.1, Modeling Core Entities and Transactions.) However, no scientific explanation is given for this statement. In contrast to Anchor Modeling, many university lectures define relationships through their attributes, often giving examples of relationships with attributes. Wikipedia also defines relationships through attributes.

In fact, Anchor Modeling cannot handle a relationship that has an attribute.
Explanation: In this case the Historized Tie would have the following form: Htie (C1, C2, ..., Cn, A, T), where A is an attribute. Here it is not clear what T is: is T the time related to the set of anchors (as it is in Anchor Modeling), or is T the time related to A? Problems with the key arise here. Note that a relationship can hold at an arbitrary time, independently of its attributes.



Let me do a small analysis of this example. In his paper “The Entity-Relationship Model”, P. Chen wrote: “Note that relationships also have attributes.” (See section 2.2.) This statement by P. Chen contradicts the following position of the authors of "Anchor Modeling": relationships do not have attributes. I also cited this theory of relationships from "Anchor Modeling" in my complaint addressed to P. Chen. Databases with relationships are the most complex structures. I have had the opportunity to work on very complex problems with relationships. Therefore, I am very surprised by this contradictory approach to the problem.

3.
In Historized Attributes Hatt(C, D, T), the attribute T is wrongly designed. T is both the time when the old value of a property ceases to be valid and the time when the new value becomes valid. However, this is incorrect. For instance, let us consider the property "the color of a car". A car can be sent to the mechanic to be fixed. The old color can be removed right away, and the company can enter this into the database. After a series of repair jobs, the new color may be painted on 10 days after the previous color was removed. This data may be entered into the database 12 days after the car arrived at the shop. Therefore, one of the design foundations of Anchor Modeling is flawed. There are numerous examples of business applications where entities’ attributes do not work in this way.



Let me do a small analysis of this example. As we saw, Anchor Modeling supports only a start date in H(C, D, T). Is it possible to store an end date in Anchor Modeling? No, it is not possible, because it is a very bad db design. If you try to store both dates, the start date and the end date, then you will have a duplicate key. In fact, it is not possible to work with an end date in “Anchor Modeling” at all.
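
The gap in the car-color example can be sketched in Python as follows. All dates, the colors "Green" and "White", and the names color_start_only, ColorPeriod and color_with_end are hypothetical. The first function reads T as both the end of the old value and the start of the new one; the second assumes each value carries its own start and end. Neither is claimed to be how Anchor Modeling is implemented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Hypothetical dates: the old color is removed, the new one is applied 10 days later.
removed_on = date(2013, 3, 1)
painted_on = removed_on + timedelta(days=10)

# Start-only history in the spirit of Hatt(C, D, T): (color, T).
start_only: List[Tuple[str, date]] = [
    ("Green", date(2012, 1, 1)),
    ("White", painted_on),
]


def color_start_only(rows, day):
    """Return the latest value whose start time T is not after `day`."""
    current = None
    for color, start in sorted(rows, key=lambda r: r[1]):
        if start <= day:
            current = color
    return current


@dataclass(frozen=True)
class ColorPeriod:
    color: str
    valid_from: date
    valid_to: Optional[date]  # None means no end has been recorded


periods = [
    ColorPeriod("Green", date(2012, 1, 1), removed_on),
    ColorPeriod("White", painted_on, None),
]


def color_with_end(rows, day):
    """Return the value valid on `day`, or None if no value is valid."""
    for p in rows:
        if p.valid_from <= day and (p.valid_to is None or day < p.valid_to):
            return p.color
    return None


in_the_gap = removed_on + timedelta(days=5)
```

For a day inside the gap, the start-only reading still answers "Green", while the version with an explicit end answers that the car has no color. A start-only design could record an explicit "no color" row on the removal date; the sketch only shows what happens when T is read as one instant that serves both purposes.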

4.
The authors of Anchor Modeling have not shown how they work with “metadata”. In the first version of Anchor Modeling (section 2) they state that “Although important, metadata is not discussed further since its use does not differ from that of other modeling techniques.” This is a very important theoretical part. There is no example, definition or schema of “metadata”. There is absolutely nothing about “metadata” in a scientific paper that is supposed to present fundamental results. Of course, this is not a scientific approach. As I already wrote, the authors of "Anchor Modeling" had an intense discussion about "metadata" on their website and repeatedly announced fixes and improvements. You can see my paper “Some ideas about a new data model”, example 2.5, at http://www.dbdesign10.com . Note that this example has more than one solution.



Surrogates

5.
The technical solution known as a "surrogate key" is bad for a database that maintains "history", because it is possible for two (or more) different surrogates to identify one entity. This is very bad.
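
One way in which two surrogates can end up identifying one entity is sketched below in Python with sqlite3. The table names, the column sk and the placeholder value 'VIN-EXAMPLE-0001' are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table keyed only by a system-generated surrogate.
conn.execute(
    "CREATE TABLE car_by_surrogate ("
    " sk INTEGER PRIMARY KEY AUTOINCREMENT,"
    " vin TEXT NOT NULL,"
    " color TEXT NOT NULL)"
)

# The same real car is entered twice; each row gets its own surrogate.
for _ in range(2):
    conn.execute(
        "INSERT INTO car_by_surrogate (vin, color)"
        " VALUES ('VIN-EXAMPLE-0001', 'Red')"
    )
surrogates = [
    row[0] for row in conn.execute("SELECT sk FROM car_by_surrogate ORDER BY sk")
]

# Hypothetical table keyed by the real-world identifier instead.
conn.execute(
    "CREATE TABLE car_by_identifier (vin TEXT PRIMARY KEY, color TEXT NOT NULL)"
)
conn.execute("INSERT INTO car_by_identifier VALUES ('VIN-EXAMPLE-0001', 'Red')")
try:
    conn.execute("INSERT INTO car_by_identifier VALUES ('VIN-EXAMPLE-0001', 'Red')")
    second_copy_accepted = True
except sqlite3.IntegrityError:
    second_copy_accepted = False
```

The first table accepts both rows under different surrogates, while the second refuses the repeated identifier. A surrogate-keyed table can of course also declare the identifier UNIQUE; the sketch covers only the case where the surrogate is the sole key.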

6.
Global industry-standard identifiers cover a large group of business applications that do not need surrogate keys at all.

7.
There is a large group of objects in business applications that cannot be handled by using surrogates. An example is a Honda dealer who sells Honda cars that all have the same attributes. This is an interesting case because we cannot apply Leibniz's Law. Here we cannot use surrogates, because they would show the same entities in the database. Therefore, we must introduce the VIN. Again, it is obvious that if we use the VIN as the identifier, then we do not need the surrogate key at all.
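
The Honda case can be pictured with a small Python sketch. The classes CarDescription and CarWithVin, the model, year and color values, and the placeholder VIN strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarDescription:
    """A car described only by its attributes."""
    make: str
    model: str
    year: int
    color: str


@dataclass(frozen=True)
class CarWithVin:
    """The same description plus a real-world identifier."""
    vin: str
    description: CarDescription


same = CarDescription("Honda", "Civic", 2013, "Silver")

# Two cars on the lot with identical attributes collapse into one element.
lot_by_attributes = {same, CarDescription("Honda", "Civic", 2013, "Silver")}

# With the VIN they remain two distinct cars.
lot_by_vin = {
    CarWithVin("VIN-EXAMPLE-0001", same),
    CarWithVin("VIN-EXAMPLE-0002", same),
}
```

Compared by attributes alone, the two cars are indistinguishable; once the VIN is part of each record, they are two different entities.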

8.
The surrogate keys are not externally verifiable.

9.
Locally verifiable keys are applied in a local environment, for example in a company, a city, a library, a public transportation card, a local shop card, etc. They do not need surrogates at all. Instead of the surrogate key, an identifier is a better solution.

10.
Identifiers that provide good db design. This example shows how identifiers can hold the entire database structure together and provide a good db design. Take, for example, an auto service that repairs cars. We can organize our business completely by using the database. For example, the manager of the auto service takes the necessary data from the customer, together with the data about the car. Then the worker checks the car and enters the necessary information about the job that the car needs, as well as the price and the time. Then the manager prints a sheet that contains all the information and the corresponding identifier. This identifier is very different from a surrogate key. It belongs to the real entity, and it is placed on the printed sheet. This identifier holds the whole deal together. It also makes the auto service one complete little world. One copy goes to the customer, another copy goes to the worker, etc. All the information about cars, customers, repairs and prices goes into the database.

This database is fully functional: it allows the technician to present, in a documented manner, what needs to be done, how much it will cost and when it will be finished. The database provides the customer with all the necessary information so that he can accept or reject the job. The auto service has all the information necessary for the operation of the company: finances, materials and parts, accounting, and information about the customer, the car, the worker, etc. This identifier can be physically present on every paper related to the job. Obviously we do not need a surrogate key here. Note that the surrogate key has none of the functionality that this identifier has. In this kind of database there is no place for a surrogate key. People who work with databases often forget that a man makes these little worlds.

I gave this example with the intention of showing how bad the surrogate key is for database design. Another reason is the importance of database design, which is the most fundamental step in working with databases. In this small example we see that the constructed identifier is better than a surrogate key. Database design means the construction of a small world that is fully functional for its intended purpose. Database design is the most important step, so a technical solution such as the surrogate key should comply with the db design. It is a fundamental mistake if a technical solution determines the database design. Note that the surrogate key is a kind of technical solution; it is similar to an index, and how it works is unknown because it is part of the system.

11.
I explained my approach to memory and keys in my post of February 25, 2013. I gave this topic the following heading:

How a database stores an object and how a database can remember of its objects?

In my post of February 25, 2013, I showed the main steps of the algorithm related to memory and remembrance. You can look at my solution with the identifier of the state to see the complexity of the problem and of the solution. This is certainly a serious issue, and it is clear that the surrogate key has no theoretical importance. As I said, the surrogate key is a technical solution; it is a kind of index. If you look carefully at cases 5, 6, 7, 8, 9 and 10 described in this post, it is clear that the surrogate can be applied in only a small number of cases, say, less than 2%.

On the other hand, my solution is based on the states of entities and relationships. I define the state of an entity as the total knowledge about the entity. I think my solution is at a much higher level, both theoretically and practically. Regarding the theoretical level, I would like to note that the state of an object is one of the most complex topics in the history of science. The questions concern the states of an object, the identity of an object that changes, the semantics of the subject that generates knowledge about the changing entity, and the construction of the corresponding concepts. It is also, in fact, the generation of the present, the past and the future. Obviously, Codd did not even notice these things; his thinking is at the level of indexes. “Anchor Modeling” is at the same level, but it uses the theory about history. I have developed and built the complete theory of "history". I posted the main results related to "history" on this user group in 2005. So here in paragraph 11, my first objection is that the theoretical level of RM/T and Anchor Modeling is in fact of low significance. My second objection is the following: instead of creating a theory, this work anticipates a theory that others might discover, and in this way occupies space in the theory in advance. This index-duplicate-key is presented as something of the greatest importance, and it is now tied to my theory, which solves "history". My theory about “history” is the only thing here that is important.

My database, in addition to maintaining history, also builds a small world in which "everyone knows all the others." In this world there are two important events: when something is created and when something ceases to exist. This world knows about existing entities, former entities, and the procedures that create all of this. In my opinion, this is what Codd did not understand, and the authors of Anchor Modeling do not understand it either.

It is absurd for someone to use a surrogate key instead of large-scale systems such as the barcode, driver's license, passport identification, credit card, VIN, network User ID and thousands and thousands of other complex systems which, in addition to the primary key, implement complex identification, verification, authentication, and more. These identifiers can do everything that a surrogate key does, plus much more that surrogates cannot do.

As far as I know, Microsoft also uses the states of entities in its EDM data model, but note that I defined the states in 2005. As for the Microsoft EDM data model and my solution, see my post "The priority of the idea", which I posted on December 3, 2009, on this user group at the following page:

https://groups.google.com/forum/?fromgroups=#!topic/comp.databases.theory/rkhDdJ8XD_o   

Vladimir Odrljin

Received on Mon Apr 01 2013 - 19:55:09 CEST
