Re: some information about anchor modeling

From: vldm10 <vldm10_at_yahoo.com>
Date: Wed, 18 Jul 2012 07:20:11 -0700 (PDT)
Message-ID: <042e5332-a7db-428f-9758-743812daf13b_at_a7g2000pbm.googlegroups.com>


Why Is the Surrogate Key a Bad Solution?

First, we can note that a few database theories use the surrogate key as a basic construct. I will mention the Object Oriented Approach (abbreviated OOA), Codd’s RM/t and “Anchor Modeling”. Secondly, we can note that OOA and Codd’s RM/T do not apply any of the points (ii) a,b,c,d,e,f that I mentioned in my first message in this thread from 14 July. And this is very bad. OOA uses states of an entity but does not use states of relationships. Note that OOA do not maintain states and histories of states.
The concept of object identity in OOA is mostly about the object existence which is independent of its value. This concept is less about states.A key in the Relational Model is on the level of relation and a key in OOA is on the level of a database. In OOA every object has a system-supplied identity, which is surrogate key. (A key in the Relational Model is on the level of relation and a key in OOA is on the level of a database. In OOA every object has a system-supplied identity, which is surrogate key.)

Let me give some examples which show that the surrogate key is a bad solution.

  1. Two distinct objects can have the same state. Here, the problem is to find the two corresponding real world objects. But if someone, for example working with the VIN or ISBN, then this one will not have problems. This case shows that the surrogate key is very convenient for hackers, insiders, wrong entered data etc. Note also that in OOA two distinct objects can have the same internal state.
  2. Today, more than 90% of databases have an identifier which is a property of the corresponding entity. The VIN – Vehicle Identification Number is a good example. For these databases the following is true: a) This identifier is much better than the surrogate key. b) These databases do not have any reason to use a surrogate key.
  3. On page 183 of his last book, “Database Design & Relational Theory – Normal Forms & All That Jazz” C.J. Date gives us Example 6. I think that this example contains some misunderstandings and mistakes, but I will use a more general example of it to show that the surrogate key is a bad solution. (C. Date in this example writes about the redundancy)

Let R be Relvar, which has the surrogate key K and three properties A,B,C. Now, according to C.J. Date’s recommendation we will apply the “RM/T discipline”. Here, Date refers to Codd’s paper RM/T. This is the first mistake. The main goal of Codd in RM/T was to get a decomposition of a relvar into binary relvars. He was unsuccessful at this and was not able to show how this is done. So, there is the question of how C. Date got binary relations in this example.

However, let us suppose that it is “somehow” possible to decompose the above relvar into binary relvars using the “RM/T discipline”. Let the following example be one possible situation:

K         K    A          K     B         K    C
---      --------        ---------       --------
k1        k1   a1         k1    b1        k1   c3
k2                                        k3   c3
k3                                        k4   c3
k4
k5
k6

The above decomposition is very bad. For instance, there is the question: how will a user find the real world entity that has the attribute C=c3 and the surrogate key K=k4? Note that a surrogate key is only in the database, it is not in the real world. So, my point here is that the surrogate key makes this table so bad that it becomes not an acceptable design.

4. In RM/T, section 4, E. Codd introduced cases (1), (2), (3). If multiple databases using one and the same entity, and if every of these databases using surrogate key for that entity, then mentioned three cases are possible. If one do not know how to maintain a history of events and do not know how to decompose a relvar into binary relvar then maybe this one can try Codd's suggestion in section 4: "Database users may cause the system to generate or delete a surrogate, but they have no control over its value, nor is its value ever displayed to them.”
Note that the surrogate key in these cases is also a real problem. Regarding surrogates, E. Codd wrote: "The capability of making equijoins  on surrogates implies that users see the headings of such columns but not the specific values in those columns." (see section 4).

Obviously, this unusual kind of design, based on surrogates, in fact has a lot disadvantages.

Vladimir Odrljin Received on Wed Jul 18 2012 - 16:20:11 CEST

Original text of this message