Re: some information about anchor modeling

From: Eric <eric_at_deptj.eu>
Date: Thu, 17 Jan 2013 23:26:14 +0000
Message-ID: <slrnkfh24m.it4.eric_at_teckel.deptj.eu>


On 2013-01-17, vldm10 <vldm10_at_yahoo.com> wrote:
> In this thread I gave examples, which show that the surrogate key is
> a bad solution. These examples show that the missing data can not be
> solved by applying surrogates. One such example, I wrote in this thread
> on 18/07/2012:

Saying it more than once doesn't make it any more true, but at least this time it is a simple, coherent chunk. I use coherent as in "hanging together", not as in "making sense".

> Let R be Relvar, which has the surrogate key K and three properties
> A,B,C,

OK

> Let us suppose that it is somehow possible to decompose the
> above relvar into binary relvars using the "RM/T discipline".

Last July you had quotes around 'somehow', and tried to imply that there was no method available to do this. You gave no evidence for that, and in fact you are wrong. It is possible as a purely logical operation on the relvar, with no semantic information needed. If it is so difficult, how did you manage to produce the result below?

> Let the following example be one possible situation:
>
> K K A K B K C
> --------------------------------
> k1 k1 a1 k1 b1 k1 c3
> k2
> k3 k3 c3
> k4 k4 c3
> k5
> k6

This is a perfectly reasonable example of the solution for a particular case. We need to assume that A, B, and C are the only properties we consider it sensible to record for a real-world object, and also that the real-world objects are all members of some class, e.g. "road vehicles", or "people".

If the values in (K) are really surrogate keys, the presence of k2, k5, and k6 in (K) is incorrect because we have no information whatsoever about the objects referred to. Also, only one of k3 or k4 should be present in any relvar, because they are indistinguishable. So, in fact, we have only k1 and (arbitrarily) k3.

Note: I am using "object" rather than "entity" because the latter

      has a technical meaning that is too close to the subject under
      discussion. The actual choice of word is not at issue except that
      using "entity" bothers me.

> The above decomposition is very bad. For instance, there is the question:
> how will a user find the real world entity that has the attribute
> C = c3 and the surrogate key K = k3?

All the user can do is observe real-world objects in the appropriate class, and look for one that has [C = c3] but neither [A = a1] nor [B = b1]. If this turns out not to be a simple procedure (as seems likely), then the only possible conclusion is that we should have recorded more properties for our objects, i.e. that our design is flawed. However, it is not flawed because we have used surrogate keys, but because we have recorded insufficient information to identify real-world objects.

At this point you, Vladimir, may be thinking "but that's what I said". However, you seem to have wrongly assumed that the surrogate is actually A substitute for having a natural key usable for identification, then attacked the concept because it doesn't do real-world identification. Thereby you have missed the point. A surrogate key is an artificially created alternate key because the available natural keys are too complex for our database system to use as keys. They may be too complex because the necessary property list for the natural key is too large, or has alternative or optional parts, or for other reasons.

> Note that a surrogate key is only in the database, it is not in the
> real world.

Well, yes, that is rather the whole point of a surrogate key.

> So, my point here is that the surrogate key makes this table so bad that
> it becomes not an acceptable design.

An incorrect point. See above.

> I did not want to show all the bad consequences of the use of surrogate
> key in this example, because I think that this example is enough. But
> apparently it's a good idea to show some other disadvantages that result
> from the use of surrogate key.

OK, but...

> 1. For example, there is the question: A data entry person
> should enter the attribute b7, into the corresponding binary
> relation. This binary relation has a surrogate key whose value is
> k3. How will the data entry person find the corresponding binary
> relation? Obviously, with a surrogate key, there is no solution
> to this problem.

Finding the right binary relation is easy, it is (K,B). The user's problem is to find the right value for K to use in the new row. This is also easy, it is the intersection of all the K values found by querying other binary relations for known properties of the real-world object.

> 2. How will one make m-n relationships with entities in the above
> example? Obviously, a surrogate key is not solution to this problem.

I can see no problem here, just use a new binary relation (K,K).

> 3. How would one apply the update, and delete operation on the data
> from the mentioned binary relations?

This is just number 1 again - find the proper K value and use it!

> Suppose someone wants to solve the above problems by applying Codd's
> Three-Value logic.

Why would they want to do that? You don't need three-valued logic to construct the binary relations, and once you have them you don't need it for data manipulation either.

> This approach is not correct because the above problems are not a matter
> of logic; they are a matter of database design. For example, if in the
> above example, we apply VINs (Vehicle Identification Numbers) instead
> of surrogates, then we will not have the problems mentioned above.

No form of logic is any help if your database design is inadequate. To that extent I think we may even agree. You need to stop getting stuck on surrogates because you have missed the point of what they are and what they are for.

As for the VIN, I have tried to tell you before, it is just someone else's surrogate key exported into the real world, and it is not even usable in all possible applications about motor vehicles.

Perhaps I will find the time to write another post that is the true story of a surrogate key (with nothing to do with motor vehicles).

> Note that Lukasiewicz and Kleene use Three-Valued Logic before Codd.

Was this the same three-valued logic, and even if it was, is this in any way relevant (unless you merely wish to discredit Ted Codd)?

> Codd uses nulls as a method of representing missing data in the relational
> model.

Perhaps he did, but considering the subsequent literature on the subject, is this even relevant any more? In any case I don't think it is relevant to this discussion.

Eric

-- 
ms fnd in a lbry
Received on Fri Jan 18 2013 - 00:26:14 CET

Original text of this message