Re: Does Codd's view of a relational database differ from that ofDate&Darwin? [M.Gittens]

From: Alexandr Savinov <savinov_at_host.com>
Date: Tue, 07 Jun 2005 12:00:39 +0200
Message-ID: <42a57059$1_at_news.fhg.de>

erk wrote:
> Alexandr Savinov wrote:
>

>>Assume that we have a set of 3 values S = {1, 3, 10}. We want to
>>aggregate them and apply some function func: A = func(S). Do we have a
>>problem? No. Now remove some item from the set so that we have S = {1,
>>3} and then apply again the aggregation function. Do we have a problem? No.

>
>
> Incorrect - you may have a problem. You're treating S as a variable,
> whose "contents" can vary from time to time. A domain, however, isn't a
> variable. It's a set whose definition (whether intentional or not) is
> fixed. A function over a varying domain, such as you describe,
> represents the sort of situation meant for relational constraints.

I am sorry, but I do not see any problem. Another point is that a variable is something that stores a reference. This reference may point to
- one object (record),
- a collection of records which is dynamically defined (for example, the result of some query),
- a table/domain which is statically defined in the schema,
- anything else that can be represented by a reference.

So you are right that a domain is not a variable, but a reference to a domain can be stored in as many variables as we like, just as we can store references to records or result sets in variables. If we store a reference to a collection in a variable, then this collection can be defined by different mechanisms, but a collection is a collection, and it always has some set of internal elements (which may actually change over time, for example, if we delete some rows from a table). But all this has nothing to do with nulls, or else I do not understand your point.
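To make the distinction between a collection and the variables that refer to it concrete, here is a minimal Python sketch. The table contents and all names (TABLE, one_record, query_result, whole_table, alias) are invented for illustration and do not come from any particular database system.

```python
# Hypothetical static table, standing in for a table/domain defined in a schema.
TABLE = [
    {"id": 1, "value": 1},
    {"id": 2, "value": 3},
    {"id": 3, "value": 10},
]

# A variable may refer to one object (record) ...
one_record = TABLE[0]

# ... to a dynamically defined collection (here a query result,
# materialized as a new list at the moment the query runs) ...
query_result = [row for row in TABLE if row["value"] < 10]

# ... or to the statically defined table itself, in as many variables as we like.
whole_table = TABLE
alias = TABLE

# Deleting a row changes the elements of the collection,
# not the variables that refer to it.
del TABLE[2]
remaining_values = [row["value"] for row in whole_table]
```

After the deletion both whole_table and alias still refer to the same table, which now has two rows; none of this involves nulls.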

In my example the variable S is a collection defined explicitly in the current scope by enumerating its elements. But it might be the result of some query, or it might be a static table. This has no influence on the semantics of NULLs we are discussing. I want to say that the semantics of NULL has to be defined as the absence of a thing (not in the relational model, where it hardly makes sense). If we define null as absence then, as a minor advantage, we avoid problems with aggregation, because absent things are simply not visible, i.e., null values are skipped. Essentially, this is precisely what I wanted to illustrate in my example.
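The "null as absence" reading of aggregation can be sketched in a few lines of Python. This is only an illustration under the assumption that None plays the role of null; the helper names visible and aggregate are hypothetical.

```python
def visible(values):
    """Return only the values that are present; None stands for 'absent'."""
    return [v for v in values if v is not None]


def aggregate(func, values):
    """Apply an aggregation function to the visible (non-null) values only."""
    return func(visible(values))


s_small = [1, 3]
s_with_null = [1, 3, None]

# Under this definition the two collections aggregate identically.
sum_small = aggregate(sum, s_small)
sum_with_null = aggregate(sum, s_with_null)

# The null is not counted either, because it is not visible.
count_with_null = aggregate(len, s_with_null)
```

Here sum_small and sum_with_null are both 4, and count_with_null is 2, so {1, 3} and {1, 3, null} are indistinguishable to the aggregation function.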

>>Having null values is actually a way of removing data items from
>>consideration. In this example we apply the aggregation function to the
>>set {1, 3} which is equivalent to applying it to the set {1, 3, null}.

>
>
> Not really. Is null to be counted, for example?

No, and this is precisely what I wanted to emphasize, because you cannot count what does not exist. The problem is to understand that different things may exist in different dimensions. And again, this is not the relational model (forget about it for a while; otherwise it is not possible to understand what null means, and many other important things too).

In order to understand
- how different things may exist hierarchically and multidimensionally,
- how the model may have canonical semantics (for example, so that we can check whether two models are semantically equivalent or whether one of them is more specific than the other),
- what the dimensionality of the model is (how many degrees of freedom it has),
- how grouping and aggregation work,
- what null means,
you might want to read about the concept-oriented model. But as I said, you need to forget for a moment about the relational model, because if you always project new terms and concepts onto your good old coordinate system you will always get a wrong view.

>>Some difficulties may appear in multidimensional case (in the case of
>>many columns). What if a row has null in field F1? This means that this
>>object does not exist along the dimension F1. If we project all rows
>>onto this dimension then we will not be able to find it there - it is
>>absent. In particular, aggregation functions and other procedures will
>>not see it at all (if it does not exist then it is not visible).

>
>
> How about conditional tests on those attributes?

>>It is possible but I do not find it very natural because we need the
>>properties of NULLs and aggregations to be consistent with other
>>properties of the model being developed. We cannot say "let's do it so"
>>- but need to have a kind of global consistency.

>
>
> So all "objects" need to be addressable by all predicates? I think
> that's a nonsense. What's the point, when a simple clause like is2D(x)
> can properly "distinguish"?

>>For example, take a row
>><1, 3> and then consider this point in 3-dimensional space by adding one
>>new dimension. How it will look like (represented)? I find it very
>>natural to write it as follows: <1, 3, null>. This actually says that
>>this object does not exist in this dimension, it is not visible, it
>>cannot be counted or aggregated.

>
>
> It's not a 3-D point, so why even consider it? If it doesn't exist in
> the "dimension" of 3-D points, why even mention it? Are "objects" like
> "Love" and "hate" both written <null, null, null> because they have no
> "projection" into 3-D space?

Yes, "Love" and "hate" have to be written <null, null, null> if
- these three variables do not make sense for them (say, Colour, Weight, Size),
- these objects do not exist, do not exhibit themselves, along these dimensions,
- we do not want to see them along these dimensions.

Whatever we call such behavior, it has very concrete formal semantics in the concept-oriented data model and in any other model where we want to have such useful properties as dimensionality and canonical semantics.
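As a rough illustration of what "not existing along a dimension" could mean operationally, here is a small Python sketch. It is not the formal definition from the concept-oriented model; the dimension names d1, d2, d3, the object names p and q, and the function project are all made up for the example, and None again stands for null.

```python
DIMENSIONS = ("d1", "d2", "d3")  # hypothetical dimension names

# Every object formally has a value in every dimension, possibly null.
rows = {
    "p": (1, 3, None),            # the row <1, 3> placed in 3 dimensions
    "q": (2, 5, 7),
    "Love": (None, None, None),
    "hate": (None, None, None),
}


def project(rows, dim):
    """Project all rows onto one dimension, skipping rows that are null there."""
    i = DIMENSIONS.index(dim)
    return {name: row[i] for name, row in rows.items() if row[i] is not None}


along_d1 = project(rows, "d1")
along_d3 = project(rows, "d3")

# Aggregation along a dimension sees only the objects that exist there.
count_d3 = len(along_d3)
```

Projected onto d1, only p and q are found; projected onto d3, only q is found, so count_d3 is 1. "Love" and "hate" are never found in any projection, although they are still present as rows.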

>>We might add some other properties of
>>nulls and then derive their consequences. And finally we will develop
>>yet another data model.
>>
>>Formally, objects exist in all dimensions but in most of them they have
>>null values.

>
>
> Null==absent? Why? Or rather, why bother?

There are several reasons, but all of them relate to informal properties of a "good" model. The simple answer is "in order to avoid practical problems and disputes about the meaning of null". When we give such a definition, we get a very simple and effective data model (along with other principles).

I am sorry, but the only thing I can advise is to read something about the concept-oriented model. It is difficult because the description is bad (I actually lost interest in it because almost everything is clear to me).

-- 
alex
http://conceptoriented.com

Received on Tue Jun 07 2005 - 12:00:39 CEST
