Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]

From: Alexandr Savinov <savinov_at_host.com>
Date: Mon, 06 Jun 2005 18:33:00 +0200
Message-ID: <42a47acd$1_at_news.fhg.de>


Paul schrieb:

> Alexandr Savinov wrote:
> 

>>Assume that we have a set of 3 values S = {1, 3, 10}. We want to
>>aggreage them and apply some function func: A = func(S). Do we have a
>>problem? No. Now remove some item from the set so that we have S = {1,
>>3} and then apply again the aggregation function. Do we have a problem? No.
>>
>>Having null values is actually a way of removing data items from
>>consideration. In this example we apply the aggregation function to the
>>set {1, 3} which is equivalent to applying it to the set {1, 3, null}.
> 
> 
> Wouldn't it be "1 + 3 + unknown", say, which should be unknown also?
> 
> 

>>>Why not say then that all aggregates that involve a NULL return NULL?
>>
>>It is possible but I do not find it very natural because we need the
>>properties of NULLs and aggregations to be consistent with other
>>properties of the model being developed. We cannot say "let's do it so"
>>- but need to have a kind of global consistency. For example, take a row
>><1, 3> and then consider this point in 3-dimensional space by adding one
>>new dimension. How it will look like (represented)? I find it very
>>natural to write it as follows: <1, 3, null>. This actually says that
>>this object does not exist in this dimension, it is not visible, it
>>cannot be counted or aggregated.
> 
> 
> To me it says we know the x & y coordinates but at the moment the z
> coordinate is unknown. So if we are working with a geometric projection
> that collapses the z axis, we have perfect knowledge. But if we need the
> z coordinate, everything becomes unknown.
> 
> Paul.

You describe the semantics of UNKNOWN which is also a special "value". And I completely aggree with your interpretation until you call it NULL.

Generally, in order to effectively manipulate data semantics we might need two special "values" with concrete semantics defined at the system level:

NULL - absence (of value, record or any other entity). UNKNOWN - presence but having unknown identifier.

They can be formally defined via possibility distribution over a set of values where 0 means impossibility and 1 means that the value is possible (but not necessary). NULL is constant zero, i.e., no values are possible. UNKNOWN is constant one, i.e., all values are possible. If we have a normal value assigned to a variable then the distribution has all values zero except for a single point with value 1 which corresponds to the selected value. Using the mechansim of possiblity distribution (multidimensional and hierachical) we can also impose semantic constraints.

Here is one example of using nulls:
Attribute/property/variable does not make sense for an object. requently for the sake of schema simplicity we have wide tables where some columns do not makes sense for some objects. For example, we have a list of products (mostly cars) and define a column Power. However, sometimes we also sell products which do not have a motor so what should we write in the field Power? This is precisely the case where the object does not exist in this dimension and we can formally correctly write NULL.

Unknown values are complex in processing and can be used only in sophisticaed databases like deductive databases. The unknown "value" actually means that this field has one normal value but it is unknown, i.e., everything is possible. In this case it is very natural when the result of aggregation is also unknown. For example, if we have a new employee then he is guaranteed to have some concrete salary which is however currently unknown for some reason (no paper arrived the accounting department or the salary is still negotiated). In this case we could formally corrected write UNKNOWN. This would inhject a great deal of uncertainty into the whole database so that many related queries will return also UNKNOWN (in the case the database is able to process such values at all).

-- 
alex
http://conceptoriented.com
Received on Mon Jun 06 2005 - 18:33:00 CEST

Original text of this message