Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]

From: Alexandr Savinov <savinov_at_host.com>
Date: Tue, 07 Jun 2005 12:35:24 +0200
Message-ID: <42a5787e$1_at_news.fhg.de>


erk schrieb:

> Alexandr Savinov wrote:
> 

>>You describe the semantics of UNKNOWN which is also a special "value".
>>And I completely aggree with your interpretation until you call it NULL.
>>
>>Generally, in order to effectively manipulate data semantics we might
>>need two special "values" with concrete semantics defined at the system
>>level:
>>
>>NULL - absence (of value, record or any other entity).
>>UNKNOWN - presence but having unknown identifier.
> 
> 
> FYI, this sort of thing has been rehashed many times, most of them
> somewhat unsatisfactorily. The fact that data manipulation can be done
> without nulls makes their value primarily that of placebos, while their
> costs can be great, as they're hard to make consistent.
> 
> 

>>Here is one example of using nulls:
>>Attribute/property/variable does not make sense for an object. requently
>>for the sake of schema simplicity we have wide tables where some columns
>>do not makes sense for some objects.
> 
> 
> How is that simple? Your table (it's not a relationa) now has many
> possible meanings. Keep in mind that a relation is a predicate, not an
> "entity." Otherwise how would you characterize a "linking" /
> "cross-reference" relation?

By schema simplicity I meant the number of tables. Sometimes it is really simpler to have one wide table than numerous specific tables. There is alway trade off between these two extremes and do not say that having one wide table is good practice - it is only an example for demostrating some properties of nulls.

Tables are not entities in relational model in other conventional models (and it is one of fundamental drawbacks). However, they should be treated as entities at least formally (in a "good" data model). Indeed, what are tables and why need to define them as a special kind of things different from entities? It is a violation of Occams's rasor principle. I want to have everything treated as only one sort of things. For example, why tables do not have their own fields where I might store some properties? In order to understand why do we need tables and what is their role consider the following example. Instead of one wide table for all types of objects we are tought to introduce several more special tables with only a smaller subset of columns. Such tables will have less rows. However, then we might want to inroduce even more specialized tables and so on. Finally we get a large number of tables each with a relatively small number of rows. Interestingly, we can continue this process and get tables with no rows at all! Thus theoretically we can express our semantics without rows by using only tables. One simple conclusion is that "data semantics can be expressed by tables rather than only by rows (in tables)". It is very strong and somewhat surprising result. In such an approach we do not need rows because anything can be modelled by means of tables. Such a model is ugly from relational point of view (although we simply followed an advice to decompose tables) however it is interesting from theoretical point of view (with big consequences). Manipulating data in this model means adding/remove tables rather than records and one table bears some semantics which is encoded in its static name. Thus tables in good model should be treated as normal entities but with special role so that we can always choose where to store our semantics - in table entity or in record entity. Of course, such a model is more complex and more interesting but I just wanted to comment your comment that tables are not entities.

Yet, again, you touch an issue which has nothing to do with null values and their semantics.

>>For example, we have a list of
>>products (mostly cars) and define a column Power. However, sometimes we
>>also sell products which do not have a motor so what should we write in
>>the field Power? This is precisely the case where the object does not
>>exist in this dimension and we can formally correctly write NULL.

> 
> 
> A bad example. "Mostly cars"? Why not define the car and (other
> product) relations properly?

I deliberatly gave such an example!
However, in real world complex applications we (unfortuanately) have a
lot of such situations which cannot be avoided. In other words, it may well happen that for some record some attribute simply does not make sense (including bad design).

Theoretically it is not an issue of good or bad design. The problem is if we want to keep meaningless columns in one table or optimize representation. I mean that independent of our desire or schema any object in the database will formally have all dimensions. The question is only how we *represent* the database semantics. In one case null is written explicitly. In other cases it still exists as a value of an object along meaningless dimensions but is not explicitly written because of good schema. For database and for the model this object still has null as a value along this dimensions independent of how it is represented.

>>Unknown values are complex in processing and can be used only in
>>sophisticaed databases like deductive databases. The unknown "value"
>>actually means that this field has one normal value but it is unknown,
>>i.e., everything is possible. In this case it is very natural when the
>>result of aggregation is also unknown. For example, if we have a new
>>employee then he is guaranteed to have some concrete salary which is
>>however currently unknown for some reason (no paper arrived the
>>accounting department or the salary is still negotiated). In this case
>>we could formally corrected write UNKNOWN. This would inhject a great
>>deal of uncertainty into the whole database so that many related queries
>>will return also UNKNOWN (in the case the database is able to process
>>such values at all).

> 
> 
> And that's precisely the reason such things are better dealt with
> through normalized relations. Then you can get consistent and correct
> answers, by specifying exactly what you want: average salaries for
> those who have them, count of those without salaries. The joins, far
> from being a burden, allow you to be specific without have to work
> around however the DBMS has "defined nulls."

That is a (serious) problem in relational model because depending on chosen schema you may get different results. However, in good model normalization does not change the data semantics - it is only an issue of where and how it will be stored. You can store everything in one wide table or you can create specialized tables but all queries must return one and the same result. This is how concept-oriented model works where normalization is not an issue. The model has some dimensionality and data in it has some semantics and the question is what structure for dimensions we prefer - it is a design issue which has nothing to do what results will be returned by our queries. If data semantics depends on the structure of its relations, how you use joins and how you use nulls then it is bad model.

-- 
alex
http://conceptoriented.com
Received on Tue Jun 07 2005 - 12:35:24 CEST

Original text of this message