Re: So what's null then if it's not nothing?

From: Alexandr Savinov <spam_at_conceptoriented.com>
Date: Mon, 21 Nov 2005 18:11:14 +0100
Message-ID: <4381ffc4$1_at_news.fhg.de>


David Portas schrieb:
> Alexandr Savinov wrote:
>

>>The first question: Is this design (4 tables) right price for having no
>>nulls? Do nulls deserve such an urgly design?
>>

>
>
> I'll ignore this one because it's entirely subjective as well as
> irrelevant.
>
>
>
>>The second question:

>
> [snip]
>
>>- determining object type (i.e., to what table it has to be added in
>>4-table design) is equivalent to determining if it may have nulls in the
>>1-table design.
>>

>
>
> Yes, and constraints to check this are I assume part of the data model
> in either case. How we handle constraint / type violations is
> implementation-defined. It's not a model issue.
>
>
>
>>The third question: Has the 4-table design less information? Obviously,
>>no. It has 4 times more tables that are expensive resource.

>
>
> You are confusing logical and physical models again. The point of the
> data model is to separate the two. Relvars (NOT the same thing as
> tables) are not an expensive resource because cost is
> implementation-defined. Nothing in RM says these 4 relvars must be
> implemented as separate data structures. Possibly it is desirable to
> implement them as a single data structure but that does not change the
> data model at all - they are exposed in the data model as separate
> relations. This is the heart of what RM is about.
>
>
>>These tables
>>have to be linked somehow because these objects are not isolated and an
>>object from one table represet an entity from another table. In other
>>words, each entity is distributed among many tables and we need to keep
>>track of them via links. So we add additional columns. But why we then
>>tried to remove null columns if then we add even more auxiliary columns?
>>

>
>
> Nothing in your hypothetical problem suggests it is necessary to "link
> the tables" whatever that means (there are no "links" in RM)

If we do not link them then we get even worse design. The thing is that all those objects with 2 properties (colur and weight) are thought of as one type entities. (Problem formulation: there is a set of objects with 2 attributes that can be absent.) In the case of no links all those entities will be distributed among 4 tables. If we want to produce a report then we need somehow to combine the results from those 4 tables manually. The system is unaware that they contain objects of one type (in the sense of the problem domain).

The design with links could help as follows. We store all objects from the problem domain in one table (independent of whether they have nulls or not). This table stores objects that have both attributes nulls. Actually it stores only identities with no properties (in our case). If some object has a non-null attribute then it is also placed in the corresponding extension table. So we have an entity distributed among many tables (a common practice).

This design may help in some situations and can be not very good in other situations. But what I meant is that having more tables (in order to reduce the number of nulls) is an overhead of having more links. So you win in one place (nulls) and loose in another place (links between tables and the necessity to maintain distributed entites). Note that you have also the problem of processing the distributed entiteis.

>>You simply follow some dogma and do not want to think yourself. In many
>>cases it is really desirable to try to avoid nulls. In other cases they
>>can be tolerated. It is always a trade off. What I wanted to say is that
>>

>
>
> Trade off implies you get some benefit in return for the complexity,
> compromises and errors that nulls create. I don't see that here. Nulls
> are used in today's databases because we use SQL or similar systems
> that demand them - not because of any intrinsic merits.

So you would choose the second design in the form of 4-tables in order to represent objects with 2 attributes that may be absent? I would choose the first design with only one table. Yes, I will have nulls. But is it a big price for having no head aches with maintaining 4 tables and generating complex queries. For example, I can produce aggregationi queries almost trivially. The database knows the semantics of nulls and makes everything correctly. Because nulls are a legal mechanism with concrete formal semantics.

I think here you are confused two issues: - nulls as a modeling construct,
- (many) tables as a means for imposing structural constraints. In other words, having less tables or more tables is an issue that has a little to do with nulls. Nulls are only one factor or criterion that can be used to make a design decision about that. However, the most important criterion is how can we represent our entities: by means of tables or by means of records. Nulls by themselves is a construct that deals with data semantics, namely, with existence or non-existence. We can talk about nulls even in the context of only one table. In the context of many tables it is easier to show consequences and relationships with other mechanisms.

-- 
http://conceptoriented.com
Received on Mon Nov 21 2005 - 18:11:14 CET

Original text of this message