Re: So what's null then if it's not nothing?

From: Alexandr Savinov <spam_at_conceptoriented.com>
Date: Fri, 25 Nov 2005 17:59:34 +0100
Message-ID: <43874309$1_at_news.fhg.de>


JOG schrieb:

> Alexandr Savinov wrote:
> 

>>David Portas schrieb:
>>
>>>Alexandr Savinov wrote:
>>>
>>>
>>>>The first question: Is this design (4 tables) right price for having no
>>>>nulls? Do nulls deserve such an urgly design?
>>>>
>>>
>>>
>>>I'll ignore this one because it's entirely subjective as well as
>>>irrelevant.
>>>
>>>
>>>
>>>
>>>>The second question:
>>>
>>>[snip]
>>>
>>>
>>>>- determining object type (i.e., to what table it has to be added in
>>>>4-table design) is equivalent to determining if it may have nulls in the
>>>>1-table design.
>>>>
>>>
>>>
>>>Yes, and constraints to check this are I assume part of the data model
>>>in either case. How we handle constraint / type violations is
>>>implementation-defined. It's not a model issue.
>>>
>>>
>>>
>>>
>>>>The third question: Has the 4-table design less information? Obviously,
>>>>no. It has 4 times more tables that are expensive resource.
>>>
>>>
>>>You are confusing logical and physical models again. The point of the
>>>data model is to separate the two. Relvars (NOT the same thing as
>>>tables) are not an expensive resource because cost is
>>>implementation-defined. Nothing in RM says these 4 relvars must be
>>>implemented as separate data structures. Possibly it is desirable to
>>>implement them as a single data structure but that does not change the
>>>data model at all - they are exposed in the data model as separate
>>>relations. This is the heart of what RM is about.
>>>
>>>
>>>
>>>>These tables
>>>>have to be linked somehow because these objects are not isolated and an
>>>>object from one table represet an entity from another table. In other
>>>>words, each entity is distributed among many tables and we need to keep
>>>>track of them via links. So we add additional columns. But why we then
>>>>tried to remove null columns if then we add even more auxiliary columns?
>>>>
>>>
>>>
>>>Nothing in your hypothetical problem suggests it is necessary to "link
>>>the tables" whatever that means (there are no "links" in RM)
>>
>>If we do not link them then we get even worse design. The thing is that
>>all those objects with 2 properties (colur and weight) are thought of as
>>one type entities. (Problem formulation: there is a set of objects with
>>2 attributes that can be absent.) In the case of no links all those
>>entities will be distributed among 4 tables. If we want to produce a
>>report then we need somehow to combine the results from those 4 tables
>>manually. The system is unaware that they contain objects of one type
>>(in the sense of the problem domain).
>>
>>The design with links could help as follows. We store all objects from
>>the problem domain in one table (independent of whether they have nulls
>>or not). This table stores objects that have both attributes nulls.
>>Actually it stores only identities with no properties (in our case). If
>>some object has a non-null attribute then it is also placed in the
>>corresponding extension table. So we have an entity distributed among
>>many tables (a common practice).
>>
>>This design may help in some situations and can be not very good in
>>other situations. But what I meant is that having more tables (in order
>>to reduce the number of nulls) is an overhead of having more links. So
>>you win in one place (nulls) and loose in another place (links between
>>tables and the necessity to maintain distributed entites). Note that you
>>have also the problem of processing the distributed entiteis.
>>
>>
>>>>You simply follow some dogma and do not want to think yourself. In many
>>>>cases it is really desirable to try to avoid nulls. In other cases they
>>>>can be tolerated. It is always a trade off. What I wanted to say is that
>>>>
>>>
>>>
>>>Trade off implies you get some benefit in return for the complexity,
>>>compromises and errors that nulls create. I don't see that here. Nulls
>>>are used in today's databases because we use SQL or similar systems
>>>that demand them - not because of any intrinsic merits.
>>
>>So you would choose the second design in the form of 4-tables in order
>>to represent objects with 2 attributes that may be absent? I would
>>choose the first design with only one table. Yes, I will have nulls. But
>>is it a big price for having no head aches with maintaining 4 tables and
>>generating complex queries. For example, I can produce aggregationi
>>queries almost trivially. The database knows the semantics of nulls and
>>makes everything correctly. Because nulls are a legal mechanism with
>>concrete formal semantics.
>>
>>I think here you are confused two issues:
>>- nulls as a modeling construct,
>>- (many) tables as a means for imposing structural constraints.
>>In other words, having less tables or more tables is an issue that has a
>>little to do with nulls. Nulls are only one factor or criterion that can
>>be used to make a design decision about that. However, the most
>>important criterion is how can we represent our entities: by means of
>>tables or by means of records. Nulls by themselves is a construct that
>>deals with data semantics, namely, with existence or non-existence. We
>>can talk about nulls even in the context of only one table. In the
>>context of many tables it is easier to show consequences and
>>relationships with other mechanisms.
>>
>>
>>--
>>http://conceptoriented.com
> 
> 
> You are arguing at cross ends. If I have to fomrulate complex queries
> to access 4 tables (n.b. links are nothing to do with this discussion,
> they are again a higher level concept), and its going to be a huge
> logistical pain, then yes, I may use Nulls as a hack, and I will get
> the job done. And your point is?

Nulls are not a hack. It is a normal mechanism. One way for denoting absent things. In a good query language the problem of nulls should not appear at all. Currently the problem arises because of attribute-value framework where nulls is used as value although it is not a value - it is a mark to denote absence. In particular, comparisons and operations with nulls do not make sense. For example, a database might use NULL to denote reocrds that have been deleted. Such records are then effectively invisible for all queries as the whole entity. Obviously, we are not comparing them with other entites or making other operations. The same with NULLs in slots except that the entity is marked as non-existing along only one dimension while it may exist along other dimensions.

> That proves nothing except that contemporary DBMS offer insufficient
> (and mathematically incorrect) support for an all too common
> situation. The RM does not (and should not) talk about what to do if
> your data doesn't fit into the schema that _you_ (incorrectly) told it
> was going to. Rather the management system should be doing the work.

But as far as I understand (may be I am wrong) the problem of NULLs appeared in RM and most of disputes proceeded within RM independent of implementations in any DBMS. Then existing incorrect implementations are simply a consequence of theoretical uncertainty in this issue. Since there is no clear opinion how to treat nulls, companies implement them as they like and add even more inconsistency and uncertainty.

-- 
http://conceptoriented.com
Received on Fri Nov 25 2005 - 17:59:34 CET

Original text of this message