# Towards a definition of atomic

Date: Fri, 1 Feb 2008 05:30:25 -0800 (PST)

Message-ID: <0b742d39-2a67-4ece-ab76-f78ebe36848b_at_m34g2000hsf.googlegroups.com>

AFAIK the conventional wisdom is that no absolute definition of atomic exists for domain types. Throwing caution to the wind, in this post I wish to conjecture a definition of atomic that, perhaps with some more effort at its formalisation, can provide some absolute meaning for a given attribute within a given RDB schema.

The examples are a little contrived, but are only meant to be illustrative.

Example 1:

"Einstein discovered the formula E = mc^2"

"Newton discovered the formula F = ma"

Example 2:

"Bill is a parent of { Mary, John }"

"Mary is a parent of { Don, Alex, Sue }"

For example 1, we can define a Prolog predicate 'discovered' to represent the two facts as follows:

discovered(einstein, eq(var("E"), prod(var("m"), pow(var("c"),num(2))))).

discovered(newton, eq(var("F"), prod(var("m"), var("a")))).

In previous threads I have discussed how it is not possible to decompose the information in nested expressions into a set of propositions about the nodes without the introduction of node identifiers.
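To make the node-identifier point concrete, here is a minimal Python sketch (not from the earlier threads) that flattens the nested term for E = mc^2 into flat rows. The row layout `(node_id, functor, value, parent_id, position)` and the names `flatten` and `EINSTEIN` are assumptions made purely for illustration. The thing to notice is that `node_id` has to be invented by the flattening step, since nothing in the original term supplies it.

```python
from itertools import count

# Nested term for "E = m * c^2", mirroring the Prolog term above.
EINSTEIN = ("eq", ("var", "E"),
                  ("prod", ("var", "m"),
                           ("pow", ("var", "c"), ("num", 2))))

def flatten(term, ids):
    """Return (root_id, rows); each row is
    (node_id, functor, value, parent_id, position)."""
    rows = []

    def visit(node, parent_id, position):
        node_id = next(ids)  # invented identifier, not present in the term
        functor, *args = node
        if functor in ("var", "num"):
            # Leaf: carries a value and has no children.
            rows.append((node_id, functor, args[0], parent_id, position))
        else:
            # Inner node: no value, children recorded by parent_id/position.
            rows.append((node_id, functor, None, parent_id, position))
            for i, child in enumerate(args):
                visit(child, node_id, i)
        return node_id

    root = visit(term, None, 0)
    return root, rows

root_id, rows = flatten(EINSTEIN, count(1))
for row in rows:
    print(row)
```

Without the `node_id` column, two structurally identical subterms (say, the two factors in x * x) would collapse into the same tuple, so the rows could no longer be reassembled into the original expression.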

By contrast, in example 2 it is straightforward to map the two facts into five (by decomposing the sets of children) as follows:

parent(bill,mary).

parent(bill,john).

parent(mary,don).

parent(mary,alex).

parent(mary,sue).
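The same mapping can be sketched in Python to show that it is reversible without inventing anything. The function names `decompose` and `recompose` are hypothetical, and the sketch assumes every recorded parent has at least one child (an empty set of children would leave no pair behind, so it could not be recovered).

```python
def decompose(parent_sets):
    """Map {parent: set_of_children} to a set of (parent, child) pairs."""
    return {(p, c) for p, children in parent_sets.items() for c in children}

def recompose(pairs):
    """Inverse mapping: group the pairs back into one set per parent."""
    grouped = {}
    for p, c in pairs:
        grouped.setdefault(p, set()).add(c)
    return grouped

original = {"bill": {"mary", "john"}, "mary": {"don", "alex", "sue"}}
pairs = decompose(original)

# Round trip: the pairs carry exactly the information in the original facts.
print(sorted(pairs))
print(recompose(pairs) == original)
```

Every value appearing in the pairs already appears in the original facts, which is what distinguishes this decomposition from the one in example 1.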

Firstly ISTM that a valid attribute decomposition must be *nontrivial*, and perhaps this could be formalised somehow using entropy (by saying that the new attribute(s) have fewer states available than the original attribute). Although I'm not sure exactly how to state this mathematically, I expect one would find general agreement on what a nontrivial decomposition means in practice.
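One possible reading of "fewer states", offered only as a rough sketch and not as the formalisation itself: count the values each attribute can take over a fixed universe of people. Under that assumption, a set-valued "children" attribute over n people has 2^n - 1 non-empty values, while the decomposed "child" attribute has n. The function name `attribute_states` and the choice to exclude the empty set are assumptions.

```python
import math

def attribute_states(n_people):
    """Return (states of set-valued attribute, states of single-valued attribute)."""
    set_valued = 2 ** n_people - 1  # non-empty subsets of the universe
    single = n_people               # one person per attribute value
    return set_valued, single

for n in (2, 5, 10):
    s, a = attribute_states(n)
    # log2 of the state count is the number of bits needed to pick one state
    print(n, s, a, round(math.log2(s), 2), round(math.log2(a), 2))
```

For any universe of two or more people the decomposed attribute has strictly fewer states, which matches the intuition that the decomposition of example 2 is nontrivial.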

Secondly (and this is where the examples are relevant), a valid decomposition must coincide with a defined bijection that maps a DB state in the original schema to a DB state in the new schema. This is where those node identifiers in the first example come into play, because they seem to be at odds with defining such a bijection. Putting it more simply, it seems that the node identifiers aren't functionally dependent on the original DB state. It is for this reason that one may claim that such a decomposition is unreasonable - in the sense of not achieving information equivalence as a set of propositions.
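The lack of functional dependence can be illustrated with a small Python sketch. It assumes, hypothetically, that the expression F = ma has been flattened into rows of the form `(node_id, functor, value, parent_id, position)`; the helper names `rebuild` and `rename` are likewise made up for the illustration. Renaming the identifiers yields a different DB state in the new schema that corresponds to exactly the same original state.

```python
def rebuild(rows):
    """Reassemble the nested term from flat node rows."""
    children = {}
    root = None
    for row in rows:
        parent_id = row[3]
        if parent_id is None:
            root = row
        else:
            children.setdefault(parent_id, []).append(row)

    def build(row):
        node_id, functor, value, _parent, _pos = row
        if functor in ("var", "num"):
            return (functor, value)
        kids = sorted(children.get(node_id, []), key=lambda r: r[4])
        return (functor, *[build(k) for k in kids])

    return build(root)

def rename(rows, mapping):
    """Apply an arbitrary renaming of node identifiers."""
    return {(mapping[n], f, v, mapping.get(p), pos)
            for (n, f, v, p, pos) in rows}

# One possible flattening of eq(var("F"), prod(var("m"), var("a"))).
rows_a = {
    (1, "eq", None, None, 0),
    (2, "var", "F", 1, 0),
    (3, "prod", None, 1, 1),
    (4, "var", "m", 3, 0),
    (5, "var", "a", 3, 1),
}
# Same structure, different (equally arbitrary) identifiers.
rows_b = rename(rows_a, {1: 50, 2: 40, 3: 30, 4: 20, 5: 10})

print(rows_a != rows_b)                    # distinct states in the new schema
print(rebuild(rows_a) == rebuild(rows_b))  # same state in the original schema
```

Since many row sets map back to one original term, the mapping from the original schema to the new one is one-to-many unless some extra convention fixes the identifiers, and that convention is not information contained in the original DB state.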

Continuing with example 2, note that no further decomposition allowing information equivalence is possible. For example, a person's name is represented as a string domain type, and this is atomic because any attempt at decomposing the string into its individual characters forces the introduction of additional identifiers.

Received on Fri Feb 01 2008 - 14:30:25 CET