Re: Normalisation

From: paul c <>
Date: Mon, 04 Jul 2005 18:39:02 GMT
Message-ID: <anfye.157493$El.71359_at_pd7tw1no>

Jon Heggland wrote:
> In article <ZaSxe.136291$>,
> says...

>>>How do you define atomicity?
>>Roughly, I would say that it means that all your data manipulation can 
>>be expressed as operations over a signature that consists only of (1) 
>>domains described by an abstract data type and (2) relations that 
>>contain in their fields only values from these domains.

> I can't make heads or tails of this. Operations over a signature?
> Domains described by an abstract data type? Is a domain not the same as
> a data type? Can a data type describe several domains? And again with
> the "abstract"---does your atomicity require that all data types be
> abstract? What, then, is a concrete (or non-abstract) data type, and why
> is it not allowed? Are relation types not (abstract) data types? How
> about set types?
>>An unnest operation, for example, cannot be described in that way.

> What if I define an unnest operator that "unnests" strings in a
> corresponding manner?
> Anyway, let me rephrase my initial question.
> What is an atomic datatype/domain?

i may look foolish for jumping into this thread so late, but here's what i think anyway. it just seems to me that the RM itself defines very little which leaves many things it doesn't define. so my knee-jerk reaction is to say that what's not in the RM shouldn't be implemented in the RM part of a RDBMS. i'd say it must be understood that when people talk of atomic things, they mean atomic as far as the RM is concerned, not necessarily as far as some other definition or concept is concerned, such as a postal address or infinity. so there has to be some 'conceptual separation' that is preserved once we try to 'get physical' (i'm tempted to underline the word 'separation', but i don't know how!)

based on this, i'd say that an 'atomic datatype/domain' is one whose values the RM is able to relate (ie. perform whatever operations are defined to be relational) to values of the domain or other domains without knowledge of the structure and operations of the domains involved. in this sense, the domain is a kind of 'black box' to the RM.

in practice, there must be some agreed protocol between the RM and the party that does have such knowledge. that party could be machine code or human interaction, although i think most people would expect it to be code.

the protocol would be needed most obviously so that the RM would know in what form, for example what bit configuration, to store values or present them to the party that does understand those values in a way that doesn't undercut consistent behaviour in the future. i'd say this means that the RM cannot itself even decide equality. it should theoretically always have to inquire of some outside party whether two values are 'equal'. (if there is any respected definition of the RM that does claim to define equality, i'd like to see it.)

from an implementation point-of-view, this could mean that domain support might make use of the RM without the original RDBMS even 'knowing' about it.

without being deeply familiar with any so-called RDBMS's, i'll stick my neck out and state that most implementations have probably made the mistake of not separating the relational operators from those that are peculiar to a domain. i'd say part of the reason has been fear of poor performance.

the term 'scalar' wrt domains has shown up in recent years. i gather that that term is intended to remove certain kinds of context from domain values, such as relations such as coordinate systems they might have with each other. if the term makes sense, then to me it suggests that a RDBMS might not even store a particular value, rather some surrogate. perhaps 'place-holder' is a better term. the RM part of a RDBMS wouldn't be any the wiser if that were done (assuming there is some protocol for presentation).

if this makes any sense, then i'd say that when we think about implementations, the purely relational support ought to be the smallest part of a system and would not need much 'maintenance' apart from optimizations that occur to people after they have implemented the RM. the domain support work might be endless, just like newsgroup posts about nulls.

> What is a non-atomic datatype/domain?

wouldn't it be one that the RM part of the RDBMS is able to encode or decipher or somehow apply without help?

> What bad happens if you allow a non-atomic datatype in a relation?

i think this question really means what bad thing happens if a relation has a datatype that the RM doesn't have some protocol for presenting to another party. the risk is that the RDBMS will then try to do things itself with the domain. if that happens, the advantage of the theory may be lost (the advantage being the same as any theory - being able to predict behaviour without duplicating it, witness FD's and tables with empty columns, among many others). the domain support should rely on its own theory whatever that theory is - for example, number theory, regular patterns etc.

having said that, when it comes to implementation, RVA's do seem to cry out for recursion, because the RM already has the operations to enable them.

just my current opinion, not trying to ram it down anybody's throat.

p Received on Mon Jul 04 2005 - 20:39:02 CEST

Original text of this message