Path: dp-news.maxwell.syr.edu!spool.maxwell.syr.edu!drn.maxwell.syr.edu!news.maxwell.syr.edu!elnk-pas-nf1!newsfeed.earthlink.net!pd7cy1no!shaw.ca!pd7tw1no.POSTED!53ab2750!not-for-mail
X-Trace-PostClient-IP: 24.81.152.157
From: paul c <toledobythesea@oohay.ac>
User-Agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.databases.theory
Subject: Re: Normalisation
References: <c84875b1c20dd7a6409bb6359bd8a007@localhost.talkaboutdatabases.com> <PYCwe.133261$Sh1.7196835@phobos.telenet-ops.be> <7b0d3346c1251a60be0ada7868e198a2@localhost.talkaboutdatabases.com> <Vqgxe.134972$Ql.7257205@phobos.telenet-ops.be> <MPG.1d307df5fe83be4b9896c8@news.ntnu.no> <jfwxe.135559$Sl.7259257@phobos.telenet-ops.be> <MPG.1d30f54239ed91709896ce@news.ntnu.no> <iCCxe.135818$lx6.7272566@phobos.telenet-ops.be> <MPG.1d31f8b919b06d209896d0@news.ntnu.no> <ZaSxe.136291$Ql.7257205@phobos.telenet-ops.be> <MPG.1d32476ba73ac8229896d4@news.ntnu.no>
In-Reply-To: <MPG.1d32476ba73ac8229896d4@news.ntnu.no>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 110
Message-ID: <anfye.157493$El.71359@pd7tw1no>
Date: Mon, 04 Jul 2005 18:39:02 GMT
NNTP-Posting-Host: 64.59.144.75
X-Complaints-To: abuse@shaw.ca
X-Trace: pd7tw1no 1120502342 64.59.144.75 (Mon, 04 Jul 2005 12:39:02 MDT)
NNTP-Posting-Date: Mon, 04 Jul 2005 12:39:02 MDT
Organization: Shaw Residential Internet
Xref: dp-news.maxwell.syr.edu comp.databases.theory:31991

Jon Heggland wrote:
> In article <ZaSxe.136291$Ql.7257205@phobos.telenet-ops.be>, 
> jan.hidders@REMOVETHIS.pandora.be says...
> 
>>>How do you define atomicity?
>>
>>Roughly, I would say that it means that all your data manipulation can 
>>be expressed as operations over a signature that consists only of (1) 
>>domains described by an abstract data type and (2) relations that 
>>contain in their fields only values from these domains.
> 
> 
> I can't make heads or tails of this. Operations over a signature? 
> Domains described by an abstract data type? Is a domain not the same as 
> a data type? Can a data type describe several domains? And again with 
> the "abstract"---does your atomicity require that all data types be 
> abstract? What, then, is a concrete (or non-abstract) data type, and why 
> is it not allowed? Are relation types not (abstract) data types? How 
> about set types?
> 
> 
>>An unnest operation, for example, cannot be described in that way.
> 
> 
> What if I define an unnest operator that "unnests" strings in a 
> corresponding manner?
> 
> Anyway, let me rephrase my initial question.
> What is an atomic datatype/domain?

i may look foolish for jumping into this thread so late, but here's what 
i think anyway.  it just seems to me that the RM itself defines very 
little which leaves many things it doesn't define.  so my knee-jerk 
reaction is to say that what's not in the RM shouldn't be implemented in 
the RM part of a RDBMS.  i'd say it must be understood that when people 
talk of atomic things, they mean atomic as far as the RM is concerned, 
not necessarily as far as some other definition or concept is concerned, 
such as a postal address or infinity.  so there has to be some 
'conceptual separation' that is preserved once we try to 'get physical' 
(i'm tempted to underline the word 'separation', but i don't know how!)

based on this, i'd say that an 'atomic datatype/domain' is one whose 
values the RM is able to relate (ie. perform whatever operations are 
defined to be relational) to values of the domain or other domains 
without knowledge of the structure and operations of the domains 
involved.  in this sense, the domain is a kind of 'black box' to the RM.

in practice, there must be some agreed protocol between the RM and the 
party that does have such knowledge.  that party could be machine code 
or human interaction, although i think most people would expect it to be 
code.

the protocol would be needed most obviously so that the RM would know in 
what form, for example what bit configuration, to store values or 
present them to the party that does understand those values in a way 
that doesn't undercut consistent behaviour in the future.  i'd say this 
means that the RM cannot itself even decide equality.  it should 
theoretically always have to inquire of some outside party whether two 
values are 'equal'.  (if there is any respected definition of the RM 
that does claim to define equality, i'd like to see it.)

from an implementation point-of-view, this could mean that domain 
support might make use of the RM without the original RDBMS even 
'knowing' about it.

without being deeply familiar with any so-called RDBMS's, i'll stick my 
neck out and state that most implementations have probably made the 
mistake of not separating the relational operators from those that are 
peculiar to a domain.  i'd say part of the reason has been fear of poor 
performance.

the term 'scalar' wrt domains has shown up in recent years.  i gather 
that that term is intended to remove certain kinds of context from 
domain values, such as relations such as coordinate systems they might 
have with each other.  if the term makes sense, then to me it suggests 
that a RDBMS might not even store a particular value, rather some 
surrogate.  perhaps 'place-holder' is a better term.  the RM part of a 
RDBMS wouldn't be any the wiser if that were done (assuming there is 
some protocol for presentation).

if this makes any sense, then i'd say that when we think about 
implementations, the purely relational support ought to be the smallest 
part of a system and would not need much 'maintenance' apart from 
optimizations that occur to people after they have implemented the RM. 
the domain support work might be endless, just like newsgroup posts 
about nulls.

> What is a non-atomic datatype/domain?

wouldn't it be one that the RM part of the RDBMS is able to encode or 
decipher or somehow apply without help?

> What bad happens if you allow a non-atomic datatype in a relation?

i think this question really means what bad thing happens if a relation 
has a datatype that the RM doesn't have some protocol for presenting to 
another party.  the risk is that the RDBMS will then try to do things 
itself with the domain.  if that happens, the advantage of the theory 
may be lost (the advantage being the same as any theory - being able to 
predict behaviour without duplicating it, witness FD's and tables with 
empty columns, among many others).  the domain support should rely on 
its own theory whatever that theory is - for example, number theory, 
regular patterns etc.

having said that, when it comes to implementation, RVA's do seem to cry 
out for recursion, because the RM already has the operations to enable them.

just my current opinion, not trying to ram it down anybody's throat.

p
