Re: native xml processing vs what Postgres and Oracle offer

From: rpost <rpost_at_pcwin518.campus.tue.nl>
Date: Wed, 26 Nov 2008 09:30:46 +0000 (UTC)
Message-ID: <ggj506$ifq$1_at_mud.stack.nl>


patrick61z_at_yahoo.com wrote:

>I remember an interesting read a while ago by a threaded newsreader
>author and the bottom line is the author first worked with the "in
>reference to" part of the headers (References:) and then the subject
>line and posting time, just due to the fact that there were so many
>newsreaders that just one method wasn't going to cut it. While in
>theory, using the references header you could rebuild the tree (as the
>references would accumulate using the replied to articles list of
>references), in practice usenet is subjected to any number of news
>clients some being better than others.

Good point; however, there is a specification of the protocol (RFC 977 for NNTP itself, RFC 1036 for the message format), which implicitly contains a 'physical design' of the data structures used, in the form of requirements on message headers. The misbehaving newsreaders are *broken*.
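To make the header-based design concrete, here is a minimal sketch of rebuilding a thread tree from References headers alone. The Article class and build_thread_tree function are hypothetical names of mine, and the sketch assumes the usual convention that References lists ancestors oldest first, so the last entry is the direct parent. The subject/time fallback mentioned in the quoted text is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Article:
    """Hypothetical, stripped-down view of a news article's threading headers."""
    message_id: str
    # Assumed ordering: oldest ancestor first, direct parent last.
    references: List[str] = field(default_factory=list)


def build_thread_tree(articles: List[Article]) -> Dict[Optional[str], List[str]]:
    """Map each parent Message-ID (None for thread roots) to its child IDs."""
    known = {a.message_id for a in articles}
    children: Dict[Optional[str], List[str]] = {}
    for a in articles:
        # Walk the references backwards and attach to the nearest ancestor
        # we actually hold; articles with none become roots.
        parent = next((r for r in reversed(a.references) if r in known), None)
        children.setdefault(parent, []).append(a.message_id)
    return children
```

Attaching to the nearest known ancestor, rather than only to the last reference, keeps a reply in its thread even when its direct parent never arrived at this server.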

From a relational perspective, the protocol spec should indeed have been preceded by a separate logical design. In the NNTP design, the decision was made to postulate the unique identifiability of messages regardless of their contents or other attributes; the alternative (which most here generally advocate) is to identify entities based on their attributes.

I have never analysed large sets of USENET messages with this in mind, but it seems pretty clear to me that this alternative would indeed have been superior. E.g. assuming we can only post one message to an NNTP server at a time (as RFC 977 assumes), a message can be identified by a server identification (e.g. hostname) plus a timestamp. Requiring the presence and correctness of these two attributes on each message would have been a better decision, as far as I can see now, than requiring the presence and uniqueness of a message ID.

It would, however, have created the problem of having to specify the permissible format and exact meaning of these attributes. E.g. may the server use its own local clock and its own date/time format? If it may, may it also reset its clock at any point in time? I suppose IDs are so popular because they allow this kind of detail to be avoided.
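A rough sketch of what identifying messages by (server, timestamp) might look like, and of where the format questions bite. MessageStore and its post method are hypothetical; the choices to lowercase the hostname and to normalise every timestamp to UTC are my assumptions, exactly the sort of detail a spec would have had to pin down.

```python
from datetime import datetime, timedelta, timezone


class MessageStore:
    """Hypothetical store keyed on (server hostname, UTC posting time)."""

    def __init__(self):
        self._by_key = {}

    def post(self, host: str, posted_at: datetime, body: str):
        # A naive local time has no agreed meaning across servers, so reject it.
        if posted_at.tzinfo is None:
            raise ValueError("timestamp must carry a timezone")
        # Assumed normalisation: case-insensitive hostname, instant in UTC.
        key = (host.lower(), posted_at.astimezone(timezone.utc))
        # A server whose clock was set back can produce the same key twice.
        if key in self._by_key:
            raise KeyError(f"duplicate message key: {key}")
        self._by_key[key] = body
        return key
```

With this design, a second post from the same host carrying the same instant is rejected even if it is written in another timezone, which is how a clock reset would show up as a key collision rather than pass unnoticed.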

-- 
Reinier
Received on Wed Nov 26 2008 - 10:30:46 CET
