Re: XML Data Model

From: Anne & Lynn Wheeler <lynn_at_garlic.com>
Date: Wed, 08 Dec 2004 15:20:08 -0700
Message-ID: <m3brd44elj.fsf_at_lhwlinux.garlic.com>


Tom Hester <thester_at_metadata.com> writes:
> But, it seems to me that all of this is beside the point. XML is an
> interface language, not a data language. There is no data model for
> XML because it does not describe data (facts about the real world).
> Rather it describes how to pass text between processes.

as previously mentioned it started out as GML ... generalized markup language ...
http://www.garlic.com/~lynn/subtopic.html#sgml

and the initials GML are actually those of the three people at the science center
http://www.garlic.com/~lynn/subtopic.html#545tech

involved in inventing it in 1969. They took "G", "M", and "L" ... and had to come up with something for the letters to stand for other than the people's last names.

it was part of an infrastructure for document formatting ... and the common usage at the time was for "markup language" to refer to rules for formatting documents.

however, relatively early in the '70s ... GML tags started taking on the characteristics of attribute tags as opposed to markup tags (with some indirection where attribute tags were then given markup rules ... as opposed to giving markup rules directly to the contents of a document).

so a typical attribute tag was ":address." (the original gml tag format; you see the transition to <address> brackets later, with the SGML standardization work that started in the late 70s and was published as ISO 8879 in 1986).

The issue was that the original 1969 invention started out as a formatting markup language ... but by the early '70s, the tags were in common use as information descriptors ... independent of the formatting of the information.

... so something like 4th floor, 545 technology sq, cambridge mass then in ML terms (shown here in the original gml tag format) is formatted like

:address.4th floor, 545 Tech. Sq, Cambridge, Mass,

and the semantics becomes

"4th floor, 545 Tech. Sq, Cambridge, Mass" IsA "address".
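
as a rough illustration of that reading ... here is a minimal python sketch that takes a one-line ":tag.content" string and turns it into the IsA statement. The helper name gml_to_isa and the regular expression are made up for this example; real GML processing handled far more than a single tag per line.

```python
import re

# Hypothetical pattern for a single ":tag.content" line; an assumption for
# illustration, not the actual GML grammar.
GML_LINE = re.compile(r"^:(?P<tag>[A-Za-z][A-Za-z0-9]*)\.(?P<content>.*)$")


def gml_to_isa(line):
    """Split ':tag.content' into (content, tag), i.e. content IsA tag."""
    match = GML_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a :tag.content line: {line!r}")
    return match.group("content").strip(), match.group("tag")


content, tag = gml_to_isa(":address.4th floor, 545 Tech. Sq, Cambridge, Mass")
print(f'"{content}" IsA "{tag}"')
```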

So, the analogy in a typical RDBMS is possibly the data dictionary giving the field/column characteristics.

The ML-genre allows for a hierarchy of constructs ... so there can be a large file where the whole thing might be a <document> and there are individual fields that are subsections of <document> ... like <address> ... where there is a relationship between a thing that is a <document> and has a characteristic of <address>.

The RDBMS analogy could be considered a single level hierarchy where there is a primary field that has relationships to other fields in the same table.

Let's say you have a RDBMS "document" column as the primary field/key ... in RDBMS ... the contents of the field is some identifier that might be used to distinguish a specific document from some other document. In the ML paradigm ... what follows the field <document> ... is the actual document ... as opposed to an identifier for selecting the document (which you might find in a RDBMS paradigm). In both ML and RDBMS ... the contents of "address" tends to be the actual address.

So one might claim that in ML ... the contents of the thing marked by the tags are the actual things (i.e. the actual document, the actual address, etc). In RDBMS ... the fields might be something that represents the actual thing (i.e. a document sequence number that might be used to find the document someplace) or it might be the actual thing (like an address).

So ... let's take an XML document that starts with a tag <document> and in the hierarchy, it might have other tags <address>, <document serial number>, etc ... all as sub-items in a document hierarchy.

Map that to RDBMS ... you could have one large table ... with the primary field being the <document serial number>, and an <address> field and a (very large) <document contents> field.

One could characterize such a RDBMS table as having a one level hierarchy ... with a primary field (document serial number) and all the other fields related to the primary field.
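
a minimal sketch of that one-table mapping, using python's sqlite3 module with an in-memory database. The table name, column names and the serial number 1001 are assumptions picked for the example; the PRAGMA query at the end stands in for the data dictionary that gives the column characteristics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table, one level of hierarchy: the serial number is the primary field
# and every other column relates to it. All names here are illustrative.
conn.execute(
    """
    CREATE TABLE document (
        serial_number INTEGER PRIMARY KEY,
        address       TEXT,
        contents      TEXT
    )
    """
)
conn.execute(
    "INSERT INTO document (serial_number, address, contents) VALUES (?, ?, ?)",
    (1001, "4th floor, 545 Tech. Sq, Cambridge, Mass", "full text of the document ..."),
)

# The identifier selects the row; the address and contents are the actual things.
row = conn.execute(
    "SELECT address, contents FROM document WHERE serial_number = ?", (1001,)
).fetchone()

# SQLite's stand-in for a data dictionary: column names of the table.
columns = [info[1] for info in conn.execute("PRAGMA table_info(document)")]
print(columns, row)
```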

In the ML world, the top of the hierarchy would be the actual document (contents) and all the other fields would be related to (or are attributes of) the actual contents.

So one might claim that in a RDBMS world ... the document serial number is the unique thing ... with everything else as attributes of (or having relationship to) the document serial number. In the ML world ... the document could be considered the unique thing ... and everything else (including the document serial number) are attributes/characteristics of the document (lower down in the hierarchy).
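
the ML side of the same comparison can be sketched with python's xml.etree.ElementTree. The element names are assumptions for the example (a real XML name can't contain spaces, so "serial-number" stands in for <document serial number>, and "body" is a made-up holder for the text). Here the <document> element is the unique thing and the serial number is just one more child of it.

```python
import xml.etree.ElementTree as ET

# Illustrative document; tag names and values are assumptions for this sketch.
xml_text = """
<document>
  <serial-number>1001</serial-number>
  <address>4th floor, 545 Tech. Sq, Cambridge, Mass</address>
  <body>full text of the document ...</body>
</document>
""".strip()

root = ET.fromstring(xml_text)

# Everything under the root is a characteristic of the document itself,
# including the serial number that would be the key in the RDBMS table.
characteristics = {child.tag: child.text for child in root}
print(root.tag, characteristics)
```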

some of the confusion is that the same document might contain both markup tags and attribute tags ... aka "<br>" is a formatting, markup tag ... while "<address>" is a data schema tag. So ... a "<br>" embedded in a document isn't likely to be considered as part of the data schema of a document ... while "<address>" may in fact be considered part of the document data schema.
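
one way to picture that split ... a small python sketch that walks a document and sorts its tags into formatting markup versus data schema. The FORMATTING_TAGS set is purely an assumption for the example; nothing in XML itself says which tags are which, that is a decision made by whoever defines the document type.

```python
import xml.etree.ElementTree as ET

# Assumed list of formatting-only tags; a real document type would define its own.
FORMATTING_TAGS = {"br"}

xml_text = (
    "<document>"
    "<address>4th floor<br/>545 Tech. Sq</address>"
    "<body>some text<br/>more text</body>"
    "</document>"
)
root = ET.fromstring(xml_text)

schema_tags = []
markup_tags = []
for element in root.iter():  # walks the tree in document order
    if element.tag in FORMATTING_TAGS:
        markup_tags.append(element.tag)
    else:
        schema_tags.append(element.tag)

print("schema:", schema_tags)
print("markup:", markup_tags)
```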

-- 
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Received on Wed Dec 08 2004 - 23:20:08 CET
