Re: Modeling Data for XML instead of SQL-DBMS
Date: 28 Oct 2006 17:32:02 -0700
Message-ID: <1162081922.016958.251670_at_b28g2000cwb.googlegroups.com>
mAsterdam wrote:
> dawn wrote:
> > mAsterdam wrote:
> >> dawn wrote:
> >>> mAsterdam wrote:
> >>>> Hierarchy (trees of nodes),
> >>>> is what IMS, documents, file systems, XML, Lotus Domino
> >>>> and LDAP as data containers have in common.
>
> >>> I think of the XML model as a di-graph where there are trees on the
> >>> nodes, rather than a tree of nodes (it is only an individual document
> >>> without xlink or external code joining them). Look at each web page
> >>> that is written with xhtml. It is a tree, but there are links (<a
> >>> href...>) from one page to another that form it into a di-graph.
> >>> MV/Pick files are somewhat similar.
> >> A unix filesystem is - at shell level - a tree.
> >
> > Yes, while the DBMS tools of which I am aware do not employ a tree at
> > the "top level" in their model. A set of XML documents may include
> > links (one way or another) within or between them (think of XHTML).
> > Given a set of XML documents, there is no single "node" that is "the
> > root", even though each document has a root and can be seen as a tree.
>
> You are still/again mixing data and implementation :-(
> I'll try to spell it, though we already did go into this in
> quite some detail a year or so ago.
Yes, I recall, so we might be on a thread where we need to agree to disagree on some aspect of this.
> The links are not (logical/user/real) data
They are identifiers.
I think of these other specified links (e.g. in XML Schema, although I might be too ignorant in that area) as metadata, whether specified in some separate metadata repository or specified in a metadata portion of the database itself (as rules are). For the purpose of identifying the mathematical form of the "data model," I don't care how these links are specified. In SQL, similar metadata would be in the JOIN clauses of the SELECT statements inside CREATE VIEW statements.
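To make the SQL comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column, and view names are made up for illustration. The point is only that the book-to-author link is stated in the view definition, which the DBMS keeps in its catalog, and not in any row of either table.

```python
import sqlite3

# In-memory database; all table and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (author_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE book (book_id TEXT PRIMARY KEY, title TEXT, author_id TEXT);
    INSERT INTO author VALUES ('a1', 'Example Author');
    INSERT INTO book VALUES ('b1', 'Example Book', 'a1');
    CREATE VIEW book_with_author AS
        SELECT book.title, author.name
        FROM book JOIN author ON book.author_id = author.author_id;
""")

# Querying the view follows the link without the caller restating it.
rows = conn.execute("SELECT title, name FROM book_with_author").fetchall()

# The link specification lives in the catalog, alongside other metadata.
view_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'book_with_author'"
).fetchone()[0]

print(rows)
print(view_sql)
```

The author_id value in the book row is ordinary data; what makes it a link is the JOIN condition recorded in the view.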
> (except in the meta case
> of-course, e.g. content-management systems - but let's keep
> it simple).
Yes, we can ignore content mgmt for this discussion.
> The links are not (logical/user/real) data.
> They are not. They are really really really not data.
I look at the abstract, logical model and you seem to be looking at the implementation. Because the XML-web (XHTML, XML accessed via http) forms a highly-distributed data repository, I'll grant you your definitions regarding it. But a closely analogous concept exists in standard (non-SQL-based) DBMS tools, where the link specifications are metadata.
> They do not even
> reference data.
> They are locators, pointing to a location,
I recall discussions about pointers, and the general impression I was left with is that foreign key specifications are not typically called pointers. The term "pointer" is more often used for stored memory locations and "stuff" at a lower level than the metadata.
Working strictly with HTML pages and the <a href ...> links, the link value is the foreign key value for the "tuple" (HTML document) where you can get more information about the linked-from value. If that were a "pointer" as I have understood the term, then it would surely at least specify a particular piece of hardware, which a URL does not -- it is just the ID value for that node. It is used by the logical system to eventually find the right memory location, just as a join specification is used.
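A rough sketch of that view, in Python with only the standard library: each page is a tree on its own, and collecting the href values gives the edges of a di-graph keyed by URL. The three pages and their example.org URLs are invented for illustration, and build_link_graph is a hypothetical helper name.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical pages, keyed by the URL that identifies each one.
pages = {
    "http://example.org/a.html":
        '<html><body><p><a href="http://example.org/b.html">B</a></p></body></html>',
    "http://example.org/b.html":
        '<html><body><a href="http://example.org/a.html">A</a> '
        '<a href="http://example.org/c.html">C</a></body></html>',
    "http://example.org/c.html":
        "<html><body><p>No links.</p></body></html>",
}


def build_link_graph(documents):
    """Map each document's URL to the list of URLs it links to."""
    graph = {}
    for url, markup in documents.items():
        collector = LinkCollector()
        collector.feed(markup)
        graph[url] = collector.links
    return graph


graph = build_link_graph(pages)
for source, targets in graph.items():
    print(source, "->", targets)
```

Pages a and b link to each other, so the result is a graph with a cycle and no single root, even though each page taken alone is a tree. Nothing in the URL values names a machine or a memory address.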
> also called
> pointers.
If you wish to call a URL value a pointer, you may do so. You may also certainly define "pointer" to include foreign key specifications if you wish. I have been trying to avoid the term pointer when talking about the logical data model. One could implement a "web data model" in a variety of ways, including with a relational data store. The data model itself is at a higher level than the implementation and need not care about how it is implemented under the covers (other than for various tweaking for performance and such).
> Because they are not data, they are not part of the whole of the data.
Data, metadata, pointer, whatever. They are part of the mix. Hopefully we can agree on that.
> Aren't they structural elements then?
> Yes, they are part of the web (no tree) of documents and parts of
> documents. But they are not data-elements.
Fine. They are identifiers, the identity value for a document. Call it what you will.
> To get back to the OP-question, the (logical/user/real) data
> still has to be placed somewhere in the hierarchical
> (tree, not web) document structure.
If we were to create a data repository using XML documents for a book-author system, we could put books in one or more XML documents and authors similarly. When abstracting it to the data model, I would include these two top-level "entities" (they each get a UML rectangle on a class diagram, for example). The name space could be seen as a top level to the metadata, but I don't think that is the hierarchy you are talking about. I do not have to decide whether Books are higher or lower than Authors in any hierarchy, nor does the Book data need to have any data "above it" in some hierarchy (even if the documents have root nodes). Are you suggesting that these data must be seen only in terms of a strict tree structure? I'm not catching on yet.
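Here is a small sketch of the book-author repository I have in mind, in Python with xml.etree.ElementTree. The element names, the id and author attributes, and the sample values are all assumptions made up for the example. Books and authors sit in separate documents, each with its own root, and neither is placed above the other.

```python
import xml.etree.ElementTree as ET

# Two separate documents; the structure and values are hypothetical.
BOOKS_XML = """<books>
  <book id="b1" author="a1"><title>First Title</title></book>
  <book id="b2" author="a1"><title>Second Title</title></book>
</books>"""

AUTHORS_XML = """<authors>
  <author id="a1"><name>Example Author</name></author>
</authors>"""

books = ET.fromstring(BOOKS_XML)
authors = ET.fromstring(AUTHORS_XML)

# Index authors by their identifier so a book can refer to one by value.
authors_by_id = {a.get("id"): a.findtext("name") for a in authors.iter("author")}


def titles_with_authors(books_root, author_index):
    """Pair each book title with the name of the author it refers to."""
    return [
        (b.findtext("title"), author_index.get(b.get("author")))
        for b in books_root.iter("book")
    ]


pairs = titles_with_authors(books, authors_by_id)
print(pairs)
```

The author attribute on a book plays the part of a foreign key value. The lookup could just as easily run from authors to books, so the documents' root nodes impose no Book-over-Author or Author-over-Book ordering on the model.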
> This is a consequential choice you have to make,
> because of the hierarchical nature of the implementation.
> Characteristics of the (logical/user/real) data
> should/could provide guidance for this designing of the hierarchy.
> I am not aware of a systematic treatment of this and similar choices,
> though it is made - probably mostly implicitly - every day.
>
> [snip]
Thanks. --dawn
Received on Sun Oct 29 2006 - 02:32:02 CEST