Re: Modeling Data for XML instead of SQL-DBMS

From: dawn <dawnwolthuis_at_gmail.com>
Date: 27 Oct 2006 21:14:30 -0700
Message-ID: <1162008869.945350.185030_at_k70g2000cwa.googlegroups.com>


mAsterdam wrote:
> dawn wrote:
> > mAsterdam wrote:
> [snip]
>
<snip>
> >> Hierarchy (trees of nodes),
> >> is what IMS, documents, file systems, XML, Lotus Domino
> >> and LDAP as data containers have in common.
> >
> > I think of the XML model as a di-graph where there are trees on the
> > nodes, rather than a tree of nodes (it is only an individual document
> > without xlink or external code joining them). Look at each web page
> > that is written with xhtml. It is a tree, but there are links (<a
> > href...>) from one page to another that form it into a di-graph.
> > MV/Pick files are somewhat similar.
>
> A unix filesystem is - at shell level - a tree.

Yes, while the DBMS tools of which I am aware do not employ a tree at the "top level" in their model. A set of XML documents may include links (one way or another) within or between them (think of XHTML). Given a set of XML documents, there is no single "node" that is "the root", even though each document has a root and can be seen as a tree.

So, the model for persisting data in XML documents seems like it would be a digraph with nodes that are trees, although someone else might have a better description or understanding of it.
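
To make that picture concrete, here is a minimal Python sketch of "a digraph with nodes that are trees". It is only an illustration under stated assumptions: the two documents, their file names (a.xhtml, b.xhtml), and the build_digraph helper are all made up for this example. Each document parses to its own tree, and the href links between documents become the edges of the digraph.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: each one is a well-formed tree on its own.
DOCS = {
    "a.xhtml": '<html><body><p>See <a href="b.xhtml">B</a></p></body></html>',
    "b.xhtml": '<html><body><p>Back to <a href="a.xhtml">A</a></p></body></html>',
}

def build_digraph(docs):
    """Return (trees, edges): one parsed tree per node, plus link edges."""
    # Every node of the digraph carries a tree with its own root element.
    trees = {name: ET.fromstring(text) for name, text in docs.items()}
    # An edge runs from a document to each known document it links to.
    edges = {
        name: sorted(
            {a.get("href") for a in root.iter("a") if a.get("href") in docs}
        )
        for name, root in trees.items()
    }
    return trees, edges

trees, edges = build_digraph(DOCS)
```

Here a.xhtml links to b.xhtml and b.xhtml links back, so the outer structure contains a cycle and has no single root, even though each entry in trees has one.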

I am not an XML expert; it is just something most readers would understand that does not employ a relational model, which is why I chose it for the question. My real question is broader, encompassing all non-SQL-DBMS, not-really-relational models.

<snip>
> > and when it is only on the nodes that the trees are
> > found, then I think it should be called by the higher level model, e.g.
> > di-graph.
>
> What makes that higher-level? On which stairway?

Maybe I should have said "outer" instead of "higher"? Collect the XML docs, perhaps one on each node (or drop down below the document level and put one second-to-the-root level element on each node), then draw arrows for the links between them. This overall structure is not hierarchical, even though each node in the digraph is a tree.

Let me know if I am making sense to you or not with this description. I'm sure that real computer scientists (I'm an old DP girl) could correct me on any of this.

> >>> nothing suggests that there be decisions about what data are more
> >>> important than others.
> >> What is going to be your top level?
> >
> > OK, so you really do think it is hierarchical at the top level. There
> > is no "top level" in a di-graph.
>
> So now suddenly the topic
> is "Modeling Data for di-graphs instead of SQL-DBMS"?
> A change of topic as an answer to an on topic question.
> Please lose that style.

I'm still on "my" topic, even if I moved it to the more abstract question I have. Clue me in on where I deviated from the topic from your perspective.

> Documents, and consequently XML are not only hierarchical
> at the top level.
> The question "What is going to be your top level?"
> is one of the issues to resolve when implementing
> in a tree, it is part of the answer to your OP.

Right, so I'm not implementing a tree. I'm implementing a digraph with trees on the nodes (again, that is the picture in my head, but I'm not infallible, so tell me how you would use XML as logically a strictly hierarchical data store). A layer below the logical level (though still above the physical, on-disk level) might be strictly a tree.

>
> [snip]
> > When forming sentences, you choose a subject, for example. That
> > subject has its place in the structure of a sentence. So, a
> > proposition like "Sarah has brown eyes" which has two nouns in it, has
> > only one of these as the subject. If doing some
> > dinner-napkin-modeling, one person might sketch a Person (the subject
> > in our proposition) with attributes of name and eyeColor. Another
> > person might suggest that we have two entities here, a Person entity
> > with an identity-ish attribute of name and an Eye entity, with an
> > attribute of color.
>
> A transplant team member, for instance.

Yup, you've played this game before ;-)

> > The first person is using a non-democratic approach, treating one of
> > these nouns as "the subject" and recognizing a "has-a" relationship
> > that could permit the object of the "has" verb to be in the collection
> > of properties for the main entity, the Person.
> >
> [snip]
>
> >>>> and you have to decide in
> >>>> which order all data is organized.
> >>> In the case of XML that is correct, but not so for all non-SQL-DBMS
> >>> models.
> >> That /is/ the topic, no?
> >
> > Ordering? No, that is not my topic.
>
> Your remark "In the case of XML that is correct, but not so for all
> non-SQL-DBMS models" is strange, when the topic you chose is
> "Modeling Data for XML instead of SQL-DBMS".

Yes, when holding a dialog with someone I know can abstract it further, I took the liberty to do so, since my real question is about modeling data for use with any-persistence-tools-other-than-SQL-DBMS's, XML being one that most here have seen before.

<snip>
> > I'm actually talking about any DBMS tools that employ 2VL and do not
> > insist on the form formerly known as 1NF. Just as David brings up the
> > DEC Rdb, I bring in tools I know as I'm working on this.
>
> You initiated the thread.
> IMO the topic is very broad already. In the OP it was very fuzzy as
> well. Drifting further away just isn't going to help anything.

I just went to my real question, an abstraction of my OP. My experience with XML is much more limited than with non-SQL-DBMS tools, but questions framed either in the abstract or about data models with which I am more familiar are not as productive for me to learn what I am trying to learn.

> >>>>>> I'd call it a storage model if I'd have to classify it.
> >>>>> Yes, agreed. I let the DBMS developers care about the storage model.
> >>>> You can't if you are using XML.
> >>> I can treat any model as logical. One could model for XML and
> >>> implement that "schema" in a tool that persists the data using a
> >>> relational model, right?
> >> Yes, but your development team (including db-guys) will have
> >> to decide how exactly. DBMS's provide features, but the design
> >> decides how they are used.
> >
> > I'm not sure I understand. As long as the development team knows how
> > they are working with data (using an XML model or an SQL model, for
> > example), they don't have to care what the products they are working
> > with physically model the data. I do understand that it is important
> > for design to have some understanding of the underlying physical model.
> > For example, how can a database best be partitioned, indexed, and even
> > designed for performance.
>
> What about the hierarchy, how is it shaped - first come first served?
>
> [snip schema]
>
> >> The m:n removal was simply a formal requirement in that
> >> particular process. I commented on that above.
> >
> > OK. So, that was improper to have that formal requirement because it
> > was (likely) based on the target environment, right?
>
> Improper if their goal would have been to achieve purity, yes.
> It wasn't and there were no candidate implementations with
> anonymous m:n relationships.
> It was a proper requirement for their purpose.
>
> [snip]
> >>>> How did, working from real examples, the NULLs get in?
> >>>> Where did they come from? When did they get in?
> >>> One model related to propositions such as:
> >>>
> >>> Amy is married.
> >>> Hal is divorced.
> >>> Sylvia is single.
> >>> Hope is married.
> >>> We do not have enough information to know whether Lily is married or
> >>> not.
> >> The last one is strange mix of non-fact and meta-information.
>
> > It is just one of many propositions we need to model with this example.
>
> Then answer the earlier question: do we need to know whether Lily is
> married or not?

No

> Why?

Because our data collection mechanism does not require that any marital status information be collected.

> What are we going to do if we still don't know it at a time we need to
> make a decision based on her marital status - are we going to investigate?

No, we are going to include the option of not-knowing as one of the alternatives for decision-making.
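
A minimal Python sketch of what "not-knowing as one of the alternatives" could look like. Everything here beyond the five names and their statuses is an assumption made for illustration: the mailing scenario, the mailing_choice function, and the choice to record a status only when one was given. Note that the test "do we have a status for this person?" is itself plain two-valued.

```python
# Statuses from the propositions; Lily has no entry because none was collected.
marital_status = {
    "Amy": "married",
    "Hal": "divorced",
    "Sylvia": "single",
    "Hope": "married",
}
people = ["Amy", "Hal", "Sylvia", "Hope", "Lily"]

def mailing_choice(person):
    """Pick a (hypothetical) mailing, with 'not known' as an explicit branch."""
    if person not in marital_status:
        # No investigation: not-knowing is simply one of the alternatives.
        return "generic mailing"
    if marital_status[person] == "married":
        return "couples mailing"
    return "singles mailing"

choices = {p: mailing_choice(p) for p in people}
```

Lily ends up with the generic mailing without anyone having to find out her status first.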

> >> Is it stating that we need to know Lily's marital status?
>
> > Nope, I don't read that in this set of propositions.
> You provided the propositions,
> you know what they are supposed to mean.

OK, I played end-user above and answered.

> >> If so, why? What are the requirements?
>
> > To model these propositions ;-)
> >
> >> Here is another breakdown.
> >>
> >> Amy is a person.
> >> Amy is married.
> >> Hal is a person.
> >> Hal is divorced.
> >> Sylvia is a person.
> >> Sylvia is single.
> >> Hope is a person.
> >> Hope is married.
> >> Lily is a person.
> >
> > Sounds like Neo ;-(
> >
> >> We still do not have enough information to know whether Lily is
> >> married or not, but no strange proposition stating that as a non-fact.
> >
> > Interesting point. I might propose that depending on the likely target
> > environment, the requirements from an analysis phase are actually
> > different.
>
> I submit that your set contains implementation
> consideration, that it is not pure fact.

I don't see that, since it was the information that users gave me and they didn't know the target environment. But I might accept your statement too since a) I don't know what "pure fact" is and b) I do think that even in conceptual modeling I would end up with a different model if I thought I was headed to a SQL DBMS than if I thought I was headed for something else (I might try to talk users out of seemingly less-important multivalues in the latter case, for example -- you could work with collecting just one phone number and one e-mail address for each person, right?)
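
As a rough illustration of how the same person data can take different shapes, here is a short Python sketch. The record, the phone numbers and e-mail address, and the helper names to_rows and single_valued are all hypothetical; it shows one nested record with multivalues, the same data flattened to one row per value, and the reduced "just one phone number and one e-mail address" shape.

```python
# Hypothetical person record with multivalued attributes (nested shape).
nested = {
    "name": "Sarah",
    "phones": ["555-0100", "555-0101"],
    "emails": ["sarah@example.com"],
}

def to_rows(person):
    """Flatten the multivalues into one (name, kind, value) row each."""
    rows = [(person["name"], "phone", p) for p in person["phones"]]
    rows += [(person["name"], "email", e) for e in person["emails"]]
    return rows

def single_valued(person):
    """Keep only the first phone and first e-mail, dropping the rest."""
    reduced = {"name": person["name"]}
    if person["phones"]:
        reduced["phone"] = person["phones"][0]
    if person["emails"]:
        reduced["email"] = person["emails"][0]
    return reduced

rows = to_rows(nested)
reduced = single_valued(nested)
```

The reduced shape loses Sarah's second phone number, which is exactly the trade being negotiated with the users.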

> > My users think there is nothing strange about the
> > proposition set I listed.
> > Your users might be happy with your list
> > too, but I bet they would suggest mine aligns more with the way they
> > are thinking about it.
>
> You should have had them do the listing.
> Your solution familiarity colors and clouds
> your problem observation skills.

No doubt. smiles. --dawn

Received on Sat Oct 28 2006 - 06:14:30 CEST
