Re: Modeling Data for XML instead of SQL-DBMS

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Sat, 28 Oct 2006 01:59:19 +0200
Message-ID: <45429c9c$0$338$e4fe514c_at_news.xs4all.nl>


dawn wrote:
> mAsterdam wrote:

[snip]

>> I could
>> imagine another formal process where the (or a) logical model
>> deliverable does have many to many relationships.

>
> OK, good. What are the guidelines for whether a logical data model may
> include many to many relationships?

You appear to be looking for the universally accepted software development process. There is none.

Hey! I just said that :-)

>> You appear to be looking for the universally accepted
>> software development process. There is none.

>
> I am trying to understand the terms so that I can ask my question in a
> way that the terminology does not distract from the question.
> Additionally, I would like to get my original question answered
> (perhaps not until I clean up the terminology).

I already gave a rephrasing, which you welcomed, and an answer to it. A rephrasing of your own should help, though.

>> Handbook/official/case-tool software development
>> processes /do/ differ.

>
> Indeed they do.
>
> [Aside. I've done my time as a methodology person, both as project
> leader for developing and deploying a new methodology in the 80's and
> then on a standing methodology committee at another company in the
> 90's. Ask me if I lean more towards the CMM or XP ;-) ]

[ Did you look at DSDM? ]

>> Hierarchy (trees of nodes),
>> is what IMS, documents, file systems, XML, Lotus Domino
>> and LDAP as data containers have in common.

>
> I think of the XML model as a di-graph where there are trees on the
> nodes, rather than a tree of nodes (it is only an individual document
> without xlink or external code joining them). Look at each web page
> that is written with xhtml. It is a tree, but there are links (<a
> href...>) from one page to another that form it into a di-graph.
> MV/Pick files are somewhat similar.

A unix filesystem is - at shell level - a tree. Links (hard and symbolic) break the strict tree-ness; speaking in graph terms, it is not a tree anymore.
In designing a directory structure, the tree-ness is still dominant. Same for documents and XML. I don't know whether such link-like things are present in the other tools I mentioned.
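
To make the tree/graph point concrete, here is a minimal sketch in Python. The directory layout and the is_tree helper are made up for illustration, and a node stands for the file itself rather than for one of its names, so a hard link shows up as a second parent.

```python
def is_tree(edges, root):
    """Return True if the directed graph is a tree rooted at `root`:
    the root has no parent, every other node has exactly one parent,
    and every node is reachable from the root."""
    parents = {}
    children = {}
    nodes = {root}
    for parent, child in edges:
        nodes.update((parent, child))
        parents.setdefault(child, []).append(parent)
        children.setdefault(parent, []).append(child)
    if root in parents:
        return False
    if any(len(parents.get(n, [])) != 1 for n in nodes if n != root):
        return False
    # Walk down from the root; this rules out detached cycles.
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen == nodes


# Hypothetical layout: edges point from a directory to its entries.
plain = [("/", "home"), ("/", "etc"), ("home", "notes.txt")]
# A link gives the same file a second parent directory.
linked = plain + [("etc", "notes.txt")]

print(is_tree(plain, "/"))
print(is_tree(linked, "/"))
```

The first layout passes the check and the second does not, yet anyone designing the directories would still think of both as trees with one extra edge.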

>> ISTM that hierarchy is a key concept in finding out
>> what deliberations concerning the organization of data
>> are necessary without the need to get into the characteristics
>> of the specific products.
>>
>> What is inaccurate about that?
>> What is offensive about that?
>> How does it impact emotionally?

>
> The term "hierarchical database" has been used to dismiss products out
> of hand, simply with a label.

Shit happens. I don't use the term to dismiss products (aside: I don't have that power, I couldn't even if I wanted to). I use it to denote hierarchies.

> Tree structures are used throughout
> computer science and information systems, but when the terms
> "hierarchical" and "data" are used together, it has an adverse effect,
> I think. But never mind, not important, I just prefer "tree structure"
> to hierarchical

I'll stick to using both. There are a lot of hierarchies of data around.

> and when it is only on the nodes that the trees are
> found, then I think it should be called by the higher level model, e.g.
> di-graph.

What makes that higher-level? On which stairway?

>>> nothing suggests that there be decisions about what data are more
>>> important than others.
>> What is going to be your top level?

>
> OK, so you really do think it is hierarchical at the top level. There
> is no "top level" in a di-graph.

So now suddenly the topic is "Modeling Data for di-graphs instead of SQL-DBMS"? A change of topic as an answer to an on-topic question. Please lose that style.

Documents, and consequently XML, are not only hierarchical at the top level.
The question "What is going to be your top level?" is one of the issues to resolve when implementing in a tree; it is part of the answer to your OP.
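
A small sketch of why the top-level question cannot be avoided, using Python's xml.etree.ElementTree. The student/course enrolments and the nest helper are hypothetical, chosen only because they form a many-to-many relationship; the same facts nest in two different ways and a document has to pick one.

```python
import xml.etree.ElementTree as ET

# Hypothetical many-to-many facts: (student, course).
enrolments = [("Amy", "Databases"), ("Amy", "Logic"), ("Hal", "Databases")]


def nest(pairs, outer_tag, inner_tag, outer_index):
    """Build a tree with one element per distinct outer value and the
    related inner values as its children."""
    root = ET.Element(outer_tag + "s")
    by_name = {}
    for pair in pairs:
        outer, inner = pair[outer_index], pair[1 - outer_index]
        if outer not in by_name:
            by_name[outer] = ET.SubElement(root, outer_tag, name=outer)
        ET.SubElement(by_name[outer], inner_tag, name=inner)
    return root


# Same facts, two choices of top level.
by_student = nest(enrolments, "student", "course", 0)
by_course = nest(enrolments, "course", "student", 1)

print(ET.tostring(by_student, encoding="unicode"))
print(ET.tostring(by_course, encoding="unicode"))
```

Either tree carries all three enrolments, but "Databases" is repeated under two students in the first and "Amy" under two courses in the second. Which repetition you accept is exactly the top-level decision.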

[snip]
> When forming sentences, you choose a subject, for example. That
> subject has its place in the structure of a sentence. So, a
> proposition like "Sarah has brown eyes" which has two nouns in it, has
> only one of these as the subject. If doing some
> dinner-napkin-modeling, one person might sketch a Person (the subject
> in our proposition) with attributes of name and eyeColor. Another
> person might suggest that we have two entities here, a Person entity
> with an identity-ish attribute of name and an Eye entity, with an
> attribute of color.

A transplant team member, for instance.

> The first person is using a non-democratic approach, treating one of
> these nouns as "the subject" and recognizing a "has-a" relationship
> that could permit the object of the "has" verb to be in the collection
> of properties for the main entity, the Person.
>
[snip]

>>>> and you have to decide in
>>>> which order all data is organized.
>>> In the case of XML that is correct, but not so for all non-SQL-DBMS
>>> models.
>> That /is/ the topic, no?

>
> Ordering? No, that is not my topic.

Your remark "In the case of XML that is correct, but not so for all non-SQL-DBMS models" is strange, when the topic you chose is "Modeling Data for XML instead of SQL-DBMS".

>>> In most the ordering among tuples of data is known only to the
>>> DBMS unless sorted for a representation (just as with the RM).  In
>>> many, the ordering of a tuple in a relation is more like the
>>> mathematical definitions of a tuple with the values ordered (a1, a2,
>>> ... , an) and names only descriptive of a place in the ordering.
>>>> This is a not yet tool/product
>>>> specific set of decisions you will have to make - say 'implementation
>>>> class' specific considerations.
>>> Yes, and I have been using the term "data model" for that.  In other
>>> words, you can design for all tools that employ a particular data
>>> model, whether Relational, XML, Nelson-Pick, MUMPS, ...  Then you tweak
>>> for a particular implementation of that model after your more generic
>>> design.  What is the name of this aspect of design: the model that
>>> plays to a particular family of DBMS-products (aka "data model") but
>>> not to a specific implementation of that model (aka DBMS).
>> Ah, it's not - you want to talk about Pick.
>> That is not the topic you advertised.

>
> I'm actually talking about any DBMS tools that employ 2VL and do not
> insist on the form formerly known as 1NF. Just as David brings up the
> DEC Rdb, I bring in tools I know as I'm working on this.

You initiated the thread.
IMO the topic is very broad already. In the OP it was very fuzzy as well. Drifting further away just isn't going to help anything.

>>>>>> I'd call it a storage model if I'd have to classify it.
>>>>> Yes, agreed.  I let the DBMS developers care about the storage model.
>>>> You can't if you are using XML.
>>> I can treat any model as logical.  One could model for XML and
>>> implement that "schema" in a tool that persists the data using a
>>> relational model, right?
>> Yes, but your development team (including db-guys) will have
>> to decide how exactly. DBMS's provide features, but the design
>> decides how they are used.

>
> I'm not sure I understand. As long as the development team knows how
> they are working with data (using an XML model or an SQL model, for
> example), they don't have to care what the products they are working
> with physically model the data. I do understand that it is important
> for design to have some understanding of the underlying physical model.
> For example, how can a database best be partitioned, indexed, and even
> designed for performance.

What about the hierarchy, how is it shaped - first come first served?

[snip schema]

>> The m:n removal was simply a formal requirement in that
>> particular process. I commented on that above.

>
> OK. So, that was improper to have that formal requirement because it
> was (likely) based on the target environment, right?

Improper if their goal had been to achieve purity, yes. It wasn't, and there were no candidate implementations with anonymous m:n relationships.
It was a proper requirement for their purpose.

[snip]

>>>> How did, working from real examples, the NULLs get in?
>>>> Where did they come from? When did they get in?
>>> One model related to propositions such as:
>>>
>>> Amy is married.
>>> Hal is divorced.
>>> Sylvia is single.
>>> Hope is married.
>>> We do not have enough information to know whether Lily is married or
>>> not.
>> The last one is a strange mix of non-fact and meta-information.

> It is just one of many propositions we need to model with this example.

Then answer the earlier question: do we need to know whether Lily is married or not? Why?
What are we going to do if we still don't know it at a time we need to make a decision based on her marital status - are we going to investigate?

>> Is it stating that we need to know Lily's marital status?

> Nope, I don't read that in this set of propositions.

You provided the propositions; you know what they are supposed to mean.

>> If so, why? What are the requirements?

> To model these propositions ;-)
>

>> Here is another breakdown.
>>
>> Amy is a person.
>> Amy is married.
>> Hal is a person.
>> Hal is divorced.
>> Sylvia is a person.
>> Sylvia is single.
>> Hope is a person.
>> Hope is married.
>> Lily is a person.

>
> Sounds like Neo ;-(
>
>> We still do not have enough information to know whether Lily is
>> married or not, but no strange proposition stating that as a non-fact.

>
> Interesting point. I might propose that depending on the likely target
> environment, the requirements from an analysis phase are actually
> different.

I submit that your set contains implementation considerations; it is not pure fact.
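
For comparison, a rough Python sketch of the two breakdowns quoted above. The names and statuses are the ones from the propositions; the variable names and the unknown_status helper are mine. In the first shape the "we do not know" sits in the data as a placeholder; in the second it is not stated at all and can only be derived.

```python
# First breakdown: one record per person, with a placeholder for the
# status we do not have.
people_with_null = {
    "Amy": "married",
    "Hal": "divorced",
    "Sylvia": "single",
    "Hope": "married",
    "Lily": None,
}

# Second breakdown: two separate sets of facts, nothing stated about
# Lily's marital status.
persons = {"Amy", "Hal", "Sylvia", "Hope", "Lily"}
marital_status = {
    "Amy": "married",
    "Hal": "divorced",
    "Sylvia": "single",
    "Hope": "married",
}


def unknown_status(persons, marital_status):
    """Persons for whom no marital-status fact has been recorded."""
    return sorted(p for p in persons if p not in marital_status)


print(unknown_status(persons, marital_status))
print(sorted(p for p, s in people_with_null.items() if s is None))
```

Both queries single out Lily, so nothing is lost by leaving the non-fact out of the list of propositions.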

> My users think there is nothing strange about the
> proposition set I listed.
> Your users might be happy with your list
> too, but I bet they would suggest mine aligns more with the way they
> are thinking about it.

You should have had them do the listing. Your solution familiarity colors and clouds your problem observation skills.