Re: Modeling Data for XML instead of SQL-DBMS

From: dawn <dawnwolthuis_at_gmail.com>
Date: 27 Oct 2006 14:56:39 -0700
Message-ID: <1161986199.456375.227000_at_i42g2000cwa.googlegroups.com>


mAsterdam wrote:
> dawn wrote:
> > mAsterdam wrote:
> >> dawn wrote:
> >>> mAsterdam wrote:
> >>>> dawn wrote:
> >>>>> mAsterdam wrote:
> >>>>>> dawn wrote:
> >>>>>>> mAsterdam wrote:
> > <snip>

> > So, I'm trying to nail down your definition
> > since you are providing clues, but not a def I can work with as yet.
>
> Hey! I would provide a definition if I could.

Yes, I understand that, sorry if my words implied otherwise.

> However, I already stated that a definition
> (if it proves too difficult to find one) may not be
> needed for your scoping and disambiguation needs.
> Careful description of the boundaries may be sufficient,

Yes, agreed.

> in this case logic of data vs. implementation.

>

> > On the one hand, it sounds like it is completely independent of the
> > target environment, and then on the other hand you suggest that M-M
> > relationships should be out of the logical model.
>

> This particular logical model deliverable had as its requirement
> that there should be no many-to-many relationship. I am not
> saying logical models in general shouldn't have them.

Ah. This logical model you are referring to is specific to the pizza example or what?

> I could
> imagine another formal process where the (or a) logical model
> deliverable does have many to many relationships.

OK, good. What are the guidelines for whether a logical data model may include many to many relationships?

> You appear to be looking for the universally accepted
> software development process. There is none.

I am trying to understand the terms so that I can ask my question in a way that the terminology does not distract from the question. Additionally, I would like to get my original question answered (perhaps not until I clean up the terminology).

> Handbook/official/case-tool software development
> processes /do/ differ.

Indeed they do.

[Aside. I've done my time as a methodology person, both as project leader for developing and deploying a new methodology in the 80's and then on a standing methodology committee at another company in the 90's. Ask me if I lean more towards the CMM or XP ;-) ]

> Hierarchy (trees of nodes),

> is what IMS, documents, file systems, XML, Lotus Domino
> and LDAP as data containers have in common.

I think of the XML model as a di-graph with trees on the nodes, rather than a tree of nodes (only an individual document, taken without XLink or external code joining documents together, is a tree). Look at each web page that is written in XHTML. It is a tree, but there are links (<a href...>) from one page to another that form the pages into a di-graph. MV/Pick files are somewhat similar.
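
To make the "trees on the nodes of a di-graph" picture concrete, here is a minimal sketch in Python. The page names and markup are made up for illustration, and the helper names (link_graph, has_cycle) are hypothetical; it only shows that each document parses to one tree while the href links between documents can form a cycle, which a tree of nodes never could.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages (assumption: names and content invented for illustration).
# Each one is a well-formed, XHTML-like document, i.e. a single tree.
PAGES = {
    "index.html": '<html><body><p>Home</p><a href="menu.html">Menu</a>'
                  '<a href="about.html">About</a></body></html>',
    "menu.html": '<html><body><a href="index.html">Home</a></body></html>',
    "about.html": '<html><body><p>About us</p><a href="menu.html">Menu</a></body></html>',
}


def link_graph(pages):
    """Return {page: sorted linked pages}: a directed graph whose nodes are documents."""
    graph = {}
    for name, source in pages.items():
        root = ET.fromstring(source)  # every document parses to one tree
        targets = {a.get("href") for a in root.iter("a") if a.get("href") in pages}
        graph[name] = sorted(targets)
    return graph


def has_cycle(graph):
    """Depth-first search for a directed cycle among the documents."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node that is still on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


graph = link_graph(PAGES)
print(graph)
print("cycle between documents:", has_cycle(graph))
```

With a cycle such as index.html -> menu.html -> index.html there is no natural "top" document, even though each page on its own is a tree.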

> ISTM that hierarchy is a key concept in finding out
> what deliberations concerning the organization of data
> are necessary without the need to get into the characteristics
> of the specific products.

>

> What is inaccurate about that?
> What is offensive about that?
> How does it impact emotionally?

The term "hierarchical database" has been used to dismiss products out of hand, simply with a label. Tree structures are used throughout computer science and information systems, but when the terms "hierarchical" and "data" are used together, it has an adverse effect, I think. But never mind, not important, I just prefer "tree structure" to hierarchical and when it is only on the nodes that the trees are found, then I think it should be called by the higher level model, e.g. di-graph.

> > nothing suggests that there be decisions about what data are more
> > important than others.
>
> What is going to be your top level?

OK, so you really do think it is hierarchical at the top level. There is no "top level" in a di-graph.

> > There might be selections about what portals or
> > entry points users require into the data so those can be beefed up with
> > virtual fields for reporting purposes (for example). It is fair to say
> > that it is a less democratic approach than the RM in that different
> > propositions and different aspects of propositions are modeled
> > differently.

>

> ?
>

> > Constructing the model is more like constructing the
> > propositions themselves.
>
> I don't understand this, either.

My thinking on this has not yet been shaped into words, but I'll give it a quick spin just to try to give an impression of what I mean, not something really crisp.

When forming sentences, you choose a subject, for example. That subject has its place in the structure of a sentence. So, a proposition like "Sarah has brown eyes" which has two nouns in it, has only one of these as the subject. If doing some dinner-napkin-modeling, one person might sketch a Person (the subject in our proposition) with attributes of name and eyeColor. Another person might suggest that we have two entities here, a Person entity with an identity-ish attribute of name and an Eye entity, with an attribute of color.

The first person is using a non-democratic approach, treating one of these nouns as "the subject" and recognizing a "has-a" relationship that could permit the object of the "has" verb to be in the collection of properties for the main entity, the Person.

Don't jump all over this, it is not well-thought-out, just an impression at this point.
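
As a rough illustration of the two dinner-napkin sketches for "Sarah has brown eyes", here is a small Python version. The class and field names are assumptions made up for this example, not anything from a real product; the first form folds the object of "has" into the subject's properties, the second treats both nouns as entities.

```python
from dataclasses import dataclass


# Sketch 1 (hypothetical names): Person is "the subject" and the eye color
# is simply one of its properties.
@dataclass
class PersonWithEyeColor:
    name: str
    eye_color: str


# Sketch 2 (hypothetical names): both nouns become entities, and the "has"
# relationship is carried by a reference from Eye back to Person.
@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Eye:
    owner: Person
    color: str


def eye_colors(person, eyes):
    """Answer "what color eyes does this person have?" in the two-entity form."""
    return {eye.color for eye in eyes if eye.owner == person}


sarah_a = PersonWithEyeColor(name="Sarah", eye_color="brown")

sarah_b = Person(name="Sarah")
eyes_b = [Eye(owner=sarah_b, color="brown")]

print(sarah_a.eye_color)
print(eye_colors(sarah_b, eyes_b))
```

Both forms record the same proposition; they differ in which noun gets to be the structural "subject".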

> > Some data are entities of interest to users,
> > others are entities more important to "the system," while others are
> > properties of entities (that's at least one way of looking at it).
>
> I am lost.

That's OK.

>

> >> and you have to decide in
> >> which order all data is organized.
> >
> > In the case of XML that is correct, but not so for all non-SQL-DBMS
> > models.
>
> That /is/ the topic, no?

Ordering? No, that is not my topic.

> > In most the ordering among tuples of data is known only to the
> > DBMS unless sorted for a representation (just as with the RM). In
> > many, the ordering of a tuple in a relation is more like the
> > mathematical definitions of a tuple with the values ordered (a1, a2,
> > ... , an) and names only descriptive of a place in the ordering.

>

> >> This is a not yet tool/product
> >> specific set of decisions you will have to make - say 'implementation
> >> class' specific considerations.
> >
> > Yes, and I have been using the term "data model" for that. In other
> > words, you can design for all tools that employ a particular data
> > model, whether Relational, XML, Nelson-Pick, MUMPS, ... Then you tweak
> > for a particular implementation of that model after your more generic
> > design. What is the name of this aspect of design: the model that
> > plays to a particular family of DBMS-products (aka "data model") but
> > not to a specific implementation of that model (aka DBMS).
>

> Ah, it's not - you want to talk about Pick.
> That is not the topic you advertised.

I'm actually talking about any DBMS tools that employ 2VL and do not insist on the form formerly known as 1NF. Just as David brings up the DEC Rdb, I bring in tools I know as I'm working on this.

> >>>> I'd call it a storage model if I'd have to classify it.
> >>> Yes, agreed. I let the DBMS developers care about the storage model.
> >> You can't if you are using XML.
> >
> > I can treat any model as logical. One could model for XML and
> > implement that "schema" in a tool that persists the data using a
> > relational model, right?

>

> Yes, but your development team (including db-guys) will have
> to decide how exactly. DBMS's provide features, but the design
> decides how they are used.

I'm not sure I understand. As long as the development team knows how they are working with data (using an XML model or an SQL model, for example), they don't have to care how the products they are working with physically model the data. I do understand that it is important for design to have some understanding of the underlying physical model: for example, how a database can best be partitioned, indexed, and even designed for performance.
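
A minimal sketch of "model for XML, persist relationally", assuming a made-up order document and a single hypothetical node table in SQLite. It ignores attributes, mixed content, and namespaces, so it is an illustration of the idea rather than a real XML store; the caller works with a tree going in and a tree coming out, and the table layout is one of the decisions the team still has to make.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store(conn, root):
    """Shred an element tree into rows of one (hypothetical) node table."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "position INTEGER, tag TEXT, text TEXT)"
    )

    def insert(elem, parent_id, position):
        cur = conn.execute(
            "INSERT INTO node (parent_id, position, tag, text) VALUES (?, ?, ?, ?)",
            (parent_id, position, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid
        for i, child in enumerate(elem):
            insert(child, node_id, i)  # sibling order is kept explicitly

    insert(root, None, 0)


def load(conn):
    """Rebuild the element tree from the rows."""

    def build(node_id, tag, text):
        elem = ET.Element(tag)
        elem.text = text
        rows = conn.execute(
            "SELECT id, tag, text FROM node WHERE parent_id = ? ORDER BY position",
            (node_id,),
        ).fetchall()
        for row in rows:
            elem.append(build(*row))
        return elem

    root_row = conn.execute(
        "SELECT id, tag, text FROM node WHERE parent_id IS NULL"
    ).fetchone()
    return build(*root_row)


# Made-up document for illustration.
doc = ET.fromstring(
    "<order><customer>Amy</customer><item>pizza</item><item>salad</item></order>"
)
conn = sqlite3.connect(":memory:")
store(conn, doc)
rebuilt = load(conn)
print([(child.tag, child.text) for child in rebuilt])
```

Note that the position column is there only because document order matters in XML; a relation has no inherent row order.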

> >>> The logical model is the one I specify to the DBMS.
> >> No. That is the schema.
> >
> > OK, that works for me. It appears your models are conceptual (plural)
> > to conceptual (single) to logical to schema. Is that accurate?
>
> Yes.

<snip>
> > Does the logical data model assume any family-of-DBMS-tools as a target
> > (what I have referred to as a "data model"), such as Relational, MUMPS,
> > Nelson-Pick, XML, OR, alternatively, is it independent of the target?

>

> I would appreciate if others would chime in here;
> this is just my opinion:
>

> In my book (in a manner of speaking - I never wrote one) logical
> data models are independent of the implementation target, even
> family-of-DBMS-tools. As far as a delivered 'logical model'
> document contains implementation considerations, they are
> out of scope addenda, however useful they may be.
>
> Clear enough?

Yes.

> > If the latter, which is what I think you are saying, then why are you
> > removing M-M or designing to avoid nulls (see example later).

>

> The m:n removal was simply a formal requirement in that
> particular process. I commented on that above.

OK. So, it was improper to have that formal requirement because it was (likely) based on the target environment, right?

<snip>
> >>> do shared QA, for example.
> >>> With Codd's "large shared data bank" I think there is
> >>> some assumption that we need to be able to permit people who don't know
> >>> each other to each write code that shares only the database and nothing
> >>> else.
> >> I'd say 'Codd's approach allows for people who ...etc'.
> >
> > works for me.
> >
> >>> That would (typically) not be acceptable for the database
> >>> products I'm talking about.
> >> Why not?
> >
> > Do you know the answer and are just testing me? I'll play --

>

> Heh. Pleading not guilty - the 'not be acceptable'
> just sounded strange to me.
>
>

> > Because the constraints are specified in data (functioning as
> > metadata), rather than being specified in schema. Proprietary
> > libraries are written against these so that there are cross-app
> > components on top of the DBMS, with vertical apps on top of those,
> > instead of vertical silos sitting right on top of the DBMS. Bottom
> > line: Apps share libraries related to puts, gets, and related
> > constraint logic. Some might not like it, but a lot of software has
> > been built and continues to be built with this architecture.
>

> Yes. The relevance to database theory seems to be:
> life is possible without it :-)

I don't think of this as life without database theory. Functional dependencies, for example, are surely relevant. Is that database theory?
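
For readers unfamiliar with the architecture described in the quoted text above (constraints held as data, with a shared library that every app uses for its puts), here is a minimal Python sketch. The dictionary layout, field names, and function names are all hypothetical and do not reflect any particular product's dictionary format.

```python
# Hypothetical "dictionary" records: the constraints live in data that
# functions as metadata, not in a schema enforced by the DBMS.
DICTIONARY = {
    "name": {"required": True, "allowed": None},
    "marital_status": {
        "required": False,
        "allowed": {"married", "divorced", "single"},
    },
}


class ConstraintError(ValueError):
    """Raised by the shared library when a record violates the dictionary."""


def validate(record, dictionary=DICTIONARY):
    """Check a record against the constraint metadata."""
    for field, rule in dictionary.items():
        value = record.get(field)
        if value is None:
            if rule["required"]:
                raise ConstraintError(f"{field} is required")
            continue
        if rule["allowed"] is not None and value not in rule["allowed"]:
            raise ConstraintError(f"{field}={value!r} is not an allowed value")


def put(store, key, record):
    """Shared 'put': every application writes through this function."""
    validate(record)
    store[key] = dict(record)


store = {}
put(store, "amy", {"name": "Amy", "marital_status": "married"})
put(store, "lily", {"name": "Lily"})  # status not known, and not required

try:
    put(store, "hal", {"name": "Hal", "marital_status": "unsure"})
except ConstraintError as err:
    print("rejected:", err)
```

The point of the sketch is the dependency: the constraints hold only for code that goes through the shared library, which is why unrelated parties writing directly to the store would be a problem.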

> [snip]
> >> AAAAaaaRghgrrrmpf. NULLs. Side note: Lots of people here appear to
> >> like to talk about that. If you do, please take over :-)
> >
> > I only bring it up as an example of where I see differences in models
> > somewhere after the conceptual model (afraid of using the "logical"
> > term), starting in "high level design" between the project-specific
> > models when the target is an SQL-DBMS than when it is something other.
> > I don't need to get into another brouhaha over that.

>

> I know, I know, it's relevant.
> Does not make me like the topic, though.
>

> >> How did, working from real examples, the NULLs get in?
> >> Where did they come from? When did they get in?
> >
> > One model related to propositions such as:
> >
> > Amy is married.
> > Hal is divorced.
> > Sylvia is single.
> > Hope is married.
> > We do not have enough information to know whether Lily is married or
> > not.
>
> The last one is a strange mix of non-fact and meta-information.

It is just one of many propositions we need to model with this example.

> Is it stating that we need to know Lily's marital status?

Nope, I don't read that in this set of propositions.

> If so, why? What are the requirements?

To model these propositions ;-)

> Here is another breakdown.

>

> Amy is a person.
> Amy is married.
> Hal is a person.
> Hal is divorced.
> Sylvia is a person.
> Sylvia is single.
> Hope is a person.
> Hope is married.
> Lily is a person.

Sounds like Neo ;-(

> We still do not have enough information to know whether Lily is
> married or not, but no strange proposition stating that as a non-fact.
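
The two breakdowns of the marital-status propositions can be contrasted in a small Python sketch. The structure names are made up for illustration: the first keeps one entry per person with a "not known" marker (None standing in for a NULL), the second keeps "is a person" and "has marital status" as separate sets of facts, so nothing is recorded about Lily's status at all.

```python
# Breakdown 1 (hypothetical layout): one entry per person, with None marking
# "we do not have enough information".
PEOPLE_NULLABLE = {
    "Amy": "married",
    "Hal": "divorced",
    "Sylvia": "single",
    "Hope": "married",
    "Lily": None,
}

# Breakdown 2 (hypothetical layout): two sets of propositions. Only stated
# facts are stored; there is no entry at all for Lily's status.
PERSON = {"Amy", "Hal", "Sylvia", "Hope", "Lily"}
MARITAL_STATUS = {
    ("Amy", "married"),
    ("Hal", "divorced"),
    ("Sylvia", "single"),
    ("Hope", "married"),
}


def unknown_nullable(people):
    """People whose status is explicitly marked as not known."""
    return sorted(name for name, status in people.items() if status is None)


def unknown_decomposed(person, marital_status):
    """People for whom no status proposition has been recorded."""
    return sorted(person - {name for name, _ in marital_status})


print(unknown_nullable(PEOPLE_NULLABLE))
print(unknown_decomposed(PERSON, MARITAL_STATUS))
```

Both forms can answer "whose status do we not know?"; they differ in whether that lack of information is itself stored as a value or is derived from the absence of a fact.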

Interesting point. I might propose that depending on the likely target environment, the requirements from an analysis phase are actually different. My users think there is nothing strange about the proposition set I listed. Your users might be happy with your list too, but I bet they would suggest mine aligns more with the way they are thinking about it.

I'll stop here for now.

--dawn

Received on Fri Oct 27 2006 - 23:56:39 CEST
