Re: Modeling Data for XML instead of SQL-DBMS

From: dawn <dawnwolthuis_at_gmail.com>
Date: 27 Oct 2006 08:31:12 -0700
Message-ID: <1161963071.969475.187630_at_m73g2000cwd.googlegroups.com>


mAsterdam wrote:
> dawn wrote:
> > mAsterdam wrote:
> >> dawn wrote:
> >>> mAsterdam wrote:
> >>>> dawn wrote:
> >>>>> mAsterdam wrote:
<snip>
> In the logical model there is strictly no
> more need for implementation specifics than in the conceptual
> model. However, the team is closer to cut-over day, so
> implementation is more on their mind. So in real logical model
> documents you will see some implementation stuff, but that
> is noise, not signal.

OK. Restating this, I think that here you are saying that a designer could create a logical data model without knowing whether the implementation will use a SQL-DBMS, a MUMPs product, a Nelson-Pick product, or XML. Additionally I think you are saying that when the design does know the target, this often influences the logical data model, but that is not a good thing.

> > there we go to "design." I think of the conceptual model as analysis
> > and the logical model as design. I don't really think about any other
> > models, with iterations of the logical model until it becomes the
> > version of the logical model that gets implemented. I used to think of
> > the logical model as the one at the start of that process, but after
> > reading a bit, I started to think of it as the one at the end of the
> > process (the implementation model). It is the last one that includes
> > design related to optimizing for the target environment.
>
> If these different models are defined, they are
> defined as deliverables in some 'formal' (handbook/official/case-tool)
> software development process. There is no generally accepted
> reference process, AFAIK, so I have to rely on those I dealt with.
> In one official process the requirements for these deliverables
> (or artifacts if you prefer) were such that the conceptual models
> allowed for many to many relationships, while in the (integrated)
> logical model they were forbidden.

Why should they be forbidden when they are quite viable logically and can be implemented in some DBMS tools?

> > The question I have now is whether the logical data model presupposes a
> > family of target DBMS's, such as SQL-DBMS's or Pick DBMS's or whether
> > the logical data model is data-model independent. Pascal's definition
> > makes it very data-model dependent. That is how I was using the term
> > in my original question.
>
> You can't let go, can you.

Yes, I'm very eager to let go. You might be aware that I am not Pascal's number one disciple. I just don't want to leap from one definition to not-one. So, I'm trying to nail down your definition since you are providing clues, but not a def I can work with as yet.

On the one hand, it sounds like it is completely independent of the target environment, and then on the other hand you suggest that M-M relationships should be out of the logical model.

> That definition is not
> "common", AFAIK (and it is not clear).

I understand that, so which of the myriad of other ones available comes close from your perspective? I'm just trying to jump from one that I thought was accepted (even if not my original understanding) to another one. Given that I did not grow up on the RM, I cannot rely on my own impressions for these definitions. I would like working definitions or even just descriptions that are workable. Obviously these defs are only helpful if they are target DBMS independent. I don't doubt we can find some workable definitions, but I'm not sure where to find one that works.

> Do you have another reference?

No, but at one point I collected about 50 descriptions or defnitions before I revised my own perspective on the meaning of the term in light of what I read.

> >> When the implementation is in SQL the schema can be
> >> very close to the logical data model so the distinction isn't
> >> important most of the time.
> >
> > That is also how I think of it for other environments, which would make
> > sense if the logical data model is related to the target data model.
>
> No. Let's not get into this loop. Build on signal, not on noise.

Aye, Aye.

> >> In other environments
> >> you can have a physical model, elements of which will have
> >> to be thoroughly associated with elements from the logical
> >> data model - but I would not call this physical model a
> >> data model -
> >
> > I have no interest in the physical model the way I think of that term
> > (other than to have knowledge of it for performace tuning for the final
> > implementation model).
>
> When you plan to implement hierarchically, you simply have to decide
> what data is more important than other data,

>From what I have collected to date regarding best practices for data
modeling when the target is shallow tress on nodes of a di-graph

[I suspect you are using the somewhat inflammatory term "hierarchical" both for its emotional impact and due to the trees on the nodes? It is both inaccurate and maybe even offensive, so I'm hoping that in the future you can avoid that term in this discussion ;-) ]

nothing suggests that there be decisions about what data are more important than others. There might be selections about what portals or entry points users require into the data so those can be beefed up with virtual fields for reporting purposes (for example). It is fair to say that it is a less democratic approach than the RM in that different propositions and different aspects of propositions are modeled differently. Constructing the model is more like constructing the propositions themselves. Some data are entities of interest to users, others are entities more important to "the system," while others are properties of entities (that's at least one way of looking at it).

> and you have to decide in
> which order all data is organized.

In the case of XML that is correct, but not so for all non-SQL-DBMS models. In most the ordering among tuples of data is known only to the DBMS unless sorted for a representation (just as with the RM). In many, the ordering of a tuple in a relation is more like the mathematical definitions of a tuple with the values ordered (a1, a2, ... , an) and names only descriptive of a place in the ordering.

> This is a not yet tool/product
> specific set of decisions you will have to make - say 'implementation
> class' specific considerations.

Yes, and I have been using the term "data model" for that. In other words, you can design for all tools that employ a particular data model, whether Relational, XML, Nelson-Pick, MUMPS, ... Then you tweak for a particular implementation of that model after your more generic design. What is the name of this aspect of design: the model that plays to a particular family of DBMS-products (aka "data model") but not to a specific implementation of that model (aka DBMS).

> >> I'd call it a storage model if I'd have to classify it.
> >
> > Yes, agreed. I let the DBMS developers care about the storage model.
>
> You can't if you are using XML.

I can treat any model as logical. One could model for XML and implement that "schema" in a tool that persists the data using a relational model, right?

> > The logical model is the one I specify to the DBMS.
>
> No. That is the schema.

OK, that works for me. It appears your models are conceptual (plural) to conceptual (single) to logical to schema. Is that accurate?

> > I'll grant that
> > with some tools the physical model more closely resembles the logical
> > model than in others.
>
> Yes. IOW the schema should reflect the logical model,
> but how close it is to the logical model varies greatly.
>
> >> If somebody else would call it a physical data-model,
> >> I would not interrupt, the message is clear. Calling it a
> >> logical model /would/ make me object; it's wrong:
> >> the physical model is not /about/ the logic, it presupposes
> >> the existence of a logical data model.
> >
> > Agreed.
>
> Then please stop to redefine the logical data model as
> implementation specific.

? I'm begging for new definitions that work. I could maybe drum up another one, but I "feel" that it would get the same response. So, I'm happy to try to learn your definition from your descriptions, but I'm having to ask questions to nail it down.

Even if you have to repeat yourself, I would be very appreciative if you could be clear on this question:

Does the logical data model assume any family-of-DBMS-tools as a target (what I have referred to as a "data model"), such as Relational, MUMPS, Nelson-Pick, XML, OR, alternatively, is it independent of the target?

If the latter, which is what I think you are saying, then why are you removing M-M or designing to avoid nulls (see example later).

> >> Another point:
> >> "
> >> >>>> ... the logical model is the most complete, detailed level
> >> >>>> you can get to /without/ specifying the implementation plan.
> >> "
> >> is not a complete definition (and does not try to be).
> >> It is just demarcation of the boundary between logic
> >> and implementation.
> >
> > You can see by the def I posted in the Logical Data Model thread that
> > does not align with everything I have read. So if the LDM is about
> > logic and does not presuppose a target, then what do you call the data
> > model that is specified to the DBMS?
>
> The schema (term used in Codasyl, LDAP, XML, SQL, ...).

Yes, I did catch that above. Thanks, that helps. I think that means that we can take Pascal's def and descriptions of a logical model and replace the term "schema" there and you would be at least roughly in alignment with that.

> > (the one that Pascal refers to as the Logical Model)?
>
> You mean the muddled one you and I both do not understand?

I found a better quotation from that same paper and posted in another thead. It is still muddled, but seemed a bit more clear.

> >> The demarcation on the other side,
> >> conceptual vs. logical, is more complicated and at
> >> the same time it's IMHO less important to have a strict
> >> line there, see below.
> >
> > That is the murky line between capturing requirements (conceptual) and
> > designing solutions (logical).
>
> There is more to that (see above).
>
> [snip more logical model definition talk]
>
> >>> How do you define sharing?
> >> A try:
> >> Sharing data: Use of the same data from more than one point of view.
> >
> > Well, I am working with "shared data" by this definition -- many points
> > of view, but they are all "managed" collectively. There need be no
> > assumption that each software entity that is sharing this database does
> > so in its own silo, nor that you must permit this sharing by code
> > written by developers who do not talk to each other
>
> This limits the team size.

That might be. I have not worked on a team larger than 200 (with a business org structure, of course -- not one big unpartitioned team) all developing for the same DBMS and/or integrating 3rd party tools into it (not all co-located, by the way). But my impression is that there are teams that are much larger (in larger organizations) doing similarly.

> > do shared QA, for example.
> > With Codd's "large shared data bank" I think there is
> > some assumption that we need to be able to permit people who don't know
> > each other to each write code that shares only the database and nothing
> > else.
>
> I'd say 'Codd's approach allows for people who ...etc'.

works for me.

> > That would (typically) not be acceptable for the database
> > products I'm talking about.
>
> Why not?

Do you know the answer and are just testing me? I'll play --

Because the constraints are specified in data (functioning as metadata), rather than being specified in schema. Proprietary libraries are written against these so that there are cross-app components on top of the DBMS, with vertical apps on top of those, instead of vertical silos sitting right on top of the DBMS. Bottom line: Apps share libraries related to puts, gets, and related constraint logic. Some might not like it, but a lot of software has been built and continues to be built with this architecture.

> > However, the data are still shared among
> > multiple apps and 3rd party products.
> >
> > Trying to get to the bottom of this, I'm working with environments
> > where data, code, and developers are all shared with no assumption that
> > only the data can be assumed as shared.
>
> This sounds like sales-pitch.

Unintended, just a personality defect ;-)

> You are working with teams
> where every member is supposed to know everything about
> all code and data, right?

Goodness no, but the control structures, such as configuration management and integration testing, are likely to span all.

> >> [snip]
> >>
> >>> OK, I'll review all the feedback and come back with a revised question
> >>> (once I have proper definitions for the logical data model and know
> >>> precisely for which model we need to know whether persistence will be
> >>> handled with UniData or DB2, for example).
> >> [snip]
> >>
> >>>> Say we have a logical model - now we decide to implement using
> >>>> hierarchical tools /without/ specifying which one (IMS, Lotus Domino,
> >>>> XML, just to name a few alternatives) - now what? Which choices
> >>>> do we have to make?
> >>> Yes, yes, this is very close to my question. Conceptual model is
> >>> independent of any target environment. Then there is a logical data
> >>> model. If that is independent of any target environment as well (I
> >>> still need a def), then we could have a subsequent question (if you are
> >>> as old as me, then you can put it in a diamond shape with the words
> >>> "relational model" and a question mark) of whether we are using a
> >>> product that implements the relational model or not. If yes then we
> >>> would take the logical model and prepare a relational implementation
> >>> model from it, putting data in 1NF, addressing such issues as the SQL
> >>> NULL. If no, then ... (this is where my question is).
>
> >> Dunno about the 1NF/list diamond, but NULLs to me are not only
> >> markers for the absence of (a) value, they are also the sign of the
> >> absence of sufficient effort put into the logical data model.
> >
> > In the case where someone takes nulls to be markers for the absense of
> > a value, rather than as a value (which is my preference), then I agree.
>
> AAAAaaaRghgrrrmpf. NULLs. Side note: Lots of of people here appear to
> like to talk about that. If you do, please take over :-)

I only bring it up as an example of where I see differences in models somewhere after the conceptual model (afraid of using the "logical" term), starting in "high level design" between the project-specific models when the target is an SQL-DBMS than when it is something other. I don't need to get into another bru-ha-ha over that.

> How did, working from real examples, the NULLs get in?
> Where did they come from? When did they get in?

One model related to propositions such as:

Amy is married.
Hal is divorced.
Sylvia is single.
Hope is married.
We do not have enough information to know whether Lily is married or not.

could be a Person entity with attributes that include firstName and maritalStatus.

Another model related to this same set of propositions could be a Person entity that includes the attribute firstName and a PersonMaritalStatus entity that includes the attribute personId (foreign key to Person) and maritalStatus.

> > This is definitely significant in a logical data model. If I know that
> > for some people we have the predicate <Name> has a marital status of
> > <marital status> and for some people we will not have this proposition,
> > with my null value (compared to your lack of value) I might
> > legitimately model with a Person relation that includes name and
> > marital status. With a SQL-DBMS target, I would not do that. So, I
> > think this logical model of yours does need to have some knowledge of
> > the target in order to be useful.
>
> Could you give some real(ish) examples? How important is it to have
> the marital status? Do we investigate it when we have to make a
> decision based on it and it is not known?
>
> [snip]
> >> When everybody knows we'll implement in SQL,
> >> MVA's do tend to get avoided.
> >
> > Yes, that was my impression. I have seen what were termed logical data
> > models for both SQL-DBMS and PIck target environments and they are
> > decidedly different related to multi-valued attributes and nulls (not
> > to mention "code files" and other various different design patterns).
>
> Huh? Design patterns? Why bring them in?
>
> >> This is letting implementation
> >> guide the logic - strictly a no-no.
> >
> > I definitely think that is a no-no in the conceptual data model, but
> > I'm not sure how helpful a logical data model is without some
> > assumption about whether the implementation will be in a product that
> > looks like UniData compared to a product that looks like Oracle, for
> > example.
>
> There is no difference between conceptual and logical modeling as far
> as the need to include implementation details is concerned.
> As a deliverable, logical models cover more detail, and will in
> practice, mistakenly, contain more implementation specifics.

OK, confused again. You indicated that M-M relationships between relations were not in logical data models. There are Books-Authors types of M-M relationships implemented in DBMS's using the Nelson-Pick model. I am really, really trying not to be stubborn on this, just trying to understand.

> I think that in order to investigate your question,
> <rephrase> (correct me if I'm wrong, please)
> If we have a logical model and we decide to implement using
> hierarchical tools /without/ specifying which one
> (IMS, Lotus Domino, XML, LDAP, just to name a few alternatives)
> now what? Which choices do we have to make?
> </rephrase>
> you will first need to get this distinction right.
>
> While on the one hand you say that you are willing
> to adopt the terminology I use on this distinction,
> up to now I only see you trying to blur it.

My questions are asked in order to understand it, not blur it. Sorry so dense and thanks for your patience. I'll get there, but I'm not there yet. There are still seemingly contractions in what you are telling me. I need to understand where in the list of models the designer will need to have an understanding of the target DBMS environment.

attempting to use your terminology

conceptual models --> conceptual model --> HERE --> logical model --> schema

OR

conceptual models -->conceptual model --> logical model --> HERE --> schema

If the first of these, then the marital status logical model would be the one with firstName and maritalStatus as attributes of Person, if the latter, then if deploying in a SQL-DBMS (assuming a no-NULLs policy or best practice), the second would be the model.

Cheers! --dawn Received on Fri Oct 27 2006 - 17:31:12 CEST

Original text of this message