Re: Modeling Data for XML instead of SQL-DBMS

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Fri, 27 Oct 2006 15:39:31 +0200
Message-ID: <45420b5a$0$320$e4fe514c_at_news.xs4all.nl>


dawn wrote:
> mAsterdam wrote:

>> dawn wrote:
>>> mAsterdam wrote:
>>>> dawn wrote:
>>>>> mAsterdam wrote:
>>>>>> ... the logical model is the most complete, detailed level
>>>>>> you can get to /without/ specifying the implementation plan.
>>>>>> I don't think I should unlearn that.
>> [snip (un)muddle]
>>
>>> In order to clarify, my question would be whether the logical model is
>>> data model independent.  The conceptual data model is data model
>>> independent.  The logical data model could be defined as
>>> data-model-dependent or could still be independent of data model
>>> employed by the target DBMS.  My understanding was that it was
>>> data-model dependent and pretty much resembled the implementation data
>>> model (which might be adjusted specific to the toolset used, however).
>>>
>>> So, I guess I'm interested in the very first data model that is no
>>> longer agnostic about the target environment.  Is that or is it not the
>>> logical data model?

>> I may have learned definitions a long time ago, but I never felt the
>> need to fall back on them. I just don't use these terms in a clean,
>> universally correct, ivory tower way.
>> I do try to use appropriate wordings, taking the
>> purpose and context into account.
>>
>> I could use them in a design session like so: 'Hey guys, that's
>> an implementation issue, we'll deal with that later. For now we have
>> to limit our discussion to the logic of the data itself.', relying
>> on the audience's connotations with 'logic' and 'implementation'.
>>
>> Even your 'very first data model that is no
>> longer agnostic about the target environment'
>> makes me wonder - can there be such a model;
>> data model and non-agnostic about the target environment
>> at the same time? It is not about the data,
>> it is about how to handle the data.

>
> The conceptual data model is the only model that is strictly about the
> data, having nothing to do with how to "handle" the data, right? From
> there we go to "design."

If you are really dealing with multiple points of view, you are forgetting the necessary, non-trivial integration step. To get from multiple conceptual models to one logical model, elements have to be renamed and definitions have to be re-evaluated. A technique I have seen used productively for this is James Martin's 'bubble charting'. BTW JM calls the result the canonical data model, but I have not seen this term in use.

In the logical model there is strictly no more need for implementation specifics than in the conceptual model. However, the team is closer to cut-over day, so implementation is more on their mind. So in real logical model documents you will see some implementation stuff, but that is noise, not signal.

> there we go to "design." I think of the conceptual model as analysis
> and the logical model as design. I don't really think about any other
> models, with iterations of the logical model until it becomes the
> version of the logical model that gets implemented. I used to think of
> the logical model as the one at the start of that process, but after
> reading a bit, I started to think of it as the one at the end of the
> process (the implementation model). It is the last one that includes
> design related to optimizing for the target environment.

If these different models are defined, they are defined as deliverables in some 'formal' (handbook/official/case-tool) software development process. There is no generally accepted reference process, AFAIK, so I have to rely on those I have dealt with. In one official process the requirements for these deliverables (or artifacts if you prefer) were such that the conceptual models allowed many-to-many relationships, while in the (integrated) logical model they were forbidden.
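
To illustrate that last requirement, here is a rough sketch of what resolving a many-to-many relationship into an intersection entity amounts to. The student/course data and all names (Enrolment, resolve_many_to_many) are made up for the example; they do not come from the process handbook I mentioned.

```python
from dataclasses import dataclass

# Hypothetical conceptual-level fact: a student can take many courses and
# a course can have many students (a many-to-many relationship).
conceptual_enrolments = {
    "alice": {"databases", "logic"},
    "bob": {"databases"},
}


@dataclass(frozen=True)
class Enrolment:
    """Intersection entity: one instance per (student, course) pair."""
    student: str
    course: str


def resolve_many_to_many(m2n):
    """Replace the M:N relationship by an entity with two 1:N relationships."""
    return sorted(
        (Enrolment(student, course)
         for student, courses in m2n.items()
         for course in courses),
        key=lambda e: (e.student, e.course),
    )


logical_enrolments = resolve_many_to_many(conceptual_enrolments)
for enrolment in logical_enrolments:
    print(enrolment)
```

Note that nothing in this says SQL, Pick or XML. It is a rule about the deliverable, not about the target.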

> The question I have now is whether the logical data model presupposes a
> family of target DBMS's, such as SQL-DBMS's or Pick DBMS's or whether
> the logical data model is data-model independent. Pascal's definition
> makes it very data-model dependent. That is how I was using the term
> in my original question.

You can't let go, can you? That definition is not "common", AFAIK (and it is not clear).
Do you have another reference?

>> When the implementation is in SQL the schema can be
>> very close to the logical data model so the distinction isn't
>> important most of the time.

>
> That is also how I think of it for other environments, which would make
> sense if the logical data model is related to the target data model.

No. Let's not get into this loop. Build on signal, not on noise.

>> In other environments
>> you can have a physical model, elements of which will have
>> to be thoroughly associated with elements from the logical
>> data model - but I would not call this physical model a
>> data model -

>
> I have no interest in the physical model the way I think of that term
> (other than to have knowledge of it for performance tuning for the final
> implementation model).

When you plan to implement hierarchically, you simply have to decide what data is more important than other data, and you have to decide in which order all data is organized. This is a not yet tool/product specific set of decisions you will have to make - say 'implementation class' specific considerations.
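
A small sketch of what I mean, using XML only as one possible hierarchical target. The customer/order/product facts and the helper build_hierarchy are invented for the illustration. The same logical facts end up in two different trees, depending on which data you decide is the more important and in which order you arrange it.

```python
import xml.etree.ElementTree as ET

# Hypothetical logical facts: (customer, order, product). No hierarchy implied.
facts = [
    ("acme", "o1", "bolt"),
    ("acme", "o2", "nut"),
    ("zenith", "o3", "bolt"),
]


def build_hierarchy(facts, parent_index, child_index, parent_tag, child_tag):
    """Nest child values under parent values, both in sorted order."""
    grouped = {}
    for fact in facts:
        grouped.setdefault(fact[parent_index], set()).add(fact[child_index])
    root = ET.Element("data")
    for parent in sorted(grouped):
        parent_element = ET.SubElement(root, parent_tag, name=parent)
        for child in sorted(grouped[parent]):
            ET.SubElement(parent_element, child_tag, name=child)
    return root


# Decision 1: customers are the more important data, products hang below them.
by_customer = build_hierarchy(facts, 0, 2, "customer", "product")
# Decision 2: products are the more important data, customers hang below them.
by_product = build_hierarchy(facts, 2, 0, "product", "customer")

print(ET.tostring(by_customer, encoding="unicode"))
print(ET.tostring(by_product, encoding="unicode"))
```

Neither tree is the logical model. Both are implementations of it, and the choice between them has to be made before you pick a product.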

>> I'd call it a storage model if I'd have to classify it.

>
> Yes, agreed. I let the DBMS developers care about the storage model.

You can't if you are using XML.

> The logical model is the one I specify to the DBMS.

No. That is the schema.

> I'll grant that
> with some tools the physical model more closely resembles the logical
> model than in others.

Yes. IOW the schema should reflect the logical model, but how close it is to the logical model varies greatly.

>> If somebody else would call it a physical data-model,
>> I would not interrupt, the message is clear. Calling it a
>> logical model /would/ make me object; it's wrong:
>> the physical model is not /about/ the logic, it presupposes
>> the existence of a logical data model.

>
> Agreed.

Then please stop redefining the logical data model as implementation specific.

>> Another point:
>> "
>>  >>>> ... the logical model is the most complete, detailed level
>>  >>>> you can get to /without/ specifying the implementation plan.
>> "
>> is not a complete definition (and does not try to be).
>> It is just demarcation of the boundary between logic
>> and implementation.

>
> You can see by the def I posted in the Logical Data Model thread that
> does not align with everything I have read. So if the LDM is about
> logic and does not presuppose a target, then what do you call the data
> model that is specified to the DBMS?

The schema (term used in Codasyl, LDAP, XML, SQL, ...).

> (the one that Pascal refers to as the Logical Model)?

You mean the muddled one you and I both do not understand?

>> The demarcation on the other side,
>> conceptual vs. logical, is more complicated and at
>> the same time it's IMHO less important to have a strict
>> line there, see below.

>
> That is the murky line between capturing requirements (conceptual) and
> designing solutions (logical).

There is more to that (see above).

[snip more logical model definition talk]

>>> How do you define sharing?
>> A try:
>> Sharing data: Use of the same data from more than one point of view.

>
> Well, I am working with "shared data" by this definition -- many points
> of view, but they are all "managed" collectively. There need be no
> assumption that each software entity that is sharing this database does
> so in its own silo, nor that you must permit this sharing by code
> written by developers who do not talk to each other

This limits the team size.

> do shared QA, for example.
> With Codd's "large shared data bank" I think there is
> some assumption that we need to be able to permit people who don't know
> each other to each write code that shares only the database and nothing
> else.

I'd say 'Codd's approach allows for people who ...etc'.

> That would (typically) not be acceptable for the database
> products I'm talking about.

Why not?

> However, the data are still shared among
> multiple apps and 3rd party products.
>
> Trying to get to the bottom of this, I'm working with environments
> where data, code, and developers are all shared with no assumption that
> only the data can be assumed as shared.

This sounds like a sales pitch. You are working with teams where every member is supposed to know everything about all code and data, right?

>> [snip]
>>
>>> OK, I'll review all the feedback and come back with a revised question
>>> (once I have proper definitions for the logical data model and know
>>> precisely for which model we need to know whether persistence will be
>>> handled with UniData or DB2, for example).
>> [snip]
>>
>>>> Say we have a logical model - now we decide to implement using
>>>> hierarchical tools /without/ specifying which one (IMS, Lotus Domino,
>>>> XML, just to name a few alternatives) - now what? Which choices
>>>> do we have to make?
>>> Yes, yes, this is very close to my question.  Conceptual model is
>>> independent of any target environment.  Then there is a logical data
>>> model.  If that is independent of any target environment as well (I
>>> still need a def), then we could have a subsequent question (if you are
>>> as old as me, then you can put it in a diamond shape with the words
>>> "relational model" and a question mark) of whether we are using a
>>> product that implements the relational model or not.  If yes then we
>>> would take the logical model and prepare a relational implementation
>>> model from it, putting data in 1NF, addressing such issues as the SQL
>>> NULL.  If no, then ... (this is where my question is).

>> Dunno about the 1NF/list diamond, but NULLs to me are not only
>> markers for the absence of (a) value, they are also the sign of the
>> absence of sufficient effort put into the logical data model.

>
> In the case where someone takes nulls to be markers for the absence of
> a value, rather than as a value (which is my preference), then I agree.

AAAAaaaRghgrrrmpf. NULLs. Side note: Lots of people here appear to like to talk about that. If you do, please take over :-)

How did, working from real examples, the NULLs get in? Where did they come from? When did they get in?

> This is definitely significant in a logical data model. If I know that
> for some people we have the predicate <Name> has a marital status of
> <marital status> and for some people we will not have this proposition,
> with my null value (compared to your lack of value) I might
> legitimately model with a Person relation that includes name and
> marital status. With a SQL-DBMS target, I would not do that. So, I
> think this logical model of yours does need to have some knowledge of
> the target in order to be useful.

Could you give some real(ish) examples? How important is it to have the marital status? Do we investigate it when we have to make a decision based on it and it is not known?
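
To have something concrete to argue about, here is a sketch of the two ways of recording the marital status being contrasted, tried out in SQLite. The table names, the people and the query are my own invention, not a real case. Option A is one Person relation with a NULL where the status is not known; option B has one relation per predicate, so a missing proposition is simply a missing row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Option A: one relation, unknown status recorded as NULL.
    CREATE TABLE person_a (name TEXT PRIMARY KEY, marital_status TEXT);
    INSERT INTO person_a VALUES ('ann', 'married'), ('bob', NULL);

    -- Option B: one relation per predicate; no row means no proposition.
    CREATE TABLE person_b (name TEXT PRIMARY KEY);
    CREATE TABLE marital_status_b (
        name TEXT PRIMARY KEY REFERENCES person_b(name),
        marital_status TEXT NOT NULL
    );
    INSERT INTO person_b VALUES ('ann'), ('bob');
    INSERT INTO marital_status_b VALUES ('ann', 'married');
""")

# Question asked of both designs: who is not known to be married?
query_a = "SELECT name FROM person_a WHERE marital_status <> 'married'"
query_b = """
    SELECT name FROM person_b
    WHERE name NOT IN (SELECT name FROM marital_status_b
                       WHERE marital_status = 'married')
"""
not_married_a = [row[0] for row in con.execute(query_a)]
not_married_b = [row[0] for row in con.execute(query_b)]
print("option A:", not_married_a)
print("option B:", not_married_b)
```

With option A the comparison against NULL is neither true nor false, so bob drops out of the answer; with option B he is in it. Which of the two answers the business wants is exactly the question I am asking.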

[snip]

>> When everybody knows we'll implement in SQL,
>> MVA's do tend to get avoided.

>
> Yes, that was my impression. I have seen what were termed logical data
> models for both SQL-DBMS and Pick target environments and they are
> decidedly different related to multi-valued attributes and nulls (not
> to mention "code files" and other various different design patterns).

Huh? Design patterns? Why bring them in?

>> This is letting implementation
>> guide the logic - strictly a no-no.

>
> I definitely think that is a no-no in the conceptual data model, but
> I'm not sure how helpful a logical data model is without some
> assumption about whether the implementation will be in a product that
> looks like UniData compared to a product that looks like Oracle, for
> example.

There is no difference between conceptual and logical modeling as far as the need to include implementation details is concerned. As a deliverable, logical models cover more detail, and will in practice, mistakenly, contain more implementation specifics.

I think that in order to investigate your question, <rephrase> (correct me if I'm wrong, please) If we have a logical model and we decide to implement using hierarchical tools /without/ specifying which one (IMS, Lotus Domino, XML, LDAP, just to name a few alternatives) now what? Which choices do we have to make? </rephrase>
you will first need to get this distinction right.

While on the one hand you say that you are willing to adopt the terminology I use on this distinction, up to now I only see you trying to blur it.
