Re: Modeling Data for XML instead of SQL-DBMS

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Fri, 27 Oct 2006 13:34:17 +0200
Message-ID: <4541ee0d$0$321$e4fe514c_at_news.xs4all.nl>


dawn wrote:
> mAsterdam wrote:

>> dawn wrote:
>>> mAsterdam wrote:
>>>> dawn wrote:
>>>>> mAsterdam wrote:
>>>>>> ... the logical model is the most complete, detailed level
>>>>>> you can get to /without/ specifying the implementation plan.
>>>>>> I don't think I should unlearn that.
>> [snip (un)muddle]
>>
>>> In order to clarify, my question would be whether the logical model is
>>> data model independent.  The conceptual data model is data model
>>> independent.  The logical data model could be defined as
>>> data-model-dependent or could still be independent of data model
>>> employed by the target DBMS.  My understanding was that it was
>>> data-model dependent and pretty much resembled the implementation data
>>> model (which might be adjusted specific to the toolset used, however).
>>>
>>> So, I guess I'm interested in the very first data model that is no
>>> longer agnostic about the target environment.  Is that or is it not the
>>> logical data model?
>> I may have learned definitions a long time ago, but I never felt the
>> need to fall back on them. I just don't use these terms in a clean,
>> universally correct, ivory tower way.
>> I do try to use appropriate wordings, taking the
>> purpose and context into account.
>>
>> I could use them in a design session like so: 'Hey guys, that's
>> an implementation issue, we'll deal with that later. For now we have
>> to limit our discussion to the logic of the data itself.', relying
>> on the audience's connotations with 'logic' and 'implementation'.
>>
>> Even your 'very first data model that is no
>> longer agnostic about the target environment'
>> makes me wonder - can there be such a model;
>> data model and non-agnostic about the target environment
>> at the same time? It is not about the data,
>> it is about how to handle the data.

>
> The conceptual data model is the only model that is strictly about the
> data, having nothing to do with how to "handle" the data, right? From
> there we go to "design." I think of the conceptual model as analysis
> and the logical model as design. I don't really think about any other
> models, with iterations of the logical model until it becomes the
> version of the logical model that gets implemented. I used to think of
> the logical model as the one at the start of that process, but after
> reading a bit, I started to think of it as the one at the end of the
> process (the implementation model). It is the last one that includes
> design related to optimizing for the target environment.
>
> The question I have now is whether the logical data model presupposes a
> family of target DBMS's, such as SQL-DBMS's or Pick DBMS's or whether
> the logical data model is data-model independent. Pascal's definition
> makes it very data-model dependent. That is how I was using the term
> in my original question.
>
>> When the implementation is in SQL the schema can be
>> very close to the logical data model so the distinction isn't
>> important most of the time.

>
> That is also how I think of it for other environments, which would make
> sense if the logical data model is related to the target data model.
>
>> In other environments
>> you can have a physical model, elements of which will have
>> to be thoroughly associated with elements from the logical
>> data model - but I would not call this physical model a
>> data model -

>
> I have no interest in the physical model the way I think of that term
> (other than to have knowledge of it for performance tuning for the final
> implementation model).
>
>> I'd call it a storage model if I'd have to classify it.

>
> Yes, agreed. I let the DBMS developers care about the storage model.
> The logical model is the one I specify to the DBMS. I'll grant that
> with some tools the physical model more closely resembles the logical
> model than in others.
>
>> If somebody else would call it a physical data-model,
>> I would not interrupt, the message is clear. Calling it a
>> logical model /would/ make me object; it's wrong:
>> the physical model is not /about/ the logic, it presupposes
>> the existence of a logical data model.

>
> Agreed.
>
>> Another point:
>> "
>>  >>>> ... the logical model is the most complete, detailed level
>>  >>>> you can get to /without/ specifying the implementation plan.
>> "
>> is not a complete definition (and does not try to be).
>> It is just demarcation of the boundary between logic
>> and implementation.

>
> You can see by the def I posted in the Logical Data Model thread that
> this does not align with everything I have read. So if the LDM is about
> logic and does not presuppose a target, then what do you call the data
> model that is specified to the DBMS (the one that Pascal refers to as
> the Logical Model)?
>
>> The demarcation on the other side,
>> conceptual vs. logical, is more complicated and at
>> the same time it's IMHO less important to have a strict
>> line there, see below.

>
> That is the murky line between capturing requirements (conceptual) and
> designing solutions (logical).
>
>> [snip]
>>
>>>>>>> It is common to let "logical
>>>>>>> data model" refer to this implementation data model -- the model of the
>>>>>>> data as specified to the API used for retaining data beyond the
>>>>>>> run-time of a particular software application, for example.
>>>>>> In which circles? Can you provide a reference?
>>>>> I think it comes from the Date/Darwen/Pascal side of the house, but
>>>>> at this point I'm just looking at Pascal's paper to verify that (so
>>>>> Date and Darwen might suggest otherwise).
>>>> Well, the way you are using (or, better /were/ using, you promised :-)
>>> Yes, but I do need a corrected definition so that I am not guessing,
>>> OK?
>> Maybe I am overlooking the obvious, but I can't come up with
>> a (to me) satisfactory definition at this time.
>>
>> By now it is clear that your question is about
>> implementation strategy, not about logical models.
>> Doesn't that stop you from having to guess?

>
> Yes, I can see that your definition pushes the logical design forward,
> in front of any idea about the target dbms. So, you could take the
> same logical data model and move to the next step of designing for an
> implementation in Caché or XML documents or Sybase or Access.
>
>> Finding a good clean definition may be more
>> work than is called for - unless of course
>> someone else has an acceptable one for your purpose.
>>
>> [snip]
>>
>>>>>>> The requirement to retain data beyond a particular application run-time
>>>>>>> is a requirement that mixes the two, it seems.
>>>>>> I don't think so. I think it is where /sharing/ starts - as soon
>>>>>> as the next run-time incarnation may differ from an earlier one.
>>>>> OK, if that is how you define "sharing" then yes, the data are to be
>>>>> shared.
>>>> How would you define sharing?
>>> From prior discussions, I was understanding that "sharing" as in "large
>>> shared data banks" indicates that the database was to be shared by
>>> multiple points with no assumption that any entity controlled all
>>> entities who are sharing.
>> I could go with that.
>>
>>> Each entity sharing this database would need
>>> to be able to do so without assuming any coordination with others who
>>> are sharing it.  How do you define sharing?
>> A try:
>> Sharing data: Use of the same data from more than one point of view.

>
> Well, I am working with "shared data" by this definition -- many points
> of view, but they are all "managed" collectively. There need be no
> assumption that each software entity that is sharing this database does
> so in its own silo, nor that you must permit this sharing by code
> written by developers who do not talk to each other or do shared QA,
> for example. With Codd's "large shared data bank" I think there is
> some assumption that we need to be able to permit people who don't know
> each other to each write code that shares only the database and nothing
> else. That would (typically) not be acceptable for the database
> products I'm talking about. However, the data are still shared among
> multiple apps and 3rd party products.
>
> Trying to get to the bottom of this, I'm working with environments
> where data, code, and developers are all shared with no assumption that
> only the data can be assumed as shared.
>
>> [snip]
>>
>>> OK, I'll review all the feedback and come back with a revised question
>>> (once I have proper definitions for the logical data model and know
>>> precisely for which model we need to know whether persistence will be
>>> handled with UniData or DB2, for example).
>> [snip]
>>
>>>> Say we have a logical model - now we decide to implement using
>>>> hierarchical tools /without/ specifying which one (IMS, Lotus Domino,
>>>> XML, just to name a few alternatives) - now what? Which choices
>>>> do we have to make?
>>> Yes, yes, this is very close to my question.  Conceptual model is
>>> independent of any target environment.  Then there is a logical data
>>> model.  If that is independent of any target environment as well (I
>>> still need a def), then we could have a subsequent question (if you are
>>> as old as me, then you can put it in a diamond shape with the words
>>> "relational model" and a question mark) of whether we are using a
>>> product that implements the relational model or not.  If yes then we
>>> would take the logical model and prepare a relational implementation
>>> model from it, putting data in 1NF, addressing such issues as the SQL
>>> NULL.  If no, then ... (this is where my question is).
>> Dunno about the 1NF/list diamond, but NULLs to me are not only
>> markers for the absence of (a) value, they are also the sign of the
>> absence of sufficient effort put into the logical data model.

>
> In the case where someone takes nulls to be markers for the absence of
> a value, rather than as a value (which is my preference), then I agree.
>
>
> This is definitely significant in a logical data model. If I know that
> for some people we have the predicate <Name> has a marital status of
> <marital status> and for some people we will not have this proposition,
> with my null value (compared to your lack of value) I might
> legitimately model with a Person relation that includes name and
> marital status. With a SQL-DBMS target, I would not do that. So, I
> think this logical model of yours does need to have some knowledge of
> the target in order to be useful.
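
A minimal sketch of the two designs contrasted in the paragraph above, written in Python rather than in any DBMS. The names (`Person`, `people_single`, `decompose`) and the sample data are hypothetical and only illustrate the idea: design 1 keeps marital status in the Person relation and uses `None` as a null value, while design 2 drops the attribute into its own relation so that a missing proposition is simply a missing row.

```python
from typing import NamedTuple, Optional


class Person(NamedTuple):
    name: str
    marital_status: Optional[str]  # None stands in for the "null value"


# Design 1: one relation; the unknown marital status is stored as None.
people_single = [Person("Ann", "married"), Person("Ben", None)]

# Design 2: two relations; no row is stored when the proposition is not held.
people = {"Ann", "Ben"}
marital_status = {"Ann": "married"}


def decompose(rows):
    """Turn design 1 into design 2 by dropping the None entries."""
    names = {p.name for p in rows}
    statuses = {
        p.name: p.marital_status
        for p in rows
        if p.marital_status is not None
    }
    return names, statuses
```

Both designs record the same facts here; which one a modeler picks is exactly the target-dependent choice being argued about.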
>
>>> Now, if "logical data model" is defined to assume the relational model,
>> I did not assume that. Some do, but I don't do that.
>> I have really seen complete logical data models which served non-SQL
>> implementations.
>>
>>> which is the way I was using the term (apparently incorrectly), then we
>>> need to move the diamond shape with the question mark in it between the
>>> conceptual and logical models, which is where I started.
>>>
>>>> Is that what your question is about?
>>> Yes!
>>>
>>>> I could imagine useful treatment of this problem
>>>> in the abstract, but I am not aware of such treatment.
>>> Nor am I, so I'm asking around.
>>>
>>>>> I don't think we have cdm, ldm, pdm or implementation data model
>>>>> in our glossary, but I'm not looking at it to verify that.  Is there
>>>> I see no need to include them. The most basic misunderstanding
>>>> in the OP (as I see it) is specifically about the distinction
>>>> between logical and implementation-specific, no need to mix
>>>> in even more types of models.
>>> OK, then if you could just define them for me, it would be most
>>> helpful.  I'll see if I should revise my understanding (as indicated in
>>> my Naked Model blog entry), based on your definition.
>> I have seen this used for the distinction between
>> conceptual and logical data model:
>> A conceptual model can be from one point of view (application or
>> process), the logical model has to cater for all points of view.
>> In this case I would choose not to use these terms, though.
>> I'd say for instance: (single) process data model
>> versus integrated model.
>>
>>>>> someone who has laid out defs that you like so I can start there in
>>>>> forming the question?
>>>>>>>> How can that possibly help?
>>>>>>> The implementation data model for data that a software component passes
>>>>>>> to an SQL-DBMS is often quite different from the implementation data
>>>>>>> model for that same conceptual data model when software other than a
>>>>>>> SQL-DBMS is used to store and retrieve said data.
>>>>>> That implementation is relevant is no reason for mixing
>>>>>> it with logical model issues. You say you accept the terminology
>>>>>> change, but you seem reluctant to do away with the old one.
>>>>> I tried to change to say "implementation data model" instead of
>>>>> "logical data model."
>>>> I don't think "implementation data model" as a term
>>>> helps (no objection to casual use, of course).
>>> I'm open to whatever works for you.  I just need those definitions.
>> If you do - I can't help you there,
>> but I don't think you do. Surely clean definitions
>> could be helpful, but there are more ways to do scope cutting
>> and disambiguation.
>>
>> [snip submarine]
>>
>>>> Just work with /real/ examples (user-validated sentences) instead of
>>>> abstract things, and you will immediately reap some benefits without
>>>> the need to deeply study ORM.
>>> Yes, and I do like working with user-validated sentences.
>>>
>>>> Now if you bump against specific
>>>> modeling difficulties using that approach search for that
>>>> problem on the ORM sites - or even ask here; I think Hugo is
>>>> still lurking here :-)
>>> Yes, I prepared a small ORM diagram to test it out and it seemed too
>>> complex for sharing with a user compared to an ERD (or simple UML for
>>> that matter).  Users like to see properties grouped with entities just
>>> as I do.  OK, OK, propositions -- I mean that users like to see many of
>>> the nouns from some of the sentences collected together.
>> Yes, please don't let the overload of graphical stuff push away
>> the central issue: use /real/ facts.
>>
>> [snip bang for the buck]
>>>> In SQL, order is supposed not to carry meaning by itself.
>>> Yes, unlike sentences.
>> Indeed.

>
> ;-)
>
>>>> If some order has a meaning, it has to be made explicit, e.g. by
>>>> using a rank attribute. A presented set can have a differently
>>>> ordered second presentation, without having a different set.
>>>> In documents, if the order changes, you have another document.
>>> Agreed.
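
The point about order can be sketched in a few lines of Python, under the assumption that a presented set is modeled as a list of tuples and a document as an ordered sequence. The data and the helper `by_rank` are invented for illustration.

```python
# Two presentations of the same (name, rank) facts, in different row order.
rows_first = [("alice", 2), ("bob", 1), ("carol", 3)]
rows_second = [("carol", 3), ("alice", 2), ("bob", 1)]

# Compared as sets, the two presentations are the same set.
same_set = set(rows_first) == set(rows_second)

# Compared as ordered sequences (documents), they are different.
same_document = rows_first == rows_second


def by_rank(rows):
    """Recover the meaningful order from the explicit rank attribute."""
    return [name for name, rank in sorted(rows, key=lambda row: row[1])]


ranked_first = by_rank(rows_first)
ranked_second = by_rank(rows_second)
```

Because the order lives in the rank attribute rather than in row position, both presentations yield the same ranking.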
>> [snip]
>>
>>> Only because I need to rephrase the question and am apparently using
>>> the term Logical Data Model incorrectly, yet I'm not certain whether if
>>> you and I are both given the same conceptual data model and you are
>>> implementing it in Oracle and I in UniData, whether we might have the
>>> same logical data model, although different implementation data models,
>>> or whether our logical data models would differ.  Mine would include
>>> multi-valued attributes, for example.  Thanks for any clarification.
>> Mine do, too - in theory.

>
> Good. Dang, the Tigers just lost, time to retire for the evening.
>
>> When everybody knows we'll implement in SQL,
>> MVA's do tend to get avoided.

>
> Yes, that was my impression. I have seen what were termed logical data
> models for both SQL-DBMS and Pick target environments and they are
> decidedly different related to multi-valued attributes and nulls (not
> to mention "code files" and other various different design patterns).
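
As a rough illustration of how the two kinds of model can differ on multi-valued attributes, the Python sketch below holds the same facts once as nested records (the shape a Pick-style model might take) and once as two flat relations in 1NF (the shape usually chosen for an SQL target). The attribute `phones`, the sample values, and the helper `to_1nf` are assumptions made for the example, not taken from any product.

```python
# Nested form: "phones" is a multi-valued attribute of each person.
nested = [
    {"name": "Ann", "phones": ["555-0100", "555-0101"]},
    {"name": "Ben", "phones": []},
]


def to_1nf(records):
    """Flatten the nested records into two relations with atomic values."""
    persons = [(r["name"],) for r in records]
    phones = [(r["name"], phone) for r in records for phone in r["phones"]]
    return persons, phones


persons, phones = to_1nf(nested)
```

Ben, who has no phone, appears in `persons` but contributes no row to `phones`, so the flat form needs no null to express the empty list.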
>
>> This is letting implementation
>> guide the logic - strictly a no-no.

>
> I definitely think that is a no-no in the conceptual data model, but
> I'm not sure how helpful a logical data model is without some
> assumption about whether the implementation will be in a product that
> looks like UniData compared to a product that looks like Oracle, for
> example.
>
> Thanks for your comments. --dawn
>
-- 
"The person who says it cannot be done
should not interrupt the person doing it."
Chinese Proverb.
Received on Fri Oct 27 2006 - 13:34:17 CEST
