Re: Modeling Data for XML instead of SQL-DBMS
From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Fri, 27 Oct 2006 13:34:17 +0200
Message-ID: <4541ee0d$0$321$e4fe514c_at_news.xs4all.nl>
dawn wrote:
> mAsterdam wrote:
>> dawn wrote:
>>> mAsterdam wrote:
>>>> dawn wrote:
>>>>> mAsterdam wrote:
>>>>>> ... the logical model is the most complete, detailed level
>>>>>> you can get to /without/ specifying the implementation plan.
>>>>>> I don't think I should unlearn that.
>> [snip (un)muddle]
>>
>>> In order to clarify, my question would be whether the logical model is
>>> data model independent. The conceptual data model is data model
>>> independent. The logical data model could be defined as
>>> data-model-dependent or could still be independent of data model
>>> employed by the target DBMS. My understanding was that it was
>>> data-model dependent and pretty much resembled the implementation data
>>> model (which might be adjusted specific to the toolset used, however).
>>>
>>> So, I guess I'm interested in the very first data model that is no
>>> longer agnostic about the target environment. Is that or is it not the
>>> logical data model?
>> I may have learned definitions a long time ago, but I never felt the
>> need to fall back on them. I just don't use these terms in a clean,
>> universally correct, ivory tower way.
>> I do try to use appropriate wordings, taking the
>> purpose and context into account.
>>
>> I could use them in a design session like so: 'Hey guys, that's
>> an implementation issue, we'll deal with that later. For now we have
>> to limit our discussion to the logic of the data itself.', relying
>> on the audience's connotations with 'logic' and 'implementation'.
>>
>> Even your 'very first data model that is no
>> longer agnostic about the target environment'
>> makes me wonder - can there be such a model;
>> data model and non-agnostic about the target environment
>> at the same time? It is not about the data,
>> it is about how to handle the data.
>
> The conceptual data model is the only model that is strictly about the
> data, having nothing to do with how to "handle" the data, right? From
> there we go to "design." I think of the conceptual model as analysis
> and the logical model as design. I don't really think about any other
> models, with iterations of the logical model until it becomes the
> version of the logical model that gets implemented. I used to think of
> the logical model as the one at the start of that process, but after
> reading a bit, I started to think of it as the one at the end of the
> process (the implementation model). It is the last one that includes
> design related to optimizing for the target environment.
>
> The question I have now is whether the logical data model presupposes a
> family of target DBMS's, such as SQL-DBMS's or Pick DBMS's or whether
> the logical data model is data-model independent. Pascal's definition
> makes it very data-model dependent. That is how I was using the term
> in my original question.
>
>> When the implementation is in SQL the schema can be
>> very close to the logical data model so the distinction isn't
>> important most of the time.
>
> That is also how I think of it for other environments, which would make
> sense if the logical data model is related to the target data model.
>
>> In other environments
>> you can have a physical model, elements of which will have
>> to be thoroughly associated with elements from the logical
>> data model - but I would not call this physical model a
>> data model -
>
> I have no interest in the physical model the way I think of that term
> (other than to have knowledge of it for performance tuning for the final
> implementation model).
>
>> I'd call it a storage model if I'd have to classify it.
>
> Yes, agreed. I let the DBMS developers care about the storage model.
> The logical model is the one I specify to the DBMS. I'll grant that
> with some tools the physical model more closely resembles the logical
> model than in others.
>
>> If somebody else would call it a physical data-model,
>> I would not interrupt, the message is clear. Calling it a
>> logical model /would/ make me object; it's wrong:
>> the physical model is not /about/ the logic, it presupposes
>> the existence of a logical data model.
>
> Agreed.
>
>> Another point:
>> "
>> >>>> ... the logical model is the most complete, detailed level
>> >>>> you can get to /without/ specifying the implementation plan.
>> "
>> is not a complete definition (and does not try to be).
>> It is just demarcation of the boundary between logic
>> and implementation.
>
> You can see by the def I posted in the Logical Data Model thread that
> does not align with everything I have read. So if the LDM is about
> logic and does not presuppose a target, then what do you call the data
> model that is specified to the DBMS (the one that Pascal refers to as
> the Logical Model)?
>
>> The demarcation on the other side,
>> conceptual vs. logical, is more complicated and at
>> the same time it's IMHO less important to have a strict
>> line there, see below.
>
> That is the murky line between capturing requirements (conceptual) and
> designing solutions (logical).
>
>> [snip]
>>
>>>>>>> It is common to let "logical
>>>>>>> data model" refer to this implementation data model -- the model of the
>>>>>>> data as specified to the API used for retaining data beyond the
>>>>>>> run-time of a particular software application, for example.
>>>>>> In which circles? Can you provide a reference?
>>>>> I think it comes from the Date/Darwen/Pascal side of the house, but
>>>>> at this point I'm just looking at Pascal's paper to verify that (so
>>>>> Date and Darwen might suggest otherwise).
>>>> Well, the way you are using (or, better /were/ using, you promised :-)
>>> Yes, but I do need a corrected definition so that I am not guessing,
>>> OK?
>> Maybe I am overlooking the obvious, but I can't come up with
>> a (to me) satisfactory definition at this time.
>>
>> By now it is clear that your question is about
>> implementation strategy, not about logical models.
>> Doesn't that stop you from having to guess?
>
> Yes, I can see that your definition pushes the logical design forward,
> in front of any idea about the target dbms. So, you could take the
> same logical data model and move to the next step of designing for an
> implementation in Cache' or XML documents or Sybase or Access.
>
>> Finding a good clean definition may be more
>> work than is called for - unless of course
>> someone else has an acceptable one for your purpose.
>>
>> [snip]
>>
>>>>>>> The requirement to retain data beyond a particular application run-time
>>>>>>> is a requirement that mixes the two, it seems.
>>>>>> I don't think so. I think it is where /sharing/ starts - as soon
>>>>>> as the next run-time incarnation may differ from an earlier one.
>>>>> OK, if that is how you define "sharing" then yes, the data are to be
>>>>> shared.
>>>> How would you define sharing?
>>> From prior discussions, I was understanding that "sharing" as in "large
>>> shared data banks" indicates that the database was to be shared by
>>> multiple points with no assumption that any entity controlled all
>>> entities who are sharing.
>> I could go with that.
>>
>>> Each entity sharing this database would need
>>> to be able to do so without assuming any coordination with others who
>>> are sharing it. How do you define sharing?
>> A try:
>> Sharing data: Use of the same data from more than one point of view.
>
> Well, I am working with "shared data" by this definition -- many points
> of view, but they are all "managed" collectively. There need be no
> assumption that each software entity that is sharing this database does
> so in its own silo, nor that you must permit this sharing by code
> written by developers who do not talk to each other or do shared QA,
> for example. With Codd's "large shared data bank" I think there is
> some assumption that we need to be able to permit people who don't know
> each other to each write code that shares only the database and nothing
> else. That would (typically) not be acceptable for the database
> products I'm talking about. However, the data are still shared among
> multiple apps and 3rd party products.
>
> Trying to get to the bottom of this, I'm working with environments
> where data, code, and developers are all shared with no assumption that
> only the data can be assumed as shared.
>
>> [snip]
>>
>>> OK, I'll review all the feedback and come back with a revised question
>>> (once I have proper definitions for the logical data model and know
>>> precisely for which model we need to know whether persistence will be
>>> handled with UniData or DB2, for example).
>> [snip]
>>
>>>> Say we have a logical model - now we decide to implement using
>>>> hierarchical tools /without/ specifying which one (IMS, Lotus Domino,
>>>> XML, just to name a few alternatives) - now what? Which choices
>>>> do we have to make?
>>> Yes, yes, this is very close to my question. Conceptual model is
>>> independent of any target environment. Then there is a logical data
>>> model. If that is independent of any target environment as well (I
>>> still need a def), then we could have a subsequent question (if you are
>>> as old as me, then you can put it in a diamond shape with the words
>>> "relational model" and a question mark) of whether we are using a
>>> product that implements the relational model or not. If yes then we
>>> would take the logical model and prepare a relational implementation
>>> model from it, putting data in 1NF, addressing such issues as the SQL
>>> NULL. If no, then ... (this is where my question is).
>> Dunno about the 1NF/list diamond, but NULLs to me are not only
>> markers for the absence of (a) value, they are also the sign of the
>> absence of sufficient effort put into the logical data model.
>
> In the case where someone takes nulls to be markers for the absence of
> a value, rather than as a value (which is my preference), then I agree.
>
>
> This is definitely significant in a logical data model. If I know that
> for some people we have the predicate <Name> has a marital status of
> <marital status> and for some people we will not have this proposition,
> with my null value (compared to your lack of value) I might
> legitimately model with a Person relation that includes name and
> marital status. With a SQL-DBMS target, I would not do that. So, I
> think this logical model of yours does need to have some knowledge of
> the target in order to be useful.
>
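A minimal sketch of the marital status point above, assuming Python's built-in sqlite3 module as a stand-in for an SQL-DBMS target. The table and column names (person_a, person_b, marital_status_b) and the sample people are invented for illustration and do not come from the thread. Design A keeps one Person relation and stores a NULL where the proposition is missing; design B gives the proposition "<Name> has a marital status of <marital status>" its own relation, so a missing proposition is simply a missing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Design A (hypothetical): one Person relation, missing status stored as NULL.
cur.execute("CREATE TABLE person_a (name TEXT PRIMARY KEY, marital_status TEXT)")
cur.executemany("INSERT INTO person_a VALUES (?, ?)",
                [("Ann", "married"), ("Bob", None)])

# Design B (hypothetical): the marital status proposition gets its own
# relation; no row means we do not have the proposition for that person.
cur.execute("CREATE TABLE person_b (name TEXT PRIMARY KEY)")
cur.execute("""CREATE TABLE marital_status_b (
                   name TEXT PRIMARY KEY REFERENCES person_b(name),
                   marital_status TEXT NOT NULL)""")
cur.executemany("INSERT INTO person_b VALUES (?)", [("Ann",), ("Bob",)])
cur.execute("INSERT INTO marital_status_b VALUES ('Ann', 'married')")

# With SQL's three-valued logic, NULL <> 'married' is unknown, so Bob is
# silently left out of the "not married" answer in design A.
not_married_a = cur.execute(
    "SELECT name FROM person_a WHERE marital_status <> 'married'").fetchall()

# In design B the people without a recorded status can be asked for directly.
no_status_b = cur.execute(
    """SELECT name FROM person_b
       WHERE name NOT IN (SELECT name FROM marital_status_b)
       ORDER BY name""").fetchall()

print(not_married_a)  # []
print(no_status_b)    # [('Bob',)]
```

Whether design A is acceptable depends on how the target treats the missing value, which is the dependency on the target being argued about here.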
>>> Now, if "logical data model" is defined to assume the relational model,
>> I did not assume that. Some do, but I don't do that.
>> I have really seen complete logical data models which served non-SQL
>> implementations.
>>
>>> which is the way I was using the term (apparently incorrectly), then we
>>> need to move the diamond shape with the question mark in it between the
>>> conceptual and logical models, which is where I started.
>>>
>>>> Is that what your question is about?
>>> Yes!
>>>
>>>> I could imagine useful treatment of this problem
>>>> in the abstract, but I am not aware of such treatment.
>>> Nor am I, so I'm asking around.
>>>
>>>>> I don't think we have cdm, ldm, pdm or implementation data model
>>>>> in our glossary, but I'm not looking at it to verify that. Is there
>>>> I see no need to include them. The most basic misunderstanding
>>>> in the OP (as I see it) is specifically about the distinction
>>>> between logical and implementation-specific, no need to mix
>>>> in even more types of models.
>>> OK, then if you could just define them for me, it would be most
>>> helpful. I'll see if I should revise my understanding (as indicated in
>>> my Naked Model blog entry), based on your definition.
>> I have seen this used for the distinction between
>> conceptual and logical data model:
>> A conceptual model can be from one point of view (application or
>> process), the logical model has to cater for all points of view.
>> In this case I would choose not to use these terms, though.
>> I'd say for instance: (single) process data model
>> versus integrated model.
>>
>>>>> someone who has laid out defs that you like so I can start there in
>>>>> forming the question?
>>>>>>>> How can that possibly help?
>>>>>>> The implementation data model for data that a software component passes
>>>>>>> to an SQL-DBMS is often quite different from the implementation data
>>>>>>> model for that same conceptual data model when software other than a
>>>>>>> SQL-DBMS is used to store and retrieve said data.
>>>>>> That implementation is relevant is no reason for mixing
>>>>>> it with logical model issues. You say you accept the terminology
>>>>>> change, but you seem reluctant to do away with the old one.
>>>>> I tried to change to say "implementation data model" instead of
>>>>> "logical data model."
>>>> I don't think "implementation data model" as a term
>>>> helps (no objection to casual use, of course).
>>> I'm open to whatever works for you. I just need those definitions.
>> If you do - I can't help you there,
>> but I don't think you do. Surely clean definitions
>> could be helpful, but there are more ways to do scope cutting
>> and disambiguation.
>>
>> [snip submarine]
>>
>>>> Just work with /real/ examples (user-validated sentences) instead of
>>>> abstract things, and you will immediately reap some benefits without
>>>> the need to deeply study ORM.
>>> Yes, and I do like working with user-validated sentences.
>>>
>>>> Now if you bump against specific
>>>> modeling difficulties using that approach search for that
>>>> problem on the ORM sites - or even ask here; I think Hugo is
>>>> still lurking here :-)
>>> Yes, I prepared a small ORM diagram to test it out and it seemed too
>>> complex for sharing with a user compared to an ERD (or simple UML for
>>> that matter). Users like to see properties grouped with entities just
>>> as I do. OK, OK, propositions -- I mean that users like to see many of
>>> the nouns from some of the sentence collected together.
>> Yes, please don't let the overload of graphical stuff push away
>> the central issue: use /real/ facts.
>>
>> [snip bang for the buck]
>>>> In SQL, order is supposed not to carry meaning by itself.
>>> Yes, unlike sentences.
>> Indeed.
>
> ;-)
>
>>>> If some order has a meaning, it has to be made explicit, e.g. by
>>>> using a rank attribute. A presented set can have a differently
>>>> ordered second presentation, without having a different set.
>>>> In documents, if the order changes, you have another document.
>>> Agreed.
>> [snip]
>>
>>> Only because I need to rephrase the question and am apparently using
>>> the term Logical Data Model incorrectly, yet I'm not certain whether if
>>> you and I are both given the same conceptual data model and you are
>>> implementing it in Oracle and I in UniData, whether we might have the
>>> same logical data model, although different implementation data models,
>>> or whether our logical data models would differ. Mine would include
>>> multi-valued attributes, for example. Thanks for any clarification.
>> Mine do, too - in theory.
>
> Good. Dang, the Tigers just lost, time to retire for the evening.
>
>> When everybody knows we'll implement in SQL,
>> MVA's do tend to get avoided.
>
> Yes, that was my impression. I have seen what were termed logical data
> models for both SQL-DBMS and Pick target environments and they are
> decidedly different related to multi-valued attributes and nulls (not
> to mention "code files" and other various different design patterns).
>
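A rough sketch of how the two styles of model can differ over a multi-valued attribute, written in plain Python instead of any particular DBMS. The record layout, the phone numbers and the helper names to_1nf and from_1nf are all hypothetical. The Pick-style record keeps an ordered list inside one attribute; the 1NF version is a set of rows, so the order has to be made explicit with a rank attribute, as suggested earlier in the thread.

```python
# Hypothetical Pick-style record: phone numbers form an ordered,
# multi-valued attribute of a single person record.
record = {"name": "Ann", "phones": ["555-0100", "555-0199"]}


def to_1nf(rec):
    """Flatten the multi-valued attribute into a set of (name, rank, phone) rows.

    The result is a set, so row order carries no meaning; the original list
    position survives only as the explicit rank value.
    """
    return {(rec["name"], rank, phone)
            for rank, phone in enumerate(rec["phones"], start=1)}


def from_1nf(rows, name):
    """Rebuild the ordered list for one person by sorting on rank."""
    return [phone for (n, rank, phone) in sorted(rows) if n == name]


rows = to_1nf(record)
print(rows == {("Ann", 1, "555-0100"), ("Ann", 2, "555-0199")})  # True
print(from_1nf(rows, "Ann"))  # ['555-0100', '555-0199']
```

Both forms carry the same facts; the open question in the thread is at which modeling stage the choice between them gets made.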
>> This is letting implementation
>> guide the logic - strictly a no-no.
>
> I definitely think that is a no-no in the conceptual data model, but
> I'm not sure how helpful a logical data model is without some
> assumption about whether the implementation will be in a product that
> looks like UniData compared to a product that looks like Oracle, for
> example.
>
> Thanks for your comments. --dawn
>
--
"The person who says it cannot be done should not interrupt the person doing it."
Chinese Proverb.
Received on Fri Oct 27 2006 - 13:34:17 CEST
