Re: Modeling Data for XML instead of SQL-DBMS

From: dawn <dawnwolthuis_at_gmail.com>
Date: 26 Oct 2006 21:17:36 -0700
Message-ID: <1161922656.586493.210100_at_h48g2000cwc.googlegroups.com>


mAsterdam wrote:
> dawn wrote:
> > mAsterdam wrote:
> >> dawn wrote:
> >>> mAsterdam wrote:
> >>>> ... the logical model is the most complete, detailed level
> >>>> you can get to /without/ specifying the implementation plan.
> >>>> I don't think I should unlearn that.
>
> [snip (un)muddle]
>
> > In order to clarify, my question would be whether the logical model is
> > data model independent. The conceptual data model is data model
> > independent. The logical data model could be defined as
> > data-model-dependent or could still be independent of data model
> > employed by the target DBMS. My understanding was that it was
> > data-model dependent and pretty much resembled the implementation data
> > model (which might be adjusted specific to the toolset used, however).
> >
> > So, I guess I'm interested in the very first data model that is no
> > longer agnostic about the target environment. Is that or is it not the
> > logical data model?
>
> I may have learned definitions a long time ago, but I never felt the
> need to fall back on them. I just don't use these terms in a clean,
> universally correct, ivory tower way.
> I do try to use appropriate wordings, taking the
> purpose and context into account.
>
> I could use them in a design session like so: 'Hey guys, that's
> an implementation issue, we'll deal with that later. For now we have
> to limit our discussion to the logic of the data itself.', relying
> on the audience's connotations with 'logic' and 'implementation'.
>
> Even your 'very first data model that is no
> longer agnostic about the target environment'
> makes me wonder - can there be such a model;
> data model and non-agnostic about the target environment
> at the same time? It is not about the data,
> it is about how to handle the data.

The conceptual data model is the only model that is strictly about the data, having nothing to do with how to "handle" the data, right? From there we go to "design." I think of the conceptual model as analysis and the logical model as design. I don't really think about any other models, with iterations of the logical model until it becomes the version of the logical model that gets implemented. I used to think of the logical model as the one at the start of that process, but after reading a bit, I started to think of it as the one at the end of the process (the implementation model). It is the last one that includes design related to optimizing for the target environment.

The question I have now is whether the logical data model presupposes a family of target DBMS's, such as SQL-DBMS's or Pick DBMS's or whether the logical data model is data-model independent. Pascal's definition makes it very data-model dependent. That is how I was using the term in my original question.

> When the implementation is in SQL the schema can be
> very close to the logical data model so the distinction isn't
> important most of the time.

That is also how I think of it for other environments, which would make sense if the logical data model is related to the target data model.

> In other environments
> you can have a physical model, elements of which will have
> to be thoroughly associated with elements from the logical
> data model - but I would not call this physical model a
> data model -

I have no interest in the physical model the way I think of that term (other than to have knowledge of it for performance tuning for the final implementation model).

> I'd call it a storage model if I'd have to classify it.

Yes, agreed. I let the DBMS developers care about the storage model. The logical model is the one I specify to the DBMS. I'll grant that with some tools the physical model more closely resembles the logical model than in others.

> If somebody else would call it a physical data-model,
> I would not interrupt, the message is clear. Calling it a
> logical model /would/ make me object; it's wrong:
> the physical model is not /about/ the logic, it presupposes
> the existence of a logical data model.

Agreed.

> Another point:
> "
> >>>> ... the logical model is the most complete, detailed level
> >>>> you can get to /without/ specifying the implementation plan.
> "
> is not a complete definition (and does not try to be).
> It is just demarcation of the boundary between logic
> and implementation.

You can see from the def I posted in the Logical Data Model thread that this does not align with everything I have read. So if the LDM is about logic and does not presuppose a target, then what do you call the data model that is specified to the DBMS (the one that Pascal refers to as the Logical Model)?

> The demarcation on the other side,
> conceptual vs. logical, is more complicated and at
> the same time it's IMHO less important to have a strict
> line there, see below.

That is the murky line between capturing requirements (conceptual) and designing solutions (logical).

> [snip]
>
> >>>>> It is common to let "logical
> >>>>> data model" refer to this implementation data model -- the model of the
> >>>>> data as specified to the API used for retaining data beyond the
> >>>>> run-time of a particular software application, for example.
>
> >>>> In which circles? Can you provide a reference?
>
> >>> I think it comes from the Date/Darwen/Pascal side of the house, but
> >>> at this point I'm just looking at Pascal's paper to verify that (so
> >>> Date and Darwen might suggest otherwise).
> >> Well, the way you are using (or, better /were/ using, you promised :-)
> >
> > Yes, but I do need a corrected definition so that I am not guessing,
> > OK?
>
> Maybe I am overlooking the obvious, but I can't come up with
> a (to me) satisfactory definition at this time.
>
> By now it is clear that your question is about
> implementation strategy, not about logical models.
> Doesn't that stop you from having to guess?

Yes, I can see that your definition pushes the logical design forward, in front of any idea about the target DBMS. So, you could take the same logical data model and move to the next step of designing for an implementation in Caché or XML documents or Sybase or Access.

> Finding a good clean definition may be more
> work than is called for - unless of course
> someone else has an acceptable one for your purpose.
>
> [snip]
>
> >>>>> The requirement to retain data beyond a particular application run-time
> >>>>> is a requirement that mixes the two, it seems.
> >>>> I don't think so. I think it is where /sharing/ starts - as soon
> >>>> as the next run-time incarnation may differ from an earlier one.
>
> >>> OK, if that is how you define "sharing" then yes, the data are to be
> >>> shared.
>
> >> How would you define sharing?
>
> > From prior discussions, I was understanding that "sharing" as in "large
> > shared data banks" indicates that the database was to be shared by
> > multiple points with no assumption that any entity controlled all
> > entities who are sharing.
>
> I could go with that.
>
> > Each entity sharing this database would need
> > to be able to do so without assuming any coordination with others who
> > are sharing it. How do you define sharing?
>
> A try:
> Sharing data: Use of the same data from more than one point of view.

Well, I am working with "shared data" by this definition -- many points of view, but they are all "managed" collectively. There need be no assumption that each software entity that is sharing this database does so in its own silo, nor that you must permit this sharing by code written by developers who do not talk to each other or do shared QA, for example. With Codd's "large shared data bank" I think there is some assumption that we need to be able to permit people who don't know each other to each write code that shares only the database and nothing else. That would (typically) not be acceptable for the database products I'm talking about. However, the data are still shared among multiple apps and 3rd party products.

Trying to get to the bottom of this, I'm working with environments where data, code, and developers are all shared with no assumption that only the data can be assumed as shared.

> [snip]
>
> > OK, I'll review all the feedback and come back with a revised question
> > (once I have proper definitions for the logical data model and know
> > precisely for which model we need to know whether persistence will be
> > handled with UniData or DB2, for example).
>
> [snip]
>
> >> Say we have a logical model - now we decide to implement using
> >> hierarchical tools /without/ specifying which one (IMS, Lotus Domino,
> >> XML, just to name a few alternatives) - now what? Which choices
> >> do we have to make?
> >
> > Yes, yes, this is very close to my question. Conceptual model is
> > independent of any target environment. Then there is a logical data
> > model. If that is independent of any target environment as well (I
> > still need a def), then we could have a subsequent question (if you are
> > as old as me, then you can put it in a diamond shape with the words
> > "relational model" and a question mark) of whether we are using a
> > product that implements the relational model or not. If yes then we
> > would take the logical model and prepare a relational implementation
> > model from it, putting data in 1NF, addressing such issues as the SQL
> > NULL. If no, then ... (this is where my question is).
>
> Dunno about the 1NF/list diamond, but NULLs to me are not only
> markers for the absence of (a) value, they are also the sign of the
> absence of sufficient effort put into the logical data model.

In the case where someone takes nulls to be markers for the absence of a value, rather than as a value (which is my preference), then I agree.

This is definitely significant in a logical data model. If I know that for some people we have the predicate <Name> has a marital status of <marital status> and for some people we will not have this proposition, with my null value (compared to your lack of value) I might legitimately model with a Person relation that includes name and marital status. With a SQL-DBMS target, I would not do that. So, I think this logical model of yours does need to have some knowledge of the target in order to be useful.
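A minimal Python sketch of the marital-status example, offered as an illustration rather than anything from the thread. The names `Person` and `split_for_sql_target` and the sample people are hypothetical. It shows a single Person relation in which null is treated as a value, next to one possible decomposition for a target where a null in that column is unwanted; the decomposition is an assumption about what "I would not do that" might lead to, not a stated design.

```python
from typing import NamedTuple, Optional


class Person(NamedTuple):
    name: str
    # None stands in for the "null value": a legitimate value meaning
    # "no marital status proposition is recorded for this person".
    marital_status: Optional[str]


# One relation that carries both name and marital status.
persons = {
    Person("Ann", "married"),
    Person("Bob", None),
}


def split_for_sql_target(people):
    """Decompose into two null-free relations (hypothetical alternative).

    names:    one tuple per person
    statuses: one tuple per person for whom the predicate
              "<Name> has a marital status of <marital status>" holds
    """
    names = {(p.name,) for p in people}
    statuses = {
        (p.name, p.marital_status)
        for p in people
        if p.marital_status is not None
    }
    return names, statuses


names, statuses = split_for_sql_target(persons)
```

In the second form Bob simply has no tuple in `statuses`, so the same facts are kept without any null. Which of the two shapes counts as the logical model is exactly the point under discussion.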

> > Now, if "logical data model" is defined to assume the relational model,
>
> I did not assume that. Some do, but I don't do that.
> I have really seen complete logical data models which served non-SQL
> implementations.
>
> > which is the way I was using the term (apparently incorrectly), then we
> > need to move the diamond shape with the question mark in it between the
> > conceptual and logical models, which is where I started.
> >
> >> Is that what your question is about?
> >
> > Yes!
> >
> >> I could imagine useful treatment of this problem
> >> in the abstract, but I am not aware of such treatment.
> >
> > Nor am I, so I'm asking around.
> >
> >>> I don't think we have cdm, ldm, pdm or implementation data model
> >>> in our glossary, but I'm not looking at it to verify that. Is there
>
> >> I see no need to include them. The most basic misunderstanding
> >> in the OP (as I see it) is specifically about the distinction
> >> between logical and implementation-specific, no need to mix
> >> in even more types of models.
> >
> > OK, then if you could just define them for me, it would be most
> > helpful. I'll see if I should revise my understanding (as indicated in
> > my Naked Model blog entry), based on your definition.
>
> I have seen this used for the distinction between
> conceptual and logical data model:
> A conceptual model can be from one point of view (application or
> process), the logical model has to cater for all points of view.
> In this case I would choose not to use these terms, though.
> I'd say for instance: (single) process data model
> versus integrated model.
>
> >>> someone who has laid out defs that you like so I can start there in
> >>> forming the question?
>
> >>>>>> How can that possibly help?
> >>>>> The implementation data model for data that a software component passes
> >>>>> to an SQL-DBMS is often quite different from the implementation data
> >>>>> model for that same conceptual data model when software other than a
> >>>>> SQL-DBMS is used to store and retrieve said data.
>
> >>>> That implementation is relevant is no reason for mixing
> >>>> it with logical model issues. You say you accept the terminology
> >>>> change, but you seem reluctant to do away with the old one.
>
> >>> I tried to change to say "implementation data model" instead of
> >>> "logical data model."
> >> I don't think "implementation data model" as a term
> >> helps (no objection to casual use, of course).
> >
> > I'm open to whatever works for you. I just need those definitions.
>
> If you do - I can't help you there,
> but I don't think you do. Surely clean definitions
> could be helpful, but there are more ways to do scope cutting
> and disambiguation.
>
> [snip submarine]
>
> >> Just work with /real/ examples (user-validated sentences) instead of
> >> abstract things, and you will immediately reap some benefits without
> >> the need to deeply study ORM.
> >
> > Yes, and I do like working with user-validated sentences.
> >
> >> Now if you bump against specific
> >> modeling difficulties using that approach search for that
> >> problem on the ORM sites - or even ask here; I think Hugo is
> >> still lurking here :-)
> >
> > Yes, I prepared a small ORM diagram to test it out and it seemed too
> > complex for sharing with a user compared to an ERD (or simple UML for
> > that matter). Users like to see properties grouped with entities just
> > as I do. OK, OK, propositions -- I mean that users like to see many of
> > the nouns from some of the sentences collected together.
>
> Yes, please don't let the overload of graphical stuff push away
> the central issue: use /real/ facts.
>
> [snip bang for the buck]
> >> In SQL, order is supposed not to carry meaning by itself.
> >
> > Yes, unlike sentences.
>
> Indeed.

;-)

> >> If some order has a meaning, it has to be made explicit, e.g. by
> >> using a rank attribute. A presented set can have a differently
> >> ordered second presentation, without having a different set.
> >> In documents, if the order changes, you have another document.
> >
> > Agreed.
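
The set-versus-document point can be sketched in a few lines of Python. This is only an illustration: the colour values and the helper name `in_rank_order` are made up, with a Python set standing in for a relation and a list standing in for a document.

```python
# Two presentations of the same set: order carries no meaning.
as_set_a = {"red", "green", "blue"}
as_set_b = {"blue", "red", "green"}
same_set = as_set_a == as_set_b

# Two documents with the same items in a different order are different documents.
doc_a = ["red", "green", "blue"]
doc_b = ["blue", "red", "green"]
same_document = doc_a == doc_b

# If order matters in the set-based form, make it explicit with a rank attribute.
ranked = {("red", 1), ("green", 2), ("blue", 3)}


def in_rank_order(rows):
    """Return the values ordered by their explicit rank attribute."""
    return [value for value, rank in sorted(rows, key=lambda row: row[1])]
```

Here `same_set` is true and `same_document` is false, while `in_rank_order(ranked)` recovers the order of `doc_a` from data that is itself unordered.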
>
> [snip]
>
> > Only because I need to rephrase the question and am apparently using
> > the term Logical Data Model incorrectly, yet I'm not certain whether if
> > you and I are both given the same conceptual data model and you are
> > implementing it in Oracle and I in UniData, whether we might have the
> > same logical data model, although different implementation data models,
> > or whether our logical data models would differ. Mine would include
> > multi-valued attributes, for example. Thanks for any clarification.
>
> Mine do, too - in theory.

Good. Dang, the Tigers just lost, time to retire for the evening.

> When everybody knows we'll implement in SQL,
> MVA's do tend to get avoided.

Yes, that was my impression. I have seen what were termed logical data models for both SQL-DBMS and Pick target environments, and they are decidedly different with respect to multi-valued attributes and nulls (not to mention "code files" and various other design patterns).
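
As a rough illustration of how the two kinds of model can differ on multi-valued attributes, here is a small Python sketch. The record layout, the sample data, and the function name `to_1nf` are all hypothetical; this is not the API or storage format of UniData, Pick, or any SQL product.

```python
# Nested form: one record per person, with "phones" as a multi-valued attribute.
nested = {
    "P1": {"name": "Ann", "phones": ["555-0100", "555-0101"]},
    "P2": {"name": "Bob", "phones": []},
}


def to_1nf(records):
    """Flatten the nested records into two single-valued relations."""
    person_rows = [(pid, rec["name"]) for pid, rec in records.items()]
    phone_rows = [
        (pid, phone)
        for pid, rec in records.items()
        for phone in rec["phones"]
    ]
    return person_rows, phone_rows


person_rows, phone_rows = to_1nf(nested)
```

The nested form keeps Ann's phones with Ann, and Bob's empty list needs no null. The flattened form moves the phones into a second relation keyed by person. Both carry the same facts, yet the diagrams drawn from them would look quite different.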

> This is letting implementation
> guide the logic - strictly a no-no.

I definitely think that is a no-no in the conceptual data model, but I'm not sure how helpful a logical data model is without some assumption about whether the implementation will be in a product that looks like UniData compared to a product that looks like Oracle, for example.

Thanks for your comments. --dawn

Received on Fri Oct 27 2006 - 06:17:36 CEST