Re: Modeling Data for XML instead of SQL-DBMS

From: dawn <dawnwolthuis_at_gmail.com>
Date: 26 Oct 2006 13:18:04 -0700
Message-ID: <1161893884.428842.156110_at_m7g2000cwm.googlegroups.com>


mAsterdam wrote:
> dawn wrote:
> > mAsterdam wrote:
> >> ... the logical model is the most complete, detailed level
> >> you can get to /without/ specifying the implementation plan.
> >> I don't think I should unlearn that.
> >
> > It took me a bit to find this since it was in a paper, rather than a
> > book. I previously defined the logical data model to be independent of
> > any particular implementation, but decided to change my own def to
> > align with what I read from Pascal.
> >
> > Pascal, Unmuddling Modeling, Feb 2005 (there might be an updated
> > version, but this is the latest one that I have). He says things like
> > this from p.12.
> >
> > "Database representations of the formal versions of business
> > rules/conceptual models as mapped via the data model are called logical
> > models."
>
> As clear as muddle, this. No need to unlearn.

To clarify, my question is whether the logical model is data-model independent. The conceptual data model is data-model independent. The logical data model could be defined as data-model dependent, or it could still be independent of the data model employed by the target DBMS. My understanding was that it was data-model dependent and pretty much resembled the implementation data model (which might then be adjusted for the specific toolset used).

So, I guess I'm interested in the very first data model that is no longer agnostic about the target environment. Is that or is it not the logical data model?

> >> At that time the idea was to completely design the logic
> >> before even thinking about implementation - nobody really did it that
> >> way, but it was the accepted 'how it /should/ be done'. Later
> >> method(ologie)s addressed more technical issues up front, in the form
> >> of architecture templates. That is the good and natural
> >> evolution - theory closer to practice - but it does not give an
> >> excuse to blur the distinction logical versus implementation.
> >>
> >>> If you want to have "conceptual data model" then
> >>> "logical data model" and then another
> >>> "implementation data model" so be it.
>
> >> The further away from cut-over-day, the more freedom there is
> >> in deciding what deliverables are really necessary - it also
> >> depends on culture, the size and nature of the project, the user base,
> >> the project team size, management style.
> >> Any one-set-of-deliverables-fits-all-projects theory is
> >> doomed to stay just that: theory.
> >>
> >>> It is common to let "logical
> >>> data model" refer to this implementation data model -- the model of the
> >>> data as specified to the API used for retaining data beyond the
> >>> run-time of a particular software application, for example.
>
> >> In which circles? Can you provide a reference?
> >
> > I think it comes from the Date/Darwen/Pascal side of the house, but
> > at this point I'm just looking at Pascal's paper to verify that (so
> > Date and Darwen might suggest otherwise).
>
> Well, the way you are using (or, better /were/ using, you promised :-)

Yes, but I do need a corrected definition so that I am not guessing, OK?

> these terms is counterintuitive to me and inconsistent -
> I did not see this when reading Date & Darwen.
> About Pascal, dunno. His b.s. bar theory about v.i.
> makes me less inclined to reading his other stuff.
>
> >>>> Surely there is no need for persistence when we have no data to
> >>>> persist,
> >>> Correct. I am referring to data that is persisted (stored, if
> >>> preferred).
> >>>
> >>>> and from the other perspective: our data is valuable so
> >>>> we need persistence for it - but that's about it; mixing the two
> >>>> topics frustrates both discussions.
> >>> The requirement to retain data beyond a particular application run-time
> >>> is a requirement that mixes the two, it seems.
> >> I don't think so. I think it is where /sharing/ starts - as soon
> >> as the next run-time incarnation may differ from an earlier one.
> >
> > OK, if that is how you define "sharing" then yes, the data are to be
> > shared.
>
> How would you define sharing?

From prior discussions, I understood that "sharing" as in "large shared data banks" indicates that the database is to be shared by multiple points, with no assumption that any one entity controls all the entities who are sharing. Each entity sharing this database would need to be able to do so without assuming any coordination with the others who are sharing it. How do you define sharing?

> >>> But if it helps, I will
> >>> state that I do not need to care about any physical storage device,
> >>> only the interface/language/api used to pass data that is to be stored
> >>> and retrieve it.
>
> >> Ok, that does narrow it down and makes more sense to me.
> >> Though I have seen quite a lot of
> >> those interface/language/api -s I have yet to see a foundational
> >> approach to them outside the relational school, the Codd/Date folks.
> >
> > I have read some of the papers that Jan pointed me to related to
> > functional databases as well as di-graphs. I don't recall taking away
> > anything from those in the form of "best practices" in data modeling
> > derived from theory, although there might be some. As best I know,
> > nothing has made its way to those who are actually designing data
> > models for such environments. They typically rely on learning from
> > their own mistakes and those of people around them.
> >
> > I think we could gain a bit by learning more broadly in this area as a
> > profession. There are some roadblocks, however, including the fact
> > that those doing data modeling for XML or any non-SQL-DBMS
> > "persistence" typically know they are not doing what is taught in
> > colleges, so they do it in back rooms with the lights dim.
>
> So what you are really asking is "The real book on multi-app
> hierarchical modeling and extreme gay sex", by several anonymous
> authors ;-) or ...

Are you reading my postings to other ngs or what ;-)

> > Additionally, most of those who do such data design are also doing OOP
> > and other application programming design -- so they hold discussions
> > with other app developers, not in "data modeling" or "database"
> > circles.
>
> ... "The best way to make files without database knowledge."
> Seriously, I don't think this is the road to improving the question.

OK.

> >> ... could you attempt a rephrase?
> >
> > I'm missing what you want rephrased. My question is about the
>
> The original question - or are the above parodies on the mark?

OK, I'll review all the feedback and come back with a revised question (once I have proper definitions for the logical data model and know precisely for which model we need to know whether persistence will be handled with UniData or DB2, for example).

> > implementation data model that I was calling the logical data model,
> > only because I wanted to align with relational theorists terminology.
>
> You can use it, or not use it, or learn to use it.
> 'Aligning' suggests you have an alternate theory around
> concepts which more or less correspond to relational theory.
> You don't, do you?

I have many more questions than answers.

> > The tools/languages/apis/interfaces used to put and get data from
> > application software are relevant to this implementation data model.
> >
> >>>>>>> documents and not in an SQL-DBMS, the tools would not require that the
> >>>>>>> data model be in 1NF or the use of the SQL NULL.
> >>>>>> /data model/
> >>>>>> ?? document model!
> >>>>> No, it is the data model that is of interest to me,
> >>>> Why, then, bring the storage (files, persistence) and other
> >>>> containers (XML, SQL-DBMS) into the scope?
> >>> Because it is relevant.
> >> To the implementation. But relevant, definitely.
> >> (Just to remind you that I do /not/ accept the terminology of
> >> the original question :-)
> >
> > I tried to at least roughly align my terminology in my The Naked Model
> > blog entry with Pascal's from his Unmuddling modeling paper. I can
> > adapt to another approach. I have heard some call the implementation
> > model the physical model, but that is lower in the defs I'm working
> > with.
>
> Say we have a logical model - now we decide to implement using
> hierarchical tools /without/ specifying which one (IMS, Lotus Domino,
> XML, just to name a few alternatives) - now what? Which choices
> do we have to make?

Yes, yes, this is very close to my question. Conceptual model is independent of any target environment. Then there is a logical data model. If that is independent of any target environment as well (I still need a def), then we could have a subsequent question (if you are as old as me, then you can put it in a diamond shape with the words "relational model" and a question mark) of whether we are using a product that implements the relational model or not. If yes then we would take the logical model and prepare a relational implementation model from it, putting data in 1NF, addressing such issues as the SQL NULL. If no, then ... (this is where my question is).

Now, if "logical data model" is defined to assume the relational model, which is the way I was using the term (apparently incorrectly), then we need to move the diamond shape with the question mark in it between the conceptual and logical models, which is where I started.
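
As a rough illustration of the two branches of that diamond, here is a minimal Python sketch using the pizza example from this thread. The order contents and the names `to_1nf` and `from_1nf` are hypothetical, made up for illustration; this is not the API of UniData, an SQL-DBMS, or any other product. It only shows that a multi-valued "toppings" attribute can be kept nested, or split out into a separate set of rows for a 1NF implementation.

```python
# Hypothetical pizza order; all names and values are illustrative only.
order = {"order_id": 1, "customer": "Pat", "toppings": ["cheese", "mushroom"]}


def to_1nf(nested):
    """Split the multi-valued 'toppings' attribute into its own rows."""
    pizza_row = {k: v for k, v in nested.items() if k != "toppings"}
    topping_rows = [
        {"order_id": nested["order_id"], "topping": t}
        for t in nested["toppings"]
    ]
    return pizza_row, topping_rows


def from_1nf(pizza_row, topping_rows):
    """Rebuild the nested form by matching rows on order_id."""
    nested = dict(pizza_row)
    nested["toppings"] = [
        r["topping"]
        for r in topping_rows
        if r["order_id"] == pizza_row["order_id"]
    ]
    return nested


pizza_row, topping_rows = to_1nf(order)
print(pizza_row)
print(topping_rows)
print(from_1nf(pizza_row, topping_rows) == order)
```

The round trip recovers the original order here only because a Python list keeps its order. Rows in an SQL table carry no inherent order, so a real 1NF version would need an explicit position attribute if the order of toppings mattered.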

> Is that what your question is about?

Yes!

> I could imagine useful treatment of this problem
> in the abstract, but I am not aware of such treatment.

Nor am I, so I'm asking around.

> > I don't think we have cdm, ldm, pdm or implementation data model
> > in our glossary, but I'm not looking at it to verify that. Is there
>
> I see no need to include them. The most basic misunderstanding
> in the OP (as I see it) is specifically about the distinction
> between logical and implementation-specific, no need to mix
> in even more types of models.

OK, then if you could just define them for me, it would be most helpful. I'll see if I should revise my understanding (as indicated in my Naked Model blog entry), based on your definition.

> > someone who has laid out defs that you like so I can start there in
> > forming the question?
> >
> >>>> How can that possibly help?
> >>> The implementation data model for data that a software component passes
> >>> to an SQL-DBMS is often quite different from the implementation data
> >>> model for that same conceptual data model when software other than a
> >>> SQL-DBMS is used to store and retrieve said data.
> >> That implementation is relevant is no reason for mixing
> >> it with logical model issues. You say you accept the terminology
> >> change, but you seem reluctant to do away with the old one.
> >
> > I tried to change to say "implementation data model" instead of
> > "logical data model."
>
> I don't think "implementation data model" as a term
> helps (no objection to casual use, of course).

I'm open to whatever works for you. I just need those definitions.

> When detailing an implementation, one of the given things
> is the (logical) data model. Another one is the set of tools
> you will use while building and during, test, deployment
> and exploitation.

Yes, I do think I have an understanding of the process, but it sounds like I'm using incorrect terms.

> >> ... What does sound wrong to you?
> >
> > I had adjusted to using the term "logical data model" for the
> > "implementation model" for a reason -- it took me a bit to accept this
> > revision of the terms, so now I'm reverting back, so that is one thing
> > that "sounds wrong."
>
> Ok.
>
> > The other is what I have heard from you and
> > others before, where there is some huge reservation in discussing
> > storage/persistence/saving data to a secondary storage device for
> > retrieval at a later time in the same sentence as mentioning a logical
> > data model. Maybe these issues are tied together and a simple return
> > to my prior understanding of a logical data model, where it is
> > independent of any specific tools (such as an SQL-DBMS) would clear it
> > up.
>
> They are as tied to each other as steering a boat and Archimedes' law.
> In general, you don't mix in the floating into the steering - you take
> it as a given thing unless you steer a submarine.

I suspect I might be doing that ;-)

> > In your def of a logical data model, are you careful not to bring in
> > implementation-specific matters such as the way any target environment
> > might handle nulls, for example?
>
> I don't like the phrasing of the question.

Well, now I have a question you understand at least, but you don't like it. sigh

> It is not my 'def of a logical data model' (I took that as
> 'when establishing a logical model'). But yes, I take care avoiding
> implementation issues in order to focus on the logic of the
> (user-)data itself.

Good.

> >>>> ... The logical model does not differ. The implementation does.
> >>> Yes, I think many writers use the term "logical data model" or LDM to
> >>> refer to the model that is specified as schema, but I am fine with
> >>> referring to it as the "implementation data model" if you prefer.
> >> Rel ...
> >
> > ?
> >
> ...uctant.

lol

> [snip Pizza]
> >> ... I think there are two basic approaches to conceptual and logical
> >> modeling: /thing-/ and /fact-/ thinking. ERM is closer to
> >> /thing/-think, ORM is closer to /fact/-think (and more detailed).
> >> Though I personally (non-rationally, it just feels closer to how I
> >> think I think) prefer thinking about facts,
> >> I have seen some quite good systems around ERM (things).
> >
> > I like the ORM approach, but am not well-versed in it. ERM (using
> > ERD's) and UML are both comfortable to me for conceptual modeling,
> > preferably on the back of napkins or on a white board and definitely in
> > combination with a glossary.
>
> Just work with /real/ examples (user-validated sentences) instead of
> abstract things, and you will immediately reap some benefits without
> the need to deeply study ORM.

Yes, and I do like working with user-validated sentences.

> Now if you bump against specific
> modeling difficulties using that approach search for that
> problem on the ORM sites - or even ask here; I think Hugo is
> still lurking here :-)

Yes, I prepared a small ORM diagram to test it out and it seemed too complex for sharing with a user compared to an ERD (or simple UML for that matter). Users like to see properties grouped with entities just as I do. OK, OK, propositions -- I mean that users like to see many of the nouns from some of the sentences collected together.

> [snip more Pizza]
>
> >>> ... And with XML, it is likely that the toppings would
> >>> be taken as elements under the pizza, rather than having
> >>> a separate PizzaToppings table.
>
> >> Yep. About query-bias: Now we want to actualize our ingredient storage
> >> based on the sales.
> >
> > Not a problem, but here's how I justify that -- I have been in mgmt
> > positions where I requested reports from different database
> > environments. Pretty much across the board when I requested
> > information and the data came from a non-SQL-DBMS, I got my reports
> > faster, and they were accurate when I got them. Maybe a student
> > employee wrote a calculated field (derived data, computed column,
> > virtual attribute, user-defined field) to get it out for me, but as a
> > mgr, I got what I requested in a timely fashion. This is a gross
> > over-generalization, but I'll still say it -- reports from SQL-DBMS's
> > were more likely to be incorrect (the person using a reporting tool or
> > writing code messed up related to joins or nulls or whatever selection
> > criteria with their first shot at the report). They also seemed to
> > take more people hours to get them from requirements to production
> > software.
>
> You told this (something like it, 'Bang for the buck') before.
> Like you, I would like to see some research on this -
> your anecdotal evidence is different from my experience.

Yes, I understand. I would like to see some research on this too. I would even like to do some research on this. Trying to come up with an experiment whose conclusions would be accepted by > 75% of those reading the report did entertain me for a bit, but I didn't come up with something I could pull off without significant dollars.

While I think that my own experience is likely indicative of something, there are tons of factors in any anecdote, so I would prefer to sink my teeth into solid empirical data as well. Without that, I'll rely on my intuition, but you certainly don't have to.

> >>>>>> The next (XML, but it doesn't matter)
> >>>>>> document created by dumping a database
> >>>>> This would not be a database dump,
> >>>> Yes it would, in theory - it is one way to discuss the differences.
> >>>>
> >>>>> but the creation of an application
> >>>>> using XML documents (yes, I realize this isn't a highly scalable
> >>>>> approach to building an app, so we can assume the data volume will be
> >>>>> small if that helps set the scene -- perhaps these are pizza orders for
> >>>>> a school sale)
> >>>>>
> >>>>>> may differ from the previous, even if the (SQL, but it doesn't matter)
> >>>>>> database content stayed the same.
> >>>>>> Arbitrary order would have to be added to prevent this.
> >>>> This is the point of the database dump approach - I
> >>>> wasn't suggesting to actually start with a dump.
> >>>> Do you see why this is problematic?
> >> Please comment on this.
> >
> > When you ask "Do you see why this is problematic" I'm not sure what the
> > "this" is, which is why I skipped by it. Is it the arbitrary order
> > that you find problematic? The only way that data can be communicated
> > is in an order, whether specified or arbitrary. I'm pretty sure I'm
> > missing your question here, sorry -- please restate it.
>
> Not really a question.
> In SQL, order is supposed not to carry meaning by itself.

Yes, unlike sentences.

> If some order has a meaning, it has to be made explicit, e.g. by
> using a rank attribute. A presented set can have a differently
> ordered second presentation, without having a different set.
> In documents, if the order changes, you have another document.

Agreed.
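
A small Python sketch of that distinction, under the assumption that a Python set stands in for a relation and a list stands in for a document. The topping names and the helper `in_rank_order` are made up for illustration.

```python
# Two presentations of the same toppings, in different orders.
first = ["cheese", "mushroom", "olive"]
second = ["olive", "cheese", "mushroom"]

# As sets (order carries no meaning) they are the same set.
same_set = set(first) == set(second)

# As documents (order is part of the content) they differ.
same_document = first == second

# If the order does mean something, make it explicit with a rank attribute.
ranked = {("cheese", 1), ("mushroom", 2), ("olive", 3)}


def in_rank_order(relation):
    """Present a set of (name, rank) pairs in the order given by rank."""
    return [name for name, rank in sorted(relation, key=lambda pair: pair[1])]


print(same_set, same_document)
print(in_rank_order(ranked))
```

With the rank stored as data, any presentation of the set can be put back into the intended order, so nothing is lost when the rows come back shuffled.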

> [snip even more Pizza]
> >>> OK, but it is also hard for you to get your constraint handling all the
> >>> way through to the end-users to maintain, right ;-)
> >> ? I don't get this, sorry.
> >
> > I want to push as many constraints as feasible into the data, so it can
> > be maintained either by configuration of implementors or by end-users.
> > If constraints are coded to SQL, they are not typically available for
> > end-user modification.
>
> Indeed, typically not, even.
> That is not to say it is technically difficult.
> I've seen examples where some constraints need only be
> enforced (scheduled or even user-triggered)
> just a few days before and until reporting day.
> Violations cause great panic.
>
> [snip]
>
> > OK, I think if we can have common defs for logical and implementation
> > models, that will help me rephrase in those terms. Cheers! --dawn
>
> You seem to be focused on definitions and terminology at the moment.

Only because I need to rephrase the question and am apparently using the term Logical Data Model incorrectly. I'm still not certain about this: if you and I are both given the same conceptual data model, and you are implementing it in Oracle and I in UniData, would we have the same logical data model with different implementation data models, or would our logical data models differ? Mine would include multi-valued attributes, for example. Thanks for any clarification. --dawn

Received on Thu Oct 26 2006 - 22:18:04 CEST
