Re: Modeling Data for XML instead of SQL-DBMS

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Thu, 26 Oct 2006 19:51:51 +0200
Message-ID: <4540f50c$0$329$e4fe514c_at_news.xs4all.nl>


dawn wrote:
> mAsterdam wrote:

>> ... the logical model is the most complete, detailed level
>> you can get to /without/ specifying the implementation plan.
>> I don't think I should unlearn that.

>
> It took me a bit to find this since it was in a paper, rather than a
> book. I previously defined the logical data model to be independent of
> any particular implementation, but decided to change my own def to
> align with what I read from Pascal.
>
> Pascal, Unmuddling Modeling, Feb 2005 (there might be an updated
> version, but this is the latest one that I have). He says things like
> this from p.12.
>
> "Database representations of the formal versions of business
> rules/conceptual models as mapped via the data model are called logical
> models."

As clear as muddle, this. No need to unlearn.

>> At that time the idea was to completely design the logic
>> before even thinking about implementation - nobody really did it that
>> way, but it was the accepted 'how it /should/ be done'. Later
>> method(ologie)s addressed more technical issues up front, in the form
>> of architecture templates. That is the good and natural
>> evolution - theory closer to practice - but it does not give an
>> excuse to blur the distinction logical versus implementation.
>>
>>> If you want to have "conceptual data model" then
>>> "logical data model" and then another
>>> "implementation data model" so be it.

>> The further away from cut-over-day, the more freedom there is
>> in deciding what deliverables are really necessary - it also
>> depends on culture, the size and nature of the project, the user base,
>> the project team size, management style.
>> Any one-set-of-deliverables-fits-all-projects theory is
>> doomed to stay just that: theory.
>>
>>> It is common to let "logical
>>> data model" refer to this implementation data model -- the model of the
>>> data as specified to the API used for retaining data beyond the
>>> run-time of a particular software application, for example.
>> In which circles? Can you provide a reference?

>
> I think it comes from the Date/Darwin/Pascal side of the house, but
> at this point I'm just looking at Pascal's paper to verify that (so
> Date and Darwin might suggest otherwise).

Well, the way you are using (or, better, /were/ using, you promised :-) these terms is counterintuitive to me and inconsistent - I did not see this when reading Date & Darwen. About Pascal, dunno. His b.s. bar theory about v.i. makes me less inclined to read his other stuff.

>>>> Surely there is no need for persistence when we have no data to
>>>> persist,
>>> Correct.  I am referring to data that is persisted (stored, if
>>> preferred).
>>>
>>>> and from the other perspective: our data is valuable so
>>>> we need persistence for it - but that's about it; mixing the two
>>>> topics frustrates both discussions.
>>> The requirement to retain data beyond a particular application run-time
>>> is a requirement that mixes the two, it seems.
>> I don't think so. I think it is where /sharing/ starts - as soon
>> as the next run-time incarnation may differ from an earlier one.

>
> OK, if that is how you define "sharing" then yes, the data are to be
> shared.

How would you define sharing?

>>> But if it helps, I will
>>> state that I do not need to care about any physical storage device,
>>> only the interface/language/api used to pass data that is to be stored
>>> and retrieve it.

>> Ok, that does narrow it down and makes more sense to me.
>> Though I have seen quite a lot of
>> those interface/language/api -s I have yet to see a foundational
>> approach to them outside the relational school, the Codd/Date folks.

>
> I have read some of the papers that Jan pointed me to related to
> functional databases as well as di-graphs. I don't recall taking away
> anything from those in the form of "best practices" in data modeling
> derived from theory, although there might be some. As best I know,
> nothing has made its way to those who are actually designing data
> models for such environments. They typically rely on learning from
> their own mistakes and those of people around them.
>
> I think we could gain a bit by learning more broadly in this area as a
> profession. There are some roadblocks, however, including the fact
> that those doing data modeling for XML or any non-SQL-DBMS
> "persistence" typically know they are not doing what is taught in
> colleges, so they do it in back rooms with the lights dim.

So what you are really asking is "The real book on multi-app hierarchical modeling and extreme gay sex", by several anonymous authors ;-) or ...

> Additionally, most of those who do such data design are also doing OOP
> and other application programming design -- so they hold discussions
> with other app developers, not in "data modeling" or "database"
> circles.

... "The best way to make files without database knowledge." Seriously, I don't think this is the road to improving the question.

>> ... could you attempt a rephrase?

>
> I'm missing what you want rephrased. My question is about the

The original question - or are the above parodies on the mark?

> implementation data model that I was calling the logical data model,
> only because I wanted to align with relational theorists terminology.

You can use it, or not use it, or learn to use it. 'Aligning' suggests you have an alternate theory around concepts which more or less correspond to relational theory. You don't, do you?

> The tools/languages/apis/interfaces used to put and get data from
> application software are relevant to this implementation data model.
>

>>>>>>> documents and not in an SQL-DBMS, the tools would not require that the
>>>>>>> data model be in 1NF or the use of the SQL NULL.
>>>>>> /data model/
>>>>>> ?? document model!
>>>>> No, it is the data model that is of interest to me,
>>>> Why, then, bring the storage (files, persistence) and other
>>>> containers (XML, SQL-DBMS) into the scope?
>>> Because it is relevant.
>> To the implementation. But relevant, definitely.
>> (Just to remind you that I do /not/ accept the terminology of
>> the original question :-)

>
> I tried to at least roughly align my terminology in my The Naked Model
> blog entry with Pascal's from his Unmuddling modeling paper. I can
> adapt to another approach. I have heard some call the implementation
> model the physical model, but that is lower in the defs I'm working
> with.

Say we have a logical model - now we decide to implement using hierarchical tools /without/ specifying which one (IMS, Lotus Domino, XML, just to name a few alternatives) - now what? Which choices do we have to make?

Is that what your question is about?

I could imagine useful treatment of this problem in the abstract, but I am not aware of such treatment.

> I don't think we have cdm, ldm, pdm or implementation data model
> in our glossary, but I'm not looking at it to verify that. Is there

I see no need to include them. The most basic misunderstanding in the OP (as I see it) is specifically about the distinction between logical and implementation-specific, no need to mix in even more types of models.

> someone who has laid out defs that you like so I can start there in
> forming the question?
>

>>>> How can that possibly help?
>>> The implementation data model for data that a software component passes
>>> to an SQL-DBMS is often quite different from the implementation data
>>> model for that same conceptual data model when software other than a
>>> SQL-DBMS is used to store and retrieve said data.
>> That implementation is relevant is no reason for mixing
>> it with logical model issues. You say you accept the terminology
>> change, but you seem reluctant to do away with the old one.

>
> I tried to change to say "implementation data model" instead of
> "logical data model."

I don't think "implementation data model" as a term helps (no objection to casual use, of course). When detailing an implementation, one of the given things is the (logical) data model. Another one is the set of tools you will use while building and during test, deployment and exploitation.

>> ... What does sound wrong to you?

>
> I had adjusted to using the term "logical data model" for the
> "implementation model" for a reason -- it took me a bit to accept this
> revision of the terms, so now I'm reverting back, so that is one thing
> that "sounds wrong."

Ok.

> The other is what I have heard from you and
> others before, where there is some huge reservation in discussing
> storage/persistence/saving data to a secondary storage device for
> retrieval at a later time in the same sentence as mentioning a logical
> data model. Maybe these issues are tied together and a simple return
> to my prior understanding of a logical data model, where it is
> independent of any specific tools (such as an SQL-DBMS) would clear it
> up.

They are as tied to each other as steering a boat and Archimedes' law. In general, you don't mix the floating into the steering - you take it as a given thing unless you steer a submarine.

> In your def of a logical data model, are you careful not to bring in
> implementation-specific matters such as the way any target environment
> might handle nulls, for example?

I don't like the phrasing of the question. It is not my 'def of a logical data model' (I took that as 'when establishing a logical model'). But yes, I take care to avoid implementation issues in order to focus on the logic of the (user-)data itself.

>>>> ... The logical model does not differ. The implementation does.
>>> Yes, I think many writers use the term "logical data model" or LDM to
>>> refer to the model that is specified as schema, but I am fine with
>>> referring to it as the "implementation data model" if you prefer.
>> Rel ...

>
> ?
>


...uctant.

[snip Pizza]

>> ... I think there are two basic approaches to conceptual and logical
>> modeling: /thing-/ and /fact-/ thinking. ERM is closer to
>> /thing/-think, ORM is closer to /fact/-think (and more detailed).
>> Though I personally (non-rationally, it just feels closer to how I
>> think I think) prefer thinking about facts,
>> I have seen some quite good systems around ERM (things).

>
> I like the ORM approach, but am not well-versed in it. ERM (using
> ERD's) and UML are both comfortable to me for conceptual modeling,
> preferably on the back of napkins or on a white board and definitely in
> combination with a glossary.

Just work with /real/ examples (user-validated sentences) instead of abstract things, and you will immediately reap some benefits without the need to deeply study ORM. Now if you bump against specific modeling difficulties using that approach, search for that problem on the ORM sites - or even ask here; I think Hugo is still lurking here :-)

[snip more Pizza]

>>> ... And with XML, it is likely that the toppings would
>>> be taken as elements under the pizza, rather than having 
>>> a separate PizzaToppings table.

>> Yep. About query-bias: Now we want to actualize our ingredient storage
>> based on the sales.
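
To make the two shapes concrete, here is a minimal sketch in Python. It is not from the thread: the order document, the pizza_id keys and both function names are invented for illustration. It holds the same facts once with toppings nested under each pizza and once as rows of a separate PizzaToppings table, then answers the ingredient question from each.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical order document: toppings nested as elements under each pizza.
ORDER_XML = """
<order>
  <pizza size="large"><topping>ham</topping><topping>olives</topping></pizza>
  <pizza size="small"><topping>ham</topping></pizza>
</order>
"""

# The same facts as flat rows, as a separate PizzaToppings table would hold them.
# The pizza_id values are invented keys; the nested form needs none.
PIZZA_TOPPINGS = [(1, "ham"), (1, "olives"), (2, "ham")]


def topping_usage_from_xml(xml_text):
    """Count toppings by walking every pizza element in the document."""
    root = ET.fromstring(xml_text)
    return Counter(
        topping.text
        for pizza in root.iter("pizza")
        for topping in pizza.findall("topping")
    )


def topping_usage_from_rows(rows):
    """Count toppings directly from (pizza_id, topping) rows."""
    return Counter(topping for _pizza_id, topping in rows)


xml_usage = topping_usage_from_xml(ORDER_XML)
row_usage = topping_usage_from_rows(PIZZA_TOPPINGS)
```

Both routes give the same counts here. The difference is in the path: the nested form is organized around the pizza, so a question about toppings across all sales has to go through every pizza first.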

>
> Not a problem, but here's how I justify that -- I have been in mgmt
> positions where I requested reports from different database
> environments. Pretty much across the board when I requested
> information and the data came from a non-SQL-DBMS, I got my reports
> faster, and they were accurate when I got them. Maybe a student
> employee wrote a calculated field (derived data, computed column,
> virtual attribute, user-defined field) to get it out for me, but as a
> mgr, I got what I requested in a timely fashion. This is a gross
> over-generalization, but I'll still say it -- reports from SQL-DBMS's
> were more likely to be incorrect (the person using a reporting tool or
> writing code messed up related to joins or nulls or whatever selection
> criteria with their first shot at the report). They also seemed to
> take more people hours to get them from requirements to production
> software.

You told this (something like it, 'Bang for the buck') before. Like you, I would like to see some research on this - your anecdotal evidence is different from my experience.

>>>>>> The next (XML, but it doesn't matter)
>>>>>> document created by dumping a database
>>>>> This would not be a database dump,
>>>> Yes it would, in theory - it is one way to discuss the differences.
>>>>
>>>>> but the creation of an application
>>>>> using XML documents (yes, I realize this isn't a highly scalable
>>>>> approach to building an app, so we can assume the data volume will be
>>>>> small if that helps set the scene -- perhaps these are pizza orders for
>>>>> a school sale)
>>>>>
>>>>>> may differ from the previous, even if the (SQL, but it doesn't matter)
>>>>>> database content stayed the same.
>>>>>> Arbitrary order would have to be added to prevent this.
>>>> This is the point of the database dump approach - I
>>>> wasn't suggesting to actually start with a dump.
>>>> Do you see why this is problematic?
>> Please comment on this.

>
> When you ask "Do you see why this is problematic" I'm not sure what the
> "this" is, which is why I skipped by it. Is it the arbitrary order
> that you find problematic? The only way that data can be communicated
> is in an order, whether specified or arbitrary. I'm pretty sure I'm
> missing your question here, sorry -- please restate it.

Not really a question.
In SQL, order is supposed not to carry meaning by itself. If some order has a meaning, it has to be made explicit, e.g. by using a rank attribute. A presented set can have a differently ordered second presentation, without having a different set. In documents, if the order changes, you have another document.
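
A small sketch of that difference, in Python rather than SQL, with invented topping data: a Python set stands in for a relation and a list for a document. All names below are hypothetical.

```python
# Hypothetical data: two presentations of the same (pizza, topping) facts.
rows_first = [("margherita", "basil"), ("margherita", "mozzarella")]
rows_second = [("margherita", "mozzarella"), ("margherita", "basil")]

# Relation view: a differently ordered presentation is still the same set.
same_relation = set(rows_first) == set(rows_second)

# Document view: the same items in another order make another document.
same_document = rows_first == rows_second

# If the order means something, make it explicit with a rank attribute.
ranked_first = {(1, "basil"), (2, "mozzarella")}
ranked_again = {(2, "mozzarella"), (1, "basil")}  # presented in another order
ranked_swapped = {(1, "mozzarella"), (2, "basil")}  # ranks actually changed

rank_survives_reordering = ranked_first == ranked_again
rank_change_is_visible = ranked_first != ranked_swapped
```

With the rank stored as data, reordering the presentation changes nothing, while an actual change of order shows up as a different set.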

[snip even more Pizza]

>>> OK, but it is also hard for you to get your constraint handling all the
>>> way through to the end-users to maintain, right ;-)
>> ? I don't get this, sorry.

>
> I want to push as many constraints as feasible into the data, so it can
> be maintained either by configuration of implementors or by end-users.
> If constraints are coded to SQL, they are not typically available for
> end-user modification.

Indeed, typically not, even.
That is not to say it is technically difficult. I've seen examples where some constraints need only be enforced (scheduled or even user-triggered) just a few days before and until reporting day. Violations cause great panic.

[snip]

> OK, I think if we can have common defs for logical and implementation
> models, that will help me rephrase in those terms. Cheers! --dawn

You seem too focused on definitions and terminology at the moment.

I'll repeat something I wrote above (please snip it there or here):

Say we have a logical model - now we decide to implement using hierarchical tools /without/ specifying which one (IMS, Lotus Domino, XML, just to name a few alternatives) - now what? Which choices do we have to make?

Is that what your question is about?

I could imagine useful treatment of this problem in the abstract, but I am not aware of such treatment. (I also have a feeling of why this is so - but I'd like to know if I'm looking in the right direction.)
