Re: Modeling Data for XML instead of SQL-DBMS

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Thu, 26 Oct 2006 01:31:37 +0200
Message-ID: <453ff33c$0$330$e4fe514c_at_news.xs4all.nl>

dawn wrote:
...
> The interface to a persistence mechanism (software) is relevant to the
> design of a data model. When I use the term "logical data model" I am
> using it to refer to the implementation data model.

Why? The way I learned it (early 80's) is this: the logical model is the most complete, detailed level you can get to /without/ specifying the implementation plan. I don't think I should unlearn that.

At that time the idea was to completely design the logic before even thinking about implementation - nobody really did it that way, but it was the accepted 'how it /should/ be done'. Later method(ologie)s addressed more technical issues up front, in the form of architecture templates. That is a good and natural evolution - theory closer to practice - but it is no excuse to blur the distinction between logical and implementation.

> If you want to have "conceptual data model" then
> "logical data model" and then another
> "implementation data model" so be it.

The further away from cut-over-day, the more freedom there is in deciding what deliverables are really necessary - it also depends on culture, the size and nature of the project, the user base, the project team size, management style. Any one-set-of-deliverables-fits-all-projects theory is doomed to stay just that: theory.

> It is common to let "logical
> data model" refer to this implementation data model -- the model of the
> data as specified to the API used for retaining data beyond the
> run-time of a particular software application, for example.

In which circles? Can you provide a reference?

>> Surely there is no need for persistence when we have no data to
>> persist,

>
> Correct. I am referring to data that is persisted (stored, if
> preferred).
>

>> and from the other perspective: our data is valuable so
>> we need persistence for it - but that's about it; mixing the two
>> topics frustrates both discussions.

>
> The requirement to retain data beyond a particular application run-time
> is a requirement that mixes the two, it seems.

I don't think so. I think it is where /sharing/ starts - as soon as the next run-time incarnation may differ from an earlier one.

> But if it helps, I will
> state that I do not need to care about any physical storage device,
> only the interface/language/api used to pass data that is to be stored
> and retrieve it.

Ok, that does narrow it down and makes more sense to me. Though I have seen quite a lot of those interfaces/languages/APIs, I have yet to see a foundational approach to them outside the relational school, the Codd/Date folks.

>> You say "Sure, we could assume that if it helps.",
>> what do you mean by that - what are, to you, the
>> consequences of assuming no need to share data?

>
> The Codd folks get all hung up on the "large shared data bank" concept,
> so I wanted to avoid those terms if they are a stumbling block. Let's
> assume it need not be large. Further, let's assume that it must be
> shared by different software components, written in different computer
> languages, and that all of these are under our control to build,
> maintain, or license, whichever the case may be for any particular
> component.

One approach I've seen is 'IO-modules' (about 100 man-years, early 80's; COBOL and Assembler, ISAM for storage): one team building only IO-stuff, and no IO is done without the use of their modules. I am positive they'd have loved to have a DBMS. They had a detailed logical model for reference. Actually the system has since been replaced by one built, with a case-tool, around an SQL-database; one key starting point was the detailed logical model of the old system.

>>>>> in XML
>>>> Ah, we are talking about documents, not about data.
>>> Yes, data, but data stored in documents, aka files, not employing a
>>> DBMS.
>> I'll sing it:
>> Storage is irrelevant when talking about the logical data model.

>
> Again, the requirement is that the data be available outside of one
> particular software component run-time, so you may call that something
> other than storage if you like. Logical data models are surely
> relevant throughout the software development process, which is why we
> have design patterns and best practices for data models within OOP, for
> example. Logical data models are also relevant when working with the
> interface between any particular software component and the interface to
> whatever software component will handle saving and retrieving such data.
>

>> A DBMS is irrelevant when talking about the logical data model.
>> Files are irrelevant when talking about the logical data model.
>> Documents are irrelevant when talking about the logical data model.
>> Why?
>> Talking about the logical data model means talking about the
>> logic of the data /itself/ , not about how, where and when
>> it is stored, read or processed.

>
> This might be a terminology issue. See above. The conceptual model is
> about the data itself, and if you want the logical data model to also
> be, then this would be the next one in line -- the implementation data
> model used when working between the software with data to pass to a
> software component that will "save" and "retrieve" the data, perhaps
> for some other software written in some other language.

Ok. I think I commented on that, earlier. Accepting the 'change of terminology' - could you attempt a rephrase?

>>>>> documents and not in an SQL-DBMS, the tools would not require that the
>>>>> data model be in 1NF or the use of the SQL NULL.
>>>> /data model/
>>>> ?? document model!
>>> No, it is the data model that is of interest to me,
>> Why, then, bring the storage (files, persistence) and other
>> containers (XML, SQL-DBMS) into the scope?

>
> Because it is relevant.

To the implementation. But relevant, definitely. (Just to remind you that I do /not/ accept the terminology of the original question :-)

>> How can that possibly help?

> The implementation data model for data that a software component passes
> to an SQL-DBMS is often quite different from the implementation data
> model for that same conceptual data model when software other than a
> SQL-DBMS is used to store and retrieve said data.

That implementation is relevant is no reason for mixing it with logical model issues. You say you accept the terminology change, but you seem reluctant to do away with the old one.

>>> while the use of
>>> the XML document for storage happens to be an implementation where
>>> perhaps I can get the question across.
>> What question?
>> I'm trying to establish whether there really is one.

>
> There is one in my mess of gray matter, but I have not successfully
> phrased the question yet in 3 years time, so thanks for your patience
> and help.

No problem. Maybe this helps, maybe not - I hope it does: I really think you have a valid question there somewhere. A very unlucky combination of bad experience with big RDBMS-projects and unnecessarily antagonizing choices of words and approaches (especially towards relational zealots) does not help you to get it out.

>> Up to now it sounds somewhat like "how do the black keys sound
>> different on a trumpet?".

>
> If a company selects MySQL as a tool for interfacing with a database,
> putting and getting data from a secondary storage device, or a
> not-primarily-a-SQL-DBMS such as Berkeley-DB or Cache' as a DBMS tool,
> they are unlikely to have (almost certain not to have) the same
> implementation (aka logical) data model even starting with the same
> conceptual data model.

Reluctant ...

> Industry best practices, as well as mathematics, have been identified
> related to MySQL (even if it falls short of being a pure implementation
> of the relational model). So much work has been done with persistence
> outside of any SQL-DBMS toolset that there surely must be industry best
> practices for how to design/model such data repositories. Where there
> is some theory out there for various di-graph models, for example, I
> have not found anything resembling "best practices" or "design
> patterns" in this area.

Neither did I.

>>> My interest here is to figure
>>> out what information is out there in the way of best practices for data
>>> modeling outside of the SQL-DBMS environment.
>> There is no data model /in/side the SQL-DBMS environment
>> (except of course the model of such environments itself, the catalog).

>
> Yes, perhaps it helps to think of this as a question about what the
> best practices are in designing the catalog or schema for a
> non-SQL-DBMS, one that does not enforce the (now apparently
> old-fashioned) notion of 1NF nor employ a 3VL, for example.
> ...

>>>>> How would an excellent logical data model designed for this XML
>>>> /logical data model for XML/
>>>> No such thing.
>>> Perhaps I need different terminology.
>> Please try. Using your terms up to now above, does it sound wrong
>> to you what I am trying to tell you?

>
> Yes,

What does sound wrong to you?

> but I'm trying to hear it through your ears so I can revise my
> terminology appropriately.
>

>>> There must still be a design for
>>> the data in these documents, perhaps specified with a dtd or xsd.  I'm
>>> pretty sure that some designs are better than others, so what would be
>>> a good way to approach the design for this (these) data?
>> Just like any other hierarchical implementation: you'll have to
>> make decisions on what comes first and how to manage update
>> anomalies - but that is /outside/ the logic of the data.

>
> Some designs are more maintainable than others -- those would be ones
> that might fit into a "best practices" category.
>
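
To make 'what comes first' and 'update anomalies' concrete, here is a minimal sketch in Python (standard library only). The document layout, the ids and the phone numbers are made up for illustration; nothing here comes from a real system. It assumes the orders come first and each order embeds its customer, so the customer's phone is stored once per order.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: orders "come first" and each one embeds its customer.
DOC = """
<orders>
  <order id="10"><customer id="7" phone="555-0100"/></order>
  <order id="11"><customer id="7" phone="555-0100"/></order>
</orders>
"""

def phones_for(root, customer_id):
    """Collect every phone value stored anywhere for one customer."""
    return {c.get("phone") for c in root.iter("customer")
            if c.get("id") == customer_id}

root = ET.fromstring(DOC)

# Naive update: change the phone only in the order we happen to be looking at.
root.find("order[@id='10']/customer").set("phone", "555-0199")
after_naive = phones_for(root, "7")      # two different phones for one customer

# Managed update: touch every copy of the same fact.
for c in root.iter("customer"):
    if c.get("id") == "7":
        c.set("phone", "555-0199")
after_managed = phones_for(root, "7")    # one phone again
```

Had the customers come first, with the orders nested under them, the phone would be stored once and the anomaly would move elsewhere (to anything repeated per order). That choice belongs to the implementation, not to the logic of the data.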

>>>>> implementation differ from the corresponding data model developed for
>>>>> an SQL-DBMS?
>>>> /corresponding/
>>>> No real correspondence.

>
> OK, the correspondence is this. I'm sitting here with XML documents
> (or choose a non-SQL-DBMS vehicle for specifying, putting, and getting
> data that must remain after the program writing the data is shut down)
> and you have Oracle in front of you. If we start with the exact same
> conceptual data model (UML, ORM, ERD), we are not going to end up with
> the exact same data model that we implement. We have some guidelines,
> best practices, theory, etc related to Oracle that line up somewhat
> closely with those for DB2, SQL Server, and Sybase. All other tools
> for databases (spoken loosely) are a free-for-all, without the wealth
> of a half-century of best practices (typically found by employing
> practices that are not) put in writing anywhere that I can find.
>

>>> I will try to be more precise, but there will still be room for
>>> miscommunication, I'm sure.  Given a conceptual data model (not built
>>> to take any particular implementation into account), how would an
>>> implementation data model (aka logical data model) for XML differ from
>>> an implementation data model for a SQL-DBMS?
>> The logical model does not differ. The implementation does.

>
> Yes, I think many writers use the term "logical data model" or LDM to
> refer to the model that is specified as schema, but I am fine with
> referring to it as the "implementation data model" if you prefer.

Rel ...

>>> Take a simple conceptual data model for pizza orders, for example ;-)
>>> perhaps specified using an old-fashioned ERD or UML (only because I'm a
>>> novice with ORM).
>> For data modeling UML is a bad fit. Though the class-diagrams have a
>> sufficient graphical syntax, it's designed for software design, not
>> database design.

>
> I mentioned two other possibilities, so hopefully one meets with your
> approval?

Heh. I'll grab this opportunity to ramble. The graphical syntaxes, as long as they are rich enough, do not really matter.

I think there are two basic approaches to conceptual and logical modeling: /thing-/ and /fact-/ thinking. ERM is closer to /thing/-think, ORM is closer to /fact/-think (and more detailed). Though I personally (non-rationally, it just feels closer to how I think I think) prefer thinking about facts, I have seen some quite good systems around ERM (things).

>>> Pizza(pizzaId, pizzaName, toppings*)
>>> Customer(customerId, name, phone)
>>> Order(orderId, customerId, pizzaId, addToppings*, removeToppings*)
>>>
>>> * zero to many values possible (maybe 1 to many in the first case, but
>>> simplifying here)
>>> Simplifying in several ways, including not permitting 1/2 pizza toppings
>>> for our purposes here
>>>
>>> Would there be a difference in the data model if such data were
>>> designed for storage in XML documents compared to if designed for
>>> storage using an SQL-DBMS?
>> Well, the toppings, as we discussed previously, may have a meaningful
>> (by itself) order, so you'd have to take special measures when
>> implementing in SQL.
>> With all the other stuff the order by itself does not have a meaning,
>> so you'd have to take special measures for all of that when
>> implementing in XML.

>
> And with XML, it is likely that the toppings would be taken as elements
> under the pizza, rather than having a separate PizzaToppings table.

Yep. About query-bias: Now we want to update our ingredient stock based on the sales.
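
A minimal sketch of that query bias, assuming Python with only the standard library. The element names, table names and sample rows are invented for illustration and are not taken from the thread. The same pizza facts are held once with toppings nested under each pizza, and once in a separate PizzaToppings table; counting topping usage from the orders is a walk over the tree in one case and a join in the other.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data; names and structure are illustrative only.
XML_DOC = """
<shop>
  <pizza id="1" name="Margherita">
    <topping>tomato</topping>
    <topping>mozzarella</topping>
  </pizza>
  <pizza id="2" name="Funghi">
    <topping>tomato</topping>
    <topping>mozzarella</topping>
    <topping>mushroom</topping>
  </pizza>
  <order id="10" pizza="1"/>
  <order id="11" pizza="2"/>
  <order id="12" pizza="2"/>
</shop>
"""

def usage_from_xml(doc: str) -> Counter:
    """Count toppings used: walk the orders, then look up each ordered pizza."""
    root = ET.fromstring(doc)
    toppings = {p.get("id"): [t.text for t in p.findall("topping")]
                for p in root.findall("pizza")}
    used = Counter()
    for order in root.findall("order"):
        used.update(toppings[order.get("pizza")])
    return used

def usage_from_sql() -> Counter:
    """Same facts in tables: usage is a join plus GROUP BY."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE PizzaToppings (pizzaId INTEGER, topping TEXT);
        CREATE TABLE Orders (orderId INTEGER, pizzaId INTEGER);
        INSERT INTO PizzaToppings VALUES
            (1,'tomato'),(1,'mozzarella'),
            (2,'tomato'),(2,'mozzarella'),(2,'mushroom');
        INSERT INTO Orders VALUES (10,1),(11,2),(12,2);
    """)
    rows = con.execute("""
        SELECT t.topping, COUNT(*)
        FROM Orders o JOIN PizzaToppings t ON t.pizzaId = o.pizzaId
        GROUP BY t.topping
    """).fetchall()
    con.close()
    return Counter(dict(rows))

xml_usage = usage_from_xml(XML_DOC)
sql_usage = usage_from_sql()
```

Both routes give the same counts, so the logic of the data is the same. What differs is which question each layout makes easy: the nested document favours "what is on this pizza?", while the stock question has to go against the grain of the hierarchy.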

>>>> The next (XML, but it doesn't matter)
>>>> document created by dumping a database
>>> This would not be a database dump,
>> Yes it would, in theory - it is one way to discuss the differences.
>>
>>> but the creation of an application
>>> using XML documents (yes, I realize this isn't a highly scalable
>>> approach to building an app, so we can assume the data volume will be
>>> small if that helps set the scene -- perhaps these are pizza orders for
>>> a school sale)
>>>
>>>> may differ from the previous, even if the (SQL, but it doesn't matter)
>>>> database content stayed the same.
>>>> Arbitrary order would have to be added to prevent this.
>> This is the point of the database dump approach - I
>> wasn't suggesting to actually start with a dump.
>> Do you see why this is problematic?

Please comment on this.
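
A small sketch of the dump argument, assuming Python's standard library; the customer rows and element names are hypothetical. A relation has no row order, but a document must put the rows in some sequence, so two dumps of identical content can differ unless an arbitrary order is added for the dump.

```python
import xml.etree.ElementTree as ET

def dump(rows, ordered=False):
    """Serialize rows as XML; a document has to put the rows in *some* sequence."""
    if ordered:
        rows = sorted(rows)  # arbitrary but fixed order, added only for the dump
    root = ET.Element("customers")
    for customer_id, name in rows:
        ET.SubElement(root, "customer", id=str(customer_id), name=name)
    return ET.tostring(root, encoding="unicode")

# The same database content, delivered in two different row orders.
first = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
second = [(3, "Cy"), (1, "Ann"), (2, "Bob")]

same_content = set(first) == set(second)                # True: same facts
plain_dumps_equal = dump(first) == dump(second)         # False: documents differ
ordered_dumps_equal = dump(first, ordered=True) == dump(second, ordered=True)
```

The sort key carries no meaning of its own; it exists only to make the document reproducible. That is the sense in which order is an implementation matter here, unlike the toppings, where the order may mean something.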

>>>>> What would be some best practices for modeling data in
>>>>> this environment?
>>>> /this environment/
>>>> A marketing environment? Fantasy island?
>>> Yes, mAsterdam, let's say that it is a pizza sale for a school on
>>> Fantasy Island.
>> Let's eat pizza, then, and not discuss pyrotechnics.

>
> lol
>

>>>> </Annotations>
>>>>
>>>>> I'm guessing some will think that the exact same logical data model
>>>>> would be appropriate for both targets, but hopefully many will agree
>>>>> that it is unlikely that the best implemented data model would be
>>>>> identical in each environment.  In that case, what would the
>>>>> differences be?  What best practices would apply to data modeling for
>>>>> XML documents compared to data modeling for a SQL-DBMS?
>>>> Can't really answer that except "You don't".
>>>> The question by itself shows too many wrongs.
>>> There is a question here that is legitimate,

>> Ok, then please try to rephrase - from your blogs
>> and previous discussions I know you are able to reposition
>> (though I have to admit that some things are really tough to get
>> through to you: constraint handling, for instance).

>
> OK, but it is also hard for you to get your constraint handling all the
> way through to the end-users to maintain, right ;-)

? I don't get this, sorry.

>>> even if I have not yet hit
>>> the nail on the head.  I've tried to ask this question a number of
>>> times in various ways, as you know, but each time the question is
>>> considered aberrant.  If we paint the scenario with what might be
>>> termed an "embedded database" that is not an SQL-DBMS, perhaps
>>> Berkeley-DB or Cache' or Pick so that we have the same ability to
>>> remove 1NF and 3VL-style of NULL-handling, then ask the same question,
>>> would it be a legitimate question?  If so, then same question.
>>>
>>> I'm certain that over the past half-century of software development,
>>> some data design patterns have shown themselves to be better than
>>> others for such target environments.  I'm wondering what both theorists
>>> and practitioners think would be best practices when data modeling for
>>> these environments.

>> Separate the logical model design from the implementation design
>> (implementation including SQL-DBMS).

> I have read and worked with these definitions trying to use the most
> common forms, but I am OK with splitting out what I am calling a
> logical data model into an LDM and an implementation data model if that
> clears anything up. Thanks. --dawn

Rephrasing the question does not seem trivially easy or straightforward to me. As you may see, I am trying to guess what it is and contribute some to what it might be (and ramble a bit).

Received on Thu Oct 26 2006 - 01:31:37 CEST
