Re: Modeling Data for XML instead of SQL-DBMS
Date: Thu, 26 Oct 2006 01:31:37 +0200
Message-ID: <453ff33c$0$330$e4fe514c_at_news.xs4all.nl>
dawn wrote:
...
> The interface to a persistence mechanism (software) is relevant to the
> design of a data model. When I use the term "logical data model" am
> using it to refer to the implementation data model.
Why? The way I learned it (early 80's) is this: the logical model is the most complete, detailed level you can get to /without/ specifying the implementation plan. I don't think I should unlearn that.
At that time the idea was to completely design the logic
before even thinking about implementation - nobody really did it that
way, but it was the accepted 'how it /should/ be done'. Later
method(ologie)s addressed more technical issues up front, in the form
of architecture templates. That is the good and natural
evolution - theory closer to practice - but it does not give an
excuse to blur the distinction logical versus implementation.
> If you want to have "conceptual data model" then
The further away from cut-over-day, the more freedom there is
in deciding what deliverables are really necessary - it also
depends on culture, the size and nature of the project, the user base,
the project team size, management style.
Any one-set-of-deliverables-fits-all-projects theory is
doomed to stay just that: theory.
> It is common to let "logical
> "logical data model" and then another
> "implementation data model" so be it.
> data model" refer to this implementation data model -- the model of the
> data as specified to the API used for retaining data beyond the
> run-time of a particular software application, for example.
In which circles? Can you provide a reference?
>> Surely there is no need for persistence when we have no data to >> persist,
>
> Correct. I am referring to data that is persisted (stored, if
> preferred).
>
>> and from the other perspective: our data is valuable so >> we need persistence for it - but that's about it; mixing the two >> topics frustrates both discussions.
>
> The requirement to retain data beyond a particular application run-time
> is a requirement that mixes the two, it seems.
I don't think so. I think it is where /sharing/ starts - as soon as the next run-time incarnation may differ from an earlier one.
> But if it helps, I will
> state that I do not need to care about any physical storage device,
> only the interface/language/api used to pass data that is to be stored
> and retrive it.
Ok, that does narrow it down and makes more sense to me.
Though I have seen quite a lot of
those interface/language/api -s I have yet to see a foundational
approach to them outside the relational school, the Codd/Date folks.
>> You say "Sure, we could assume that if it helps.", >> what do you mean by that - what are, to you, the >> consequences of assuming no need to share data?
>
> The Codd folks get all hung up on the "large shared data bank" concept,
> so I wanted to avoid those terms if they are a stumbling block. Let's
> assume it need not be large. Further, let's assume that it must be
> shared by different software components, written in different computer
> languages, and that all of these are under our control to build,
> maintain, or license, whichever the case may be for any particular
> component.
One approach I've seen is 'IO-modules' (about 100 man years early 80's, COBOL and Assembler, ISAM for storage): one team building only IO-stuff, no IO is done without the use of their modules. I am positive they 'd love to have had a DBMS. They had a detailed logical model for reference. Actually the system has since been replaced by one built, with a case-tool, around an SQL-database
- one key starting point was the detailed logical model of the old system.
>>>>> in XML >>>> Ah, we are talking about documents, not about data. >>> Yes, data, but data stored in documents, aka files, not employing a >>> DBMS. >> I'll sing it: >> Storage is irrelevant when talking about the logical data model.
>
> Again, the requirement is that the data be available outside of one
> particular software component run-time, so you may call that something
> other than storage if you like. Logical data models are surely
> relevant throughout the software development process, which is why we
> have design patterns and best practices for data models within OOP, for
> example. Logical data models are also relevant when working with the
> interface between any particular software component and the inteface to
> whatever software component will handle saving and retriving such data.
>
>> A DBMS is irrelevant when talking about the logical data model. >> Files are irrelevant when talking about the logical data model. >> Documents are irrelevant when talking about the logical data model. >> Why? >> Talking about the logical data model means talking about the >> logic of the data /itself/ , not about how, where and when >> it is stored, read or processed.
>
> This might be a terminology issue. See above. The conceptual model is
> about the data itself, and if you want the logical data model to also
> be, then this would be the next one in line -- the implementation data
> model used when working between the software with data to pass to a
> software component that will "save" and "retrieve" the data, perhaps
> for some other software written in some other language.
Ok. I think I commented on that, earlier. Accepting the 'change of terminology' - could you attempt a rephrase?
>>>>> documents and not in an SQL-DBMS, the tools would not require that the >>>>> data model be in 1NF or the use of the SQL NULL. >>>> /data model/ >>>> ?? document model! >>> No, it is the data model that is of interest to me, >> Why, then, bring the storage (files, persistence) and other >> containers (XML, SQL-DBMS) into the scope?
>
> Because it is relevant.
To the implementation. But relevant, definitely. (Just to remind you that I do /not/ accept the terminology of the original question :-)
>> How can that possibly help?
> The implementation data model for data that a software component passes
That implementation is relevant is no reason for mixing
it with logical model issues. You say you accept the terminology
> to an SQL-DBMS is often quite different from the implementation data
> model for that same conceptual data model when software other than a
> SQL-DBMS is used to store and retrieve said data.
>>> while the use of >>> the XML document for storage happens to be an implementation where >>> perhaps I can get the question across. >> What question? >> I'm trying to establish whether there really is one.
>
> There is one in my mess of gray matter, but I have not successfully
> phrased the question yet in 3 years time, so thanks for your patience
> and help.
No problem. Maybe this helps, maybe not - I hope it does:
I really think you have a valid question there somewhere
- A very unlucky combination of bad experience with
big RDBMS-projects and unnecessary antagonizing
(especially against relational zealots) choices of words
and approaches does not help you to get it out.
>> Up to now it sounds somewhat like "how do the black keys sound
>> different on a trumpet?".
>
> If a company selects MySQL as a tool for interfacing with a database,
> putting and getting data from a secondary storage device, or a
> not-primarily-a-SQL-DBMS such as Berkeley-DB or Cache' as a DBMS tool,
> they are unlikely to have (almost certain not to have) the same
> implementation (aka logical) data model even starting with the same
> conceptual data model.
Reluctant ...
> Industry best practices, as well as mathematics, have been identified
> related to MySQL (even if it falls short of being a pure implementation
> fo the relational model). So much work has been done with persistence
> outside of any SQL-DBMS toolset that there surely must be industry best
> practices for how to design/model such data repositories. Where there
> is some theory out there for various di-graph models, for example, I
> have not found anything resembling "best practices" or "design
> patterns" in this area.
Neither did I.
>>> My interest here is to figure >>> out what information is out there in the way of best practices for data >>> modeling outside of the SQL-DBMS environment. >> There is no data model /in/side the SQL-DBMS environment >> (except of course the model of such environments itself, the catalog).
>
> Yes, perhaps it helps to think of this as a question about what the
> best practices are in designing the catalog or schema for a
> non-SQL-DBMS, one that does not enforce the (now apparently
> old-fashioned) notion of 1NF nor employ a 3VL, for example.
> ...
>>>>> How would an excellent logical data model designed for this XML >>>> /logical data model for XML/ >>>> No such thing. >>> Perhaps I need different terminology. >> Please try. Using your terms up to now above, does it sound wrong >> to you what I am trying to tell you?
>
> Yes,
What does sound wrong to you?
> but I'm trying to hear it through your ears so I can revise my
> terminology appropriately.
>
>>> There must still be a design for >>> the data in these documents, perhaps specified with a dtd or xsd. I'm >>> pretty sure that some designs are better than others, so what would be >>> a good way to approach the design for this (these) data? >> Just like any other hierarchical implementation: you'll have to >> make decisions on what comes first and how to manage update >> anomalies - but that is /outside/ the logic of the data.
>
> Some designs are more maintainable than others -- those would be ones
> that might fit into a "best practices" category.
>
>>>>> implementation differ from the corresponding data model developed for >>>>> an SQL-DBMS? >>>> /corresponding/ >>>> No real correspondence.
>
> OK, the correspondence is this. I'm sitting here with XML documents
> (or choose a non-SQL-DBMS vehicle for specifying, putting, and getting
> data that must remain after the program writing the data is shut down)
> and you have Oracle in front of you. If we start with the exact same
> conceptual data model (UML, ORM, ERD), we are not going to end up with
> the exact same data model that we implement. We have some guidelines,
> best practices, theory, etc related to Oracle that line up somewhat
> closely with those for DB2, SQL Server, and Sybase. All other tools
> for databases (spoken loosely) are a free-for-all, without the wealth
> of a half-century of best practices (typically found by employing
> practices that are not) put in writing anywhere that I can find.
>
>>> I will try to be more precise, but there will still be room for >>> miscommunication, I'm sure. Given a conceptual data model (not built >>> to take any particular implementation into account), how would an >>> implementation data model (aka logical data model) for XML differ from >>> an implementation data model for a SQL-DBMS? >> The logical model does not differ. The implementation does.
>
> Yes, I think many writers use the term "logical data model" or LDM to
> refer to the model that is specified as schema, but I am fine with
> referring to it as the "implementation data model" if you prefer.
Rel ...
>>> Take a simple conceptual data model for pizza orders, for example ;-) >>> perhaps specified using an old-fashioned ERD or UML (only because I'm a >>> novice with ORM). >> For data modeling UML is a bad fit. Though the class-diagrams have a >> sufficient graphical syntax, it's designed for software design, not >> database design.
>
> I mentioned two other possibilities, so hopefully one meets with your
> approval?
Heh. I'll grab this opportunity to ramble. The graphical syntaxes, as long as they are rich enough, do not really matter.
I think there are two basic approaches to conceptual and logical modeling: /thing-/ and /fact-/ thinking. ERM is closer to /thing/-think, ORM is closer to /fact/-think (and more detailed). Though I personally (non-rationally, it just feels closer to how I think I think) prefer thinking about facts, I have seen some quite good systems around ERM (things).
>>> Pizza(pizzaId, pizzaName, toppings*) >>> Customer(customerId, name, phone) >>> Order(orderId, customerId, pizzaId, addToppings*, removeToppings*) >>> >>> * zero to many values possible (maybe 1 to many in the first case, but >>> simplifying here) >>> Simplying in several ways, including not permitting 1/2 pizza toppings >>> for our purposes here >>> >>> Would there be a difference in the data model if such data were >>> designed for storage in XML documents compared to if designed for >>> storage using an SQL-DBMS? >> Well, the toppings, as we discussed previously, may have a meaningful >> (by itself) order, so you'd have to take special measures when >> implementing in SQL. >> With all the other stuff the order by itself does not have a meaning, >> so you'd have to take special measures for all of that when >> implementing in XML.
>
> And with XML, it is likely that the toppings would be taken as elements
> under the pizza, rather than having a separate PizzaToppings table.
Yep. About query-bias: Now we want to actualize our ingredient storage based on the sales.
>>>> The next (XML, but it doesn't matter) >>>> document created by dumping a database >>> This would not be a database dump, >> Yes it would, in theory - it is one way to discuss the differences. >> >>> but the creation of an application >>> using XML documents (yes, I realize this isn't a highly scalable >>> approach to building an app, so we can assume the data volume will be >>> small if that helps set the scene -- perhaps these are pizza orders for >>> a school sale) >>> >>>> may differ from the previous, even if the (SQL, but it doesn't matter) >>>> database content stayed the same. >>>> Arbitrary order would have to be added to prevent this. >> This is the point of the database dump approach - I >> wasn't suggesting to actually start with a dump. >> Do you see why this is problematic?
Please comment on this.
>>>>> What would be some best practices for modeling data in >>>>> this environment? >>>> /this environment/ >>>> A marketing environment? Fantasy island? >>> Yes, mAsterdam, let's say that it is a pizza sale for a school on >>> Fantasy Island. >> Let's eat pizza, then, and not discuss pyrotechnics.
>
> lol
>
>>>> </Annotations> >>>> >>>>> I'm guessing some will think that the exact same logical data model >>>>> would be appropriate for both targets, but hopefully many will agree >>>>> that it is unlikely that the best implemented data model would be >>>>> identical in each environment. In that case, what would the >>>>> differences be? What best practices would apply to data modeling for >>>>> XML documents compared to data modeling for a SQL-DBMS? >>>> Can't really answer that except "You don't". >>>> The question by it self shows to much wrongs. >>> There is a question here that is legitimate, >> Ok, then please try to rephrase - from your blogs >> and previous discussions I know you are able to reposition >> (though I have to admit that some things are really tough to get >> through to you: constraint handling, for instance).
>
> OK, but it is also hard for you to get your constraint handling all the
> way through to the end-users to maintain, right ;-)
? I don't get this, sorry.
>>> even if I have not yet hit >>> the nail on the head. I've tried to ask this question a number of >>> times in various ways, as you know, but each time the question is >>> considered abberant. If we paint the scenario with what might be >>> termed an "embedded database" that is not an SQL-DBMS, perhaps >>> Berkeley-DB or Cache' or Pick so that we have the same ability to >>> remove 1NF and 3VL-style of NULL-handling, then ask the same question, >>> would it be a legitimate question? If so, then same question. >>> >>> I'm certain that over the past half-century of software development, >>> some data design patterns have shown themselves to be better than >>> others for such target environments. I'm wondering what both theorists >>> and practitioners think would be best practices when data modeling for >>> these environments.
>> Separate the logical model design from the implementation design >> (implementation including SQL-DBMS).
> I have read and worked with these definitions trying to use the most
> common forms, but I am OK with splitting out what I am calling a
> logical data model into an LDM and an implementation data model if that
> clears anything up. Thanks. --dawn
Rephrasing the question does not seem trivially easy or straightforward to me. As you may see I am trying to guess what it is and contribute some to what it might be (and ramble a bit). Received on Thu Oct 26 2006 - 01:31:37 CEST
