Re: Modeling Data for XML instead of SQL-DBMS

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Wed, 25 Oct 2006 20:52:04 +0200
Message-ID: <453fb1a1$0$324$e4fe514c_at_news.xs4all.nl>


dawn wrote:
> mAsterdam wrote:

>> <Annotations>
>>
>> dawn wrote:
>>> If working on a software project where all data are persisted
>> /persisted/
>>
>> Ah, we are talking software development on an island, not
>> about shared data.

>
> Sure, we could assume that if it helps.

I am not sure you are aware of the implications.

There are layers of abstraction here. We can talk about persistence as such when we abstract away from the specifics of what is persisted. We can talk about data as such when we abstract away from the specifics of how it is persisted.

Surely there is no need for persistence when we have no data to persist. Seen from the other side, our data is valuable, so we need persistence for it. But that's about it: mixing the two topics frustrates both discussions.

You say "Sure, we could assume that if it helps." What do you mean by that? What, to you, are the consequences of assuming there is no need to share data?

>>> in XML
>> Ah, we are talking about documents, not about data.

>
> Yes, data, but data stored in documents, aka files, not employing a
> DBMS.

I'll sing it:
Storage is irrelevant when talking about the logical data model.
A DBMS is irrelevant when talking about the logical data model.
Files are irrelevant when talking about the logical data model.
Documents are irrelevant when talking about the logical data model.
Why?
Talking about the logical data model means talking about the logic of the data /itself/, not about how, where and when it is stored, read or processed.

>>> documents and not in an SQL-DBMS, the tools would not require that the
>>> data model be in 1NF or the use of the SQL NULL.
>> /data model/
>> ?? document model!

>

> No, it is the data model that is of interest to me,

Why, then, bring the storage (files, persistence) and other containers (XML, SQL-DBMS) into the scope? How can that possibly help?

> while the use of
> the XML document for storage happens to be an implementation where
> perhaps I can get the question across.

What question?
I'm trying to establish whether there really is one. Up to now it sounds somewhat like "how do the black keys sound different on a trumpet?".

> My interest here is to figure
> out what information is out there in the way of best practices for data
> modeling outside of the SQL-DBMS environment.

There is no data model /in/side the SQL-DBMS environment (except, of course, the model of the environment itself: the catalog).
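To make the catalog remark concrete, a small sketch using SQLite from Python (chosen only for convenience; other SQL-DBMSs expose their catalog differently, for example through INFORMATION_SCHEMA). The table and column names are hypothetical. What the DBMS records about itself is table and column definitions; what a customer /is/ lives in the logical model, not in the DBMS.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table, just so the catalog has something to describe.
con.execute("CREATE TABLE customer (customerId TEXT PRIMARY KEY, name TEXT, phone TEXT)")

# SQLite keeps its catalog in sqlite_master: rows describing tables and
# indexes, i.e. a model of the DBMS environment itself.
tables = [name for (name,) in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['customer']

# Column definitions come from the catalog too (field 1 is the column name).
cols = [row[1] for row in con.execute("PRAGMA table_info(customer)")]
print(cols)  # ['customerId', 'name', 'phone']
con.close()
```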

>> - ok, no 1NF.
>> /NULL/ (sigh of relief) - not about NULL.
>>
>>> How would an excellent logical data model designed for this XML
>> /logical data model for XML/
>> No such thing.

>

> Perhaps I need different terminology.

Please try. In the terms you have used so far, does what I am trying to tell you sound wrong to you?

> There must still be a design for
> the data in these documents, perhaps specified with a dtd or xsd. I'm
> pretty sure that some designs are better than others, so what would be
> a good way to approach the design for this (these) data?

Just as with any other hierarchical implementation, you'll have to decide what comes first (what goes at the top of the hierarchy) and how to manage update anomalies, but that is /outside/ the logic of the data.
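A minimal sketch of the "what comes first" decision, in Python with the standard-library ElementTree, assuming a hypothetical customer and two orders borrowed from the pizza example further down. Nesting customers inside orders repeats the customer facts and invites an update anomaly; nesting orders inside customers avoids that particular repetition. Both documents carry the same logical facts.

```python
import xml.etree.ElementTree as ET

# Hypothetical facts, following the thread's simplified pizza model.
customers = {"c1": {"name": "Ann", "phone": "555-0101"}}
orders = [
    {"orderId": "o1", "customerId": "c1", "pizzaId": "p1"},
    {"orderId": "o2", "customerId": "c1", "pizzaId": "p2"},
]

def orders_first():
    """Nest customer data inside each order: customer facts get repeated."""
    root = ET.Element("orders")
    for o in orders:
        oe = ET.SubElement(root, "order", orderId=o["orderId"], pizzaId=o["pizzaId"])
        c = customers[o["customerId"]]
        ET.SubElement(oe, "customer", customerId=o["customerId"],
                      name=c["name"], phone=c["phone"])
    return root

def customers_first():
    """Nest orders inside their customer: each customer fact appears once."""
    root = ET.Element("customers")
    for cid, c in customers.items():
        ce = ET.SubElement(root, "customer", customerId=cid,
                           name=c["name"], phone=c["phone"])
        for o in orders:
            if o["customerId"] == cid:
                ET.SubElement(ce, "order", orderId=o["orderId"], pizzaId=o["pizzaId"])
    return root

doc = orders_first()
# Update anomaly: changing the phone in only the first copy leaves the
# document saying two different things about the same customer.
doc.find("order/customer").set("phone", "555-0199")
phones = {e.get("phone") for e in doc.iter("customer") if e.get("customerId") == "c1"}
print(sorted(phones))  # ['555-0101', '555-0199']

doc2 = customers_first()
print(len(doc2.findall("customer")))  # 1
```

Which hierarchy is "right" depends on how the documents are used, which is exactly why the choice belongs to the implementation and not to the logical model.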

>>> implementation differ from the corresponding data model developed for
>>> an SQL-DBMS?
>> /corresponding/
>> No real correspondence.

>
> I will try to be more precise, but there will still be room for
> miscommunication, I'm sure. Given a conceptual data model (not built
> to take any particular implementation into account), how would an
> implementation data model (aka logical data model) for XML differ from
> an implementation data model for a SQL-DBMS?

The logical model does not differ. The implementation does.
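One way to see this: take one set of Customer facts, store it once in an SQL table (SQLite, in memory) and once as XML text, read both back, and compare the sets. A sketch in Python with hypothetical values, not a recipe.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The logical content: a set of Customer facts (customerId, name, phone).
# The values are hypothetical.
facts = {("c1", "Ann", "555-0101"), ("c2", "Bob", "555-0102")}

def via_sql(rows):
    """Implementation 1: store the facts in an SQL table, read them back."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (customerId TEXT PRIMARY KEY, name TEXT, phone TEXT)")
    con.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    result = set(con.execute("SELECT customerId, name, phone FROM customer"))
    con.close()
    return result

def via_xml(rows):
    """Implementation 2: serialise the facts as XML text, parse them back."""
    root = ET.Element("customers")
    for cid, name, phone in rows:
        ET.SubElement(root, "customer", customerId=cid, name=name, phone=phone)
    text = ET.tostring(root, encoding="unicode")
    parsed = ET.fromstring(text)
    return {(e.get("customerId"), e.get("name"), e.get("phone"))
            for e in parsed.iter("customer")}

# Different storage, same facts.
print(via_sql(facts) == via_xml(facts) == facts)  # True
```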

> Take a simple conceptual data model for pizza orders, for example ;-)
> perhaps specified using an old-fashioned ERD or UML (only because I'm a
> novice with ORM).

For data modeling, UML is a bad fit. Though its class diagrams have a sufficient graphical syntax, UML is designed for software design, not database design.

> Pizza(pizzaId, pizzaName, toppings*)
> Customer(customerId, name, phone)
> Order(orderId, customerId, pizzaId, addToppings*, removeToppings*)
>
> * zero to many values possible (maybe 1 to many in the first case, but
> simplifying here)
> Simplifying in several ways, including not permitting 1/2 pizza toppings
> for our purposes here
>
> Would there be a difference in the data model if such data were
> designed for storage in XML documents compared to if designed for
> storage using an SQL-DBMS?

Well, the toppings, as we discussed previously, may have an order that is meaningful in itself, so you'd have to take special measures for them when implementing in SQL.
For all the other stuff, the order has no meaning by itself, so you'd have to take special measures for all of that when implementing in XML.
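A sketch of the "special measures" on each side, under my own assumptions: that the topping order matters, that an explicit position column is an acceptable measure on the SQL side, and that the order of customers in the XML does not matter. Table, column and element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toppings for one hypothetical pizza, where (by assumption) the order matters.
toppings = ["tomato", "mozzarella", "basil"]

con = sqlite3.connect(":memory:")
# SQL tables are unordered, so a meaningful order must be stored as data:
# here as an explicit (hypothetical) 'position' column.
con.execute("CREATE TABLE pizza_topping (pizzaId TEXT, position INTEGER, topping TEXT, "
            "PRIMARY KEY (pizzaId, position))")
con.executemany("INSERT INTO pizza_topping VALUES ('p1', ?, ?)",
                enumerate(toppings, start=1))
back = [t for (t,) in con.execute(
    "SELECT topping FROM pizza_topping WHERE pizzaId = 'p1' ORDER BY position")]
print(back == toppings)  # True
con.close()

# XML elements always have a document order, even where it means nothing.
# Two documents listing the same customers in a different order are
# different documents, so a consumer has to compare them as sets.
a = ET.fromstring('<customers><customer id="c1"/><customer id="c2"/></customers>')
b = ET.fromstring('<customers><customer id="c2"/><customer id="c1"/></customers>')
same_text = ET.tostring(a) == ET.tostring(b)
same_set = {e.get("id") for e in a} == {e.get("id") for e in b}
print(same_text, same_set)  # False True
```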

>> The next (XML, but it doesn't matter)
>> document created by dumping a database

>

> This would not be a database dump,

Yes it would, in theory - it is one way to discuss the differences.

> but the creation of an application
> using XML documents (yes, I realize this isn't a highly scalable
> approach to building an app, so we can assume the data volume will be
> small if that helps set the scene -- perhaps these are pizza orders for
> a school sale)
>

>> may differ from the previous, even if the (SQL, but it doesn't matter)
>> database content stayed the same.
>> Arbitrary order would have to be added to prevent this.

This is the point of the database dump approach; I wasn't suggesting that you actually start with a dump. Do you see why this is problematic?
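A sketch of the dump problem in Python, with hypothetical rows: the same relation, returned in two different row orders (a DBMS does not guarantee row order without ORDER BY), yields two different XML documents. Sorting by key, an arbitrary but fixed order, makes the dump repeatable.

```python
import xml.etree.ElementTree as ET

def dump(rows):
    """Dump (customerId, name) rows to XML text in the order given."""
    root = ET.Element("customers")
    for cid, name in rows:
        ET.SubElement(root, "customer", customerId=cid, name=name)
    return ET.tostring(root, encoding="unicode")

# The same relation (a set of rows) as it might come back on two occasions.
first = [("c1", "Ann"), ("c2", "Bob")]
second = [("c2", "Bob"), ("c1", "Ann")]
print(set(first) == set(second))    # True: same database content
print(dump(first) == dump(second))  # False: different documents

# Imposing an arbitrary order (here: sort by key) makes the dump repeatable.
print(dump(sorted(first)) == dump(sorted(second)))  # True
```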

>>> What would be some best practices for modeling data in
>>> this environment?
>> /this environment/
>> A marketing environment? Fantasy island?

>
> Yes, mAsterdam, let's say that it is a pizza sale for a school on
> Fantasy Island.

Let's eat pizza, then, and not discuss pyrotechnics.

>> </Annotations>
>>
>>> I'm guessing some will think that the exact same logical data model
>>> would be appropriate for both targets, but hopefully many will agree
>>> that it is unlikely that the best implemented data model would be
>>> identical in each environment.  In that case, what would the
>>> differences be?  What best practices would apply to data modeling for
>>> XML documents compared to data modeling for a SQL-DBMS?
>> Can't really answer that except "You don't".
>> The question by itself shows too many wrongs.

>
> There is a question here that is legitimate,

OK, then please try to rephrase. From your blogs and previous discussions I know you are able to reframe a question (though I have to admit that some things are really tough to get through to you: constraint handling, for instance).

> even if I have not yet hit
> the nail on the head. I've tried to ask this question a number of
> times in various ways, as you know, but each time the question is
> considered aberrant. If we paint the scenario with what might be
> termed an "embedded database" that is not an SQL-DBMS, perhaps
> Berkeley-DB or Caché or Pick so that we have the same ability to
> remove 1NF and 3VL-style of NULL-handling, then ask the same question,
> would it be a legitimate question? If so, then same question.
>
> I'm certain that over the past half-century of software development,
> some data design patterns have shown themselves to be better than
> others for such target environments. I'm wondering what both theorists
> and practitioners think would be best practices when data modeling for
> these environments.

Separate the logical model design from the implementation design (and count the SQL-DBMS as part of the implementation).

Received on Wed Oct 25 2006 - 20:52:04 CEST
