Re: Modeling Data for XML instead of SQL-DBMS

From: dawn <dawnwolthuis_at_gmail.com>
Date: 26 Oct 2006 10:15:18 -0700
Message-ID: <1161882918.660942.133320_at_m7g2000cwm.googlegroups.com>


Volker Hetzer wrote:
> dawn schrieb:
> > Volker Hetzer wrote:
> >> dawn schrieb:
> >>> If working on a software project where all data are persisted in XML
> >>> documents and not in an SQL-DBMS, the tools would not require that the
> >>> data model be in 1NF or the use of the SQL NULL.
> >>>
> >>> How would an excellent logical data model designed for this XML
> >>> implementation differ from the corresponding data model developed for
> >>> an SQL-DBMS? What would be some best practices for modeling data in
> >>> this environment?
> >>>
> >>> I'm guessing some will think that the exact same logical data model
> >>> would be appropriate for both targets, but hopefully many will agree
> >>> that it is unlikely that the best implemented data model would be
> >>> identical in each environment. In that case, what would the
> >>> differences be? What best practices would apply to data modeling for
> >>> XML documents compared to data modeling for a SQL-DBMS?
> >> Depends on what came first. If you've got a nice XML model, you can probably
> >> model it straight a way as a bunch of one to many or whatever relations in the
> >> database.
> >
> > Using a broad definition of "database" the XML documents, in this case,
> > are the database.
> When I'm talking about the database I'm talking about the type of database
> generally accepted as "database" in this group. I suggest you adapt to the
> terminology in this group too, for discussions within this group.

I try very hard to do that, Volker. The definition in the cdt glossary is

<glossaryEntry>
[Database]
  "A logically coherent collection of related real-world data   assembled for a specific purpose." -- rephrased from "Fundamentals of Database Systems", Elmasri & Navathe.

  1. Deluxe file system
  2. Shared databank (E. Codd)

</glossaryEntry>

The definition in quotations above is the preferred def for this ng with the other two listed as alternatives. I am trying to be sure to align with that definition. With the Codd definition, a clarification is required for the meaning of "shared" as discussed with David, so that I am talking about a databank shared with different languages and products, but all under the same "control" structures, which would be the difference between the databases I'm talking about and those often put under the head of "relational databases."

> > They are going to come first and there is no
> > SQL-DBMS as a target. So, the question is what theory and practical
> > tips are there from the past 50 years of computing to help build a good
> > data model for this XML "database' or data repository, if you prefer.
> No.
> There are three ways, depending on what you want to do:
> 1) Understand the XML contents and do a database model from scratch as if
> there was no XML

That is OK with me as long as I am also doing the database model from scratch as if there were no SQL. If I have no need to bring in 1NF (by traditional meanings) structures into my design, I would not want to design a data model with that approach and then have to convert, although that would be feasible.

> 2) Translate into a simple tree like ERD

Yes, we can assume an ERD exists in this case (conceptual data model)

> 3) store it as native XML datatype in any modern database.

My real issue is not in how to persist XML documents, but in how to model data for persistence using both old and new non-SQL-DBMS tools (that do not limit the designer to 1NF, nor introduce the SQL NULL for use with a 3VL, for example).

>

> > Yes, that is the thought process I want to tap into. What are the
> > industry best practices for designing data for a non-1NF repository
> > such as these XML documents?
> The industry best practice is to to one of the three things above.
> Point one usually implies at least N3.

Yes and 3NF is defined to require 1NF (which has been redefined over the years). I want to eliminate 1NF, while still concerning myself with issues of functional dependencies.

> >> Most modern databases can store XML as it is, either as blob or in separate
> >> tables.
> >
> > Think of the XML docs as a collection that constitutes a base of data
> > for this app.
> This doesn't change that "most modern databases..." see the bit of mine you
> quoted.

>

> >
> >> If all you're concerned is the amount of XML data but would like
> >> to stay as XML'ish as possible (i.e. using xpath/xquery only not on the doc
> >> but on the database), that's the way I'd go.
> >
> > I'm looking for the theory and practice that would yield the best
> > results for the design of this data repository if it is exclusively in
> > XML documents, with web services taking it anywhere else it needs to
> > be, but with no requirement to ever be persisted by way of a SQL-DBMS,
> > for example. I hope that makes some sense.
> Not to me, sorry.

Perhaps the XML document distract you from underlying question I have. There are many terabytes of data in databases from software applications where a "standard" SQL-DBMS is not the software component used between the application a database. These are either old or new models as this approach pre-dates the relational model and is also at least one possibility for the future of software development.

I'm looking for industry best practices for such. Two primary differentiators that I am using to test whether something is an SQL-DBMS or other (since most tools permit SQL queries) are 1) whether it is typical for developers to work with data in the form (formerly known as) 1NF or not and 2) whether nulls are logically values (such as empty set) or represent the lack of a value. If the DBMS has both a value-less NULL and an empty set "null" (e.g. empty string) then they can still fall in this category. If a DBMS meets both the non-1NF and 2VL tests, then it is in my not-SQL-DBMS category even if it has an SQL implementation. I suspect we will get a name for these that is better than non-SQL-DBMS in the future, other than the typical marketing spin of today where these are typically referred to as "embedded databases" (even when employed for ERP solutions).

> If there is no need to have the data persist,

I'm definitely referring to data that must be persisted.

> theory and practice yields
> the suggestion not to put it into a database at all. That is so, because
> usually the effort to load and unload it outweigh any benefits from faster
> querying for example.

Yup, there are tradeoffs between temporary workfiles on SSDs vs memory. Thanks. --dawn

> Lots of Greetings!
> Volker
> --
> For email replies, please substitute the obvious.
Received on Thu Oct 26 2006 - 19:15:18 CEST

Original text of this message