Re: Modeling Data for XML instead of SQL-DBMS

From: dawn <dawnwolthuis_at_gmail.com>
Date: 29 Oct 2006 07:38:09 -0800
Message-ID: <1162136289.050440.117940_at_e3g2000cwe.googlegroups.com>


mAsterdam wrote:
> dawn wrote:

<snip>
> Imagine discussing a weekly report:
> Q: How many candybars did we sell?
> A1: It is under sales, candybars: 504
> A2: It is on page 4, line 6: 504
>
> Both feel like navigating and pointing,
> both give the answer, but there is a difference.
> A1 is valid whatever media we are using.
> Not A2. A2 is implementation dependent.
> Not hardware dependent, but implementation
> dependent nevertheless.
>
> The hardware answer A3 would be: At .254
> millimeters from the start of the report, 5 cm
> from the top, 12 to the right on the flip side
> there are some ink-spots in the shape of 504.
> But first the question would require some serious
> translation.
>
> Good to have A3 gone, no?
> Now lets get back to getting A2 out.

OK, I understand this question. I will take what you have written (whether I snipped it or not for this response) and what David responded and use that to do some more research, thinking, and then writing. I'm on the road for the coming week, and have appreciated the dialog as it will help me figure out what I'm trying to ask and say in a language that works for people starting in a different place than I.
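
A minimal sketch of the A1/A2 distinction, assuming the weekly report is held as XML. The element names, the attribute name, and every figure other than 504 are made up for illustration. A1 finds the value by what it is called; A2 finds it by where it sits.

```python
import xml.etree.ElementTree as ET

# Hypothetical report layout; only the figure 504 comes from the discussion.
REPORT = """
<report>
  <inventory><item name="candybars">120</item></inventory>
  <sales>
    <item name="gum">77</item>
    <item name="candybars">504</item>
  </sales>
</report>
"""

# Same data, but the two sales lines are printed in a different order.
REPORT_REORDERED = """
<report>
  <inventory><item name="candybars">120</item></inventory>
  <sales>
    <item name="candybars">504</item>
    <item name="gum">77</item>
  </sales>
</report>
"""

def answer_by_name(root):
    """A1: 'It is under sales, candybars.'"""
    return root.find("sales/item[@name='candybars']").text

def answer_by_position(root):
    """A2: 'It is in the second section, second line.'"""
    return root[1][1].text

original = ET.fromstring(REPORT)
reordered = ET.fromstring(REPORT_REORDERED)

print(answer_by_name(original), answer_by_position(original))
print(answer_by_name(reordered), answer_by_position(reordered))
```

In this sketch both answers agree on the first layout, but only the name-based one survives the reordering, which is one reading of "A2 is implementation dependent."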

There is more than one way to handle the data aspect of a data processing system. These differences show up in what I think can accurately be described as the data model. One of these data models is presented to college students, while many of the others are only presented negatively, giving graduates no idea how best to design data for anything other than the one that is taught. Because we are heading out of the era of 1NF as previously defined and no new languages seem to be implementing the 3VL and NULL-handling approach of SQL, I think we are moving into an era where we are freed from these "features" and it will become clearer to more people that we must teach a wider variety of data modeling approaches for persistence with a variety of dbms tools.

Since the new approach is not new, but has been in play since the beginning of databases, there are surely some best practices that we, as an industry, have learned about modeling data for such persistence. I think we should gather and teach such practices.

> > where a URL
>
> Uniform Resource Locator
> (http://www.officeport.com/wwwintro/urldefined.htm)
>
> > does not -- it is just the ID value for that node.
>
> A node in the web of documents, not in a structure of user data.
> It is similar to the page/line answer.
> Not to the sales/candybars answer.
<snip>
> </With Interjections>
>
> You appear reluctant to accept:

Yes, but I'll ponder it a bit.

> >> The links are not (logical/user/real) data.
> >> They are not. They are really really really not data.
> >> They do not even reference data. They are locators,
> >> pointing to a location, also called pointers.
> >> Because they are not data, they are not part
> >> of the whole of the data.
>
> You spent a lot of lines against this.
> My guess is that something in your argumentation
> depends (or seems to depend) on links being part
> of the (logical/user/real) data. If so, what is it?
> Maybe that will give other clues.
>

<snip>
> > If we were to create a data repository using XML documents for a
> > book-author system, we could put books in one or more XML documents and
> > authors similarly. When abstracting it to the data model, I would
> > include these two top-level "entities" (they each get a UML rectangle
> > on a class diagram, for example). The name space could be seen as a
> > top level to the metadata, but I don't think that is the hierarchy you
> > are talking about. I do not have to decide whether Books are higher or
> > lower than Authors in any hierarchy, nor does the Book data need to
> > have any data "above it" in some hierarchy (even if the documents have
> > root nodes).
>
> One inconvenience of this implementation
> is the need to keep Authors.Books in sync with Books.Authors.

Yes, that is an inconvenience. There might be better approaches with various types of indexes, so that the dbms, rather than a layer on top of it, keeps this in sync.
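
As a rough sketch of the index idea, assuming the book-to-author references are stored only in the books document: the author-to-books direction is then derived rather than stored, so there is no second copy to keep in step. The element names, the `ref` attribute, the ids, and the function name `build_author_index` are all hypothetical, and a real dbms would maintain such an index itself rather than rebuild it in application code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical books document; each book lists its authors by id.
BOOKS_XML = """
<books>
  <book id="b1" title="Title One">
    <author ref="a1"/>
  </book>
  <book id="b2" title="Title Two">
    <author ref="a1"/>
    <author ref="a2"/>
  </book>
</books>
"""

def build_author_index(books_xml):
    """Derive a mapping of author id -> list of book ids from the books document."""
    index = defaultdict(list)
    root = ET.fromstring(books_xml)
    for book in root.findall("book"):
        for author in book.findall("author"):
            index[author.get("ref")].append(book.get("id"))
    return dict(index)

author_index = build_author_index(BOOKS_XML)
print(author_index)
```

Because the index is computed from Books.Authors, a change to a book's author list only has to be made in one place; the trade-off is that the index has to be rebuilt or incrementally updated whenever the books document changes.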

Thanks for the dialog and sorry to cut it off as I head out for a week, but I do have enough to chew on. Cheers! --dawn

Received on Sun Oct 29 2006 - 16:38:09 CET
