Re: native xml processing vs what Postgres and Oracle offer

From: rpost <rpost_at_pcwin518.campus.tue.nl>
Date: Wed, 7 Jan 2009 18:40:41 +0000 (UTC)
Message-ID: <gk2sv9$1blf$1_at_mud.stack.nl>


salmobytes wrote:

[...]

>Hierarchies are part of the real world. They just don't fit well into
>the relational scheme of things.

This is a broad statement. What is needed, as far as I can see, is efficient traversal of relations (i.e. arbitrarily wide, very selective joins). This can be supported, even if many existing RDBMSes don't support it.
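To make "traversal of relations" concrete, here is a minimal sketch using a recursive query from Python's sqlite3 module. The posting table, its columns and the sample rows are hypothetical, invented for illustration only. The point is that each step is a very selective join on an indexed parent_id column.

```python
import sqlite3

def demo_connection():
    """Build a small in-memory forum; table and data are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posting ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES posting(id),"
        " subject TEXT)"
    )
    # The index is what keeps each traversal step a selective lookup.
    conn.execute("CREATE INDEX posting_parent ON posting(parent_id)")
    conn.executemany(
        "INSERT INTO posting VALUES (?, ?, ?)",
        [
            (1, None, "native xml processing"),
            (2, 1, "Re: native xml processing"),
            (3, 2, "Re: native xml processing"),
            (4, 1, "Re: native xml processing"),
            (5, None, "unrelated thread"),
        ],
    )
    return conn

def thread_subtree(conn, root_id):
    """Return (id, depth, subject) for root_id and all of its replies."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id, depth, subject) AS (
            SELECT id, 0, subject FROM posting WHERE id = ?
            UNION ALL
            SELECT p.id, s.depth + 1, p.subject
            FROM posting AS p
            JOIN subtree AS s ON p.parent_id = s.id
        )
        SELECT id, depth, subject FROM subtree ORDER BY id
        """,
        (root_id,),
    )
    return rows.fetchall()
```

Calling thread_subtree(demo_connection(), 1) walks the whole thread under posting 1 without touching the unrelated thread. Whether a given RDBMS executes such a query efficiently is a separate question, which is the caveat made above.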

>With XML querying hierarchies is a snap.
>So if you have a hierarchical problem, XML is a better technology.

Not so fast.

XML itself is just a standard for serializing labelled trees. In my experience, most of my "trees" are really arbitrary graphs (relations), and while XML supports crosslinks as well, XML definition and manipulation languages tend not to support their traversal well. For a forum this need not be an issue.

But XML also assumes that all data is stored as documents and processed by operating on documents one at a time. USENET does the same thing, but it really isn't very practical.

Some issues from a database perspective: What if we want to query or manipulate across the whole collection? Why do we always have to parse documents whenever we want to use the data they contain? Why do I, when writing queries or transformations on my data (e.g. with XPath, XQuery or XSLT) or schema definitions (XML Schema - please), always have to concern myself with stupid serialization and document management issues such as the consistent use of file names and URLs, file system limitations, character encodings, etcetera?
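A small sketch of the first two complaints, using Python's xml.etree.ElementTree. The document names, the thread/post element layout and the author attribute are all made up for illustration. Even a trivial question about the whole collection means parsing every document again.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical collection: one XML document per thread, keyed by file name.
DOCUMENTS = {
    "thread1.xml": '<thread><post author="salmobytes"/><post author="rpost"/></thread>',
    "thread2.xml": '<thread><post author="rpost"/></thread>',
}

def posts_per_author(documents):
    """Count posts per author across all documents in the collection."""
    counts = Counter()
    for name, text in documents.items():
        # Every query over the collection starts by parsing each document.
        root = ET.fromstring(text)
        for post in root.iter("post"):
            counts[post.get("author")] += 1
    return counts
```

In a real setup the dictionary would be replaced by files or URLs, which is exactly where the file naming and character encoding issues mentioned above come in.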

Not to mention that XPath, XSLT and XQuery are still pretty hideous languages, both syntactically and semantically, although they have much improved. Try representing an arbitrary relation (graph) in XML, then writing, say, an XPath expression to compute its connected components.
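For comparison, a sketch of what the connected components exercise looks like once you give up on XPath and drop into a host language. The graph/node/link vocabulary with id and ref attributes is an assumed serialization, not any standard one, and the sketch assumes every ref points at an existing node. XML is only used to read the edges; the actual computation is a plain union-find in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialization of a graph: crosslinks are id/ref attribute pairs.
GRAPH_XML = """
<graph>
  <node id="a"><link ref="b"/></node>
  <node id="b"><link ref="c"/></node>
  <node id="c"/>
  <node id="d"><link ref="e"/></node>
  <node id="e"/>
  <node id="f"/>
</graph>
"""

def connected_components(xml_text):
    """Return the connected components as sorted lists of node ids."""
    root = ET.fromstring(xml_text.strip())
    # Union-find forest: every node starts as its own representative.
    parent = {node.get("id"): node.get("id") for node in root.iter("node")}

    def find(x):
        # Follow parent pointers to the representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for node in root.iter("node"):
        for link in node.findall("link"):
            # Assumes link.get("ref") names an existing node id.
            a, b = find(node.get("id")), find(link.get("ref"))
            if a != b:
                parent[a] = b

    groups = {}
    for name in parent:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

With the sample graph this yields three components: a, b and c together, d and e together, and f alone.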

Not such a good match for a discussion forum, if you ask me. XPath queries may be expressive enough, but what about speed? Do you want to represent the whole forum contents as a single XML document that is updated whenever some posting or edit is performed? Or are you thinking of some solution that keeps the whole thing in memory in parsed form? How to make it scale?

>Someone referred to XML as messy technology that couldn't be optimised.
>But SleepyCat and XPath is faster than any relational system running any
>one of the ugly, complex and slow-as-mollases "relational solutions" to
>the hierarchical problem.

I'm not familiar with this, but does it work well for a big discussion forum? How sophisticated is the querying you allow?

>For some problems you don't need a database at all: grep or perhaps
>lucene or HyperEstaier are all that's needed.
>
>For some problems XML is the best choice, particularly if the data
>is naturally hierarchical.

... and consists of small enough bits (documents) that don't need to be queried or manipulated collectively.

>For other problems--particularly for *large* data problems--relational
>systems are the best choice....but almost never when hierarchies are
>involved.

I think this is far too strong a statement.

-- 
Reinier
Received on Wed Jan 07 2009 - 19:40:41 CET
