One cheer for XML ...

From: DBMS_Plumber <paul_geoffrey_brown_at_yahoo.com>
Date: 29 Mar 2007 10:15:23 -0700
Message-ID: <1175188523.051840.94520_at_y66g2000hsf.googlegroups.com>



Over in a neighboring thread, the prospect of XML inside the DBMS is receiving a fairly well-deserved bashing. Here, I'm gonna point out one fairly obvious way in which XML in the DBMS makes sense.

We live in a time where information systems create lots and lots of 'unstructured' data content: e-mails, chat logs, recorded phone conversations, etc. I'm using the word 'unstructured' here with some precision. This stuff is not 'semi-structured' or some other wankery. It's just a bunch of words. For example:

--

From: Max Planck
To: Einstein, Albert
Subject: Relatively speaking ....

Hey Bert!

Caught your powerpoint slides at the Bern conference! Kewl stuff dewd! I was chatting with Marie Curie and she was impressed too, but we're having a hard time figuring out how to, you know, falsify it.

BTW: Dinner next week? I'm in Berlin on 10/5/1908. Call me (321)304-2945. We'll do lunch! Mebbe some beer too, eh?

Much love to the wife! More love to the girlfriend! <grin>

Yours - Max

--

Modern software allows us to process this kind of text and to 'mark it up' with tags that convey semantic information to a reader unfamiliar with the context, history, etc. The idea is that this software can pick out names, dates, phone numbers, etc. and inline tags into the text to indicate what things are. Yeah, it's all a bit wobbly, but it's actually rather useful.

This software might render the lil e-mail above in the following way.

--

From: <PersonName>Max Planck</PersonName>
To: <PersonName>Einstein, Albert</PersonName>
Subject: Relatively speaking ....

Hey Bert!

Caught your <ProductName>powerpoint</ProductName> slides at the <PlaceName>Bern</PlaceName> conference! <Pos>Kewl stuff dewd!</Pos> I was chatting with <PersonName>Marie Curie</PersonName> and she was impressed too, but we're having a hard time figuring out how to, you know, falsify it.

<Acronym>BTW</Acronym>: <Invitation pr="0.2">Dinner next week? I'm in <PlaceName>Berlin</PlaceName> on <Date>10/5/1908</Date>. </Invitation> Call me <PhoneNumber>(321)304-2945</PhoneNumber>. We'll do lunch! Mebbe some beer too, eh?

Much love to the wife! More love to the girlfriend! <grin>

Yours - Max
--
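
To make the 'inline tags' idea concrete, here is a minimal sketch of a tagger that only knows two things: dates and phone numbers. The patterns and the mark_up() name are my own assumptions for illustration; real entity-extraction software is far more elaborate (and wobblier) than two regular expressions.

```python
import re

# Hypothetical patterns, for illustration only. A real extractor would
# handle many more formats and entity types than these two.
PATTERNS = [
    ("PhoneNumber", re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}")),
    ("Date", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
]


def mark_up(text):
    """Wrap every match of each pattern in an inline tag."""
    for tag, pattern in PATTERNS:
        # Bind the tag name as a default argument so each lambda keeps its own.
        text = pattern.sub(lambda m, t=tag: f"<{t}>{m.group(0)}</{t}>", text)
    return text


line = "I'm in Berlin on 10/5/1908. Call me (321)304-2945."
print(mark_up(line))
```

The output is the same sentence with the date wrapped in Date tags and the number wrapped in PhoneNumber tags, which is the shape of the marked-up e-mail above.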

'Semantics' here simply means that a non-English speaker, or someone unfamiliar with European geography, would be able to look at this mess and say 'Oh! Placenames! Dates!' without actually understanding what those tokens meant. This gets really handy when you're dealing with lots and lots and lots of placenames, peoplenames, idiomatic references, acronyms, etc.

There's usually a mess of book-keeping stuff in this 'converted' email too. But you get the idea.

Anyway, this makes it possible to envision the following kind of DDL / DML.

CREATE TABLE EMail (
    ID      INTEGER NOT NULL PRIMARY KEY,
    From    INTEGER NOT NULL REFERENCES People ( Id ),
    To      INTEGER NOT NULL REFERENCES People ( Id ),
    Text    XML     NOT NULL
);

  Q: Notes To Albert mentioning places and dates during 1908?

   SELECT E.ID
      FROM EMail E, People P
   WHERE XQuery(
'FOR $i IN //Date FOR $j IN //PlaceName WHERE $i > 1908 AND $i < 1909 RETURN <Invitation><Where>{$j}</Where><When>{$i}</When></Invitation>', E.Text )

      AND E.To = P.Id
      AND P.Who = 'Albert Einstein';
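
For anyone who wants to poke at the idea without an XML-enabled DBMS, here is a rough sketch that fakes it with Python's sqlite3 and xml.etree.ElementTree. Everything about it is an assumption for illustration: the XML is stored as plain text, each body is wrapped in a made-up <Mail> root element so it parses, the XQuery() call is replaced by an ordinary Python function (places_and_dates, my name), and 'during 1908' is tested by looking at the year at the end of the date string.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (Id INTEGER PRIMARY KEY, Who TEXT NOT NULL);
    CREATE TABLE EMail (
        ID     INTEGER PRIMARY KEY,
        "From" INTEGER NOT NULL REFERENCES People (Id),
        "To"   INTEGER NOT NULL REFERENCES People (Id),
        Text   TEXT    NOT NULL  -- XML kept as plain text in this sketch
    );
    INSERT INTO People VALUES (1, 'Max Planck'), (2, 'Albert Einstein');
""")

# Made-up sample rows; <Mail> is a hypothetical wrapper element.
emails = [
    (1, 1, 2, "<Mail>I'm in <PlaceName>Berlin</PlaceName> on <Date>10/5/1908</Date>.</Mail>"),
    (2, 1, 2, "<Mail>See you in <PlaceName>Bern</PlaceName> on <Date>3/2/1911</Date>.</Mail>"),
    (3, 1, 2, "<Mail>Kewl stuff dewd!</Mail>"),
]
conn.executemany("INSERT INTO EMail VALUES (?, ?, ?, ?)", emails)


def places_and_dates(xml_text, year):
    """Stand-in for the XQuery: pair every place with every date in `year`."""
    root = ET.fromstring(xml_text)
    places = [p.text for p in root.iter("PlaceName")]
    dates = [d.text for d in root.iter("Date") if (d.text or "").endswith(f"/{year}")]
    return [(where, when) for when in dates for where in places]


# The relational part (the join on the recipient) stays in SQL.
rows = conn.execute(
    'SELECT E.ID, E.Text FROM EMail E JOIN People P ON E."To" = P.Id '
    "WHERE P.Who = ? ORDER BY E.ID",
    ("Albert Einstein",),
).fetchall()

# The XML part is evaluated per row, outside the DBMS.
hits = [email_id for email_id, text in rows if places_and_dates(text, 1908)]
print(hits)
```

Note what the sketch gives up: the XML predicate runs row by row in application code, so the DBMS can neither optimize nor index it. Pushing that predicate inside the engine is the whole point of the exercise.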

Now, organizing the XQuery() thingie and embedding it within the SQL framework, how do we store and efficiently retrieve stuff, etc. ... these are interesting questions. They touch on relational theory questions (what portions of XQuery can be re-written as relational operations?) but they also cover a bunch of meta-questions (can we make closed-world assumptions? what kinds of integrity guarantees are meaningful? to what extent is logical independence possible, given that order counts in XML?).

As a rule, for every tuple in a SQL DBMS somewhere, there are 10 e-mails, powerpoint slides, wiki pages, all containing important and valuable information about the tuple.

 Anyway - the people working on getting XML into SQL DBMS products are not the shambling crowd of ignorant, drooling, herd-following zombies many folk in the other thread seem to believe. Some of 'em are friends of mine, and they're even more familiar with the drawbacks and limitations of XML and XQuery than you are. (Did you know indexing XML is next to impossible because the type system is query-time determined? Or that order doesn't apply to attributes, only to elements, making 'deep equality' a nightmare? And don't get 'em started on updates!)
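
To show why the attribute/element order business bites, here is a small sketch of a hand-rolled 'deep equality' check using Python's xml.etree.ElementTree. The deep_equal() function is my own simplified notion, not the XQuery definition, and the lang attribute is made up for the example. Attributes are compared as an unordered mapping, while child elements are compared position by position.

```python
import xml.etree.ElementTree as ET


def deep_equal(a, b):
    """Simplified deep equality: attribute order ignored, element order kept."""
    same_node = (
        a.tag == b.tag
        and a.attrib == b.attrib  # dict comparison, so attribute order is irrelevant
        and (a.text or "").strip() == (b.text or "").strip()
        and (a.tail or "").strip() == (b.tail or "").strip()
        and len(a) == len(b)
    )
    # Children are zipped in document order, so swapping two of them matters.
    return same_node and all(deep_equal(x, y) for x, y in zip(a, b))


x = ET.fromstring('<Invitation pr="0.2" lang="en"><Where>Berlin</Where><When>10/5/1908</When></Invitation>')
# Same content, attributes written in a different order.
y = ET.fromstring('<Invitation lang="en" pr="0.2"><Where>Berlin</Where><When>10/5/1908</When></Invitation>')
# Same content, child elements written in a different order.
z = ET.fromstring('<Invitation pr="0.2" lang="en"><When>10/5/1908</When><Where>Berlin</Where></Invitation>')

print(deep_equal(x, y))  # attributes reordered
print(deep_equal(x, z))  # elements reordered
```

Under these assumptions x and y compare equal while x and z do not, so a byte-wise comparison of the stored text gets the first case wrong and a purely set-based comparison gets the second case wrong.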

Hope this helps a little.

Received on Thu Mar 29 2007 - 19:15:23 CEST
