Re: One cheer for XML ...

From: David Cressey <cressey73_at_verizon.net>
Date: Thu, 29 Mar 2007 19:24:49 GMT
Message-ID: <5oUOh.14926$_S.13341_at_trndny08>


"DBMS_Plumber" <paul_geoffrey_brown_at_yahoo.com> wrote in message news:1175188523.051840.94520_at_y66g2000hsf.googlegroups.com...
> Over in a neighboring thread, the prospect of XML inside the DBMS is
> receiving a fairly well-deserved bashing. Here, I'm gonna point out
> one fairly obvious way in which XML in the DBMS makes sense.
>
> We live in a time where information systems create lots and lots of
> 'unstructured' data content: e-mails, chat logs, recorded phone
> conversations, etc. I'm using the word 'unstructured' here with some
> precision. This stuff is not 'semi-structured' or some other wankery.
> It's just a bunch of words. For example:
>
Here's what this means to me:

       "if you don't know what you're talking about, say it with XML!"

> --
> From: Max Planck
> To: Einstein, Albert
> Subject: Relatively speaking ....
>
> Hey Bert!
>
> Caught your powerpoint slides at the Bern conference! Kewl stuff dewd!
> I was chatting with Marie Curie and she was impressed too, but we're
> having a hard time figuring out how to, you know, falsify it.
>
> BTW: Dinner next week? I'm in Berlin on 10/5/1908. Call me
> (321)304-2945. We'll do lunch! Mebbe some beer too, eh?
>
> Much love to the wife! More love to the girlfriend! <grin>
>
> Yours - Max
>
> --
>
> Modern software allows us to process this kind of text and to 'mark it
> up' with tags that convey semantic information to a reader unfamiliar
> with the context / history etc. The idea is that this is software that
> can pick out names, dates, phone numbers, etc. and inline tags into
> the text to indicate what things are. Yeah, it's all a bit wobbly, but
> it's actually rather useful.
>
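A toy version of the tagging step described above, as a minimal sketch in Python. It assumes that two hand-written regular expressions (the hypothetical `PATTERNS` list and `tag_entities` function) can stand in for real entity-recognition software, which would use trained models and would also pick out names, places and so on; this sketch only tags phone numbers and dates. It escapes the raw text first, so something like `<grin>` can't pass for a tag.

```python
import html
import re

# Hypothetical toy patterns: real taggers use trained entity-recognition
# models, not a couple of regular expressions.
PATTERNS = [
    ("PhoneNumber", re.compile(r"\(\d{3}\)\d{3}-\d{4}")),
    ("Date", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
]


def tag_entities(text: str) -> str:
    """Escape the raw text, then wrap each pattern match in <Tag>...</Tag>."""
    # Escape &, < and > first so stray markup such as "<grin>" stays plain text.
    text = html.escape(text, quote=False)
    for tag, pattern in PATTERNS:
        # \g<0> in the replacement string is the whole matched text.
        text = pattern.sub(rf"<{tag}>\g<0></{tag}>", text)
    return text


print(tag_entities("Berlin on 10/5/1908, call (321)304-2945 <grin>"))
# Berlin on <Date>10/5/1908</Date>, call <PhoneNumber>(321)304-2945</PhoneNumber> &lt;grin&gt;
```

Note that "Berlin" comes through untagged: spotting place names is exactly the part that needs more than pattern matching.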
> This software might render the lil e-mail above in the following way.
>
> --
> From: <PersonName>Max Planck</PersonName>
> To: <PersonName>Einstein, Albert</PersonName>
> Subject: Relatively speaking ....
>
> Hey Bert!
>
> Caught your <ProductName>powerpoint</ProductName> slides at the
> <PlaceName>Bern</PlaceName> conference! <Pos>Kewl stuff dewd!</Pos> I
> was chatting with <PersonName>Marie Curie</PersonName> and she was
> impressed too, but we're having a hard time figuring out how to, you
> know, falsify it.
>
> <Acronym>BTW</Acronym>: <Invitation pr="0.2">Dinner next week? I'm in
> <PlaceName>Berlin</PlaceName> on <Date>10/5/1908</Date>. </Invitation>
> Call me <PhoneNumber>(321)304-2945</PhoneNumber>. We'll do lunch!
> Mebbe some beer too, eh?
>
> Much love to the wife! More love to the girlfriend! <grin>
>
> Yours - Max
> --
> 'Semantics' here simply means that a non-English speaker, or someone
> unfamiliar with European geography, would be able to look at this mess
> and say 'Oh! Placenames! Dates!' without actually understanding what
> those tokens meant. This gets really handy when you're dealing with
> lots and lots and lots of placenames, peoplenames, idiomatic
> references, acronyms, etc.
>
> There's usually a mess of bookkeeping stuff in this 'converted'
> e-mail too. But you get the idea.
>
> Anyway, this makes it possible to envision the following kind of DDL /
> DML.
>
> CREATE TABLE EMail (
> ID INTEGER NOT NULL PRIMARY KEY,
> FromId INTEGER NOT NULL REFERENCES People ( Id ), -- FROM, TO are reserved words
> ToId INTEGER NOT NULL REFERENCES People ( Id ),
> Text XML NOT NULL
> );
>
> Q: Notes to Albert mentioning places and dates during 1908?
>
> SELECT E.ID
> FROM EMail E, People P
> WHERE XQuery(
> 'for $i in //Date for $j in //PlaceName
> where ends-with(string($i), "1908")
> return <Invitation><Where>{$j}</Where><When>{$i}</When></Invitation>',
> E.Text )
> AND E.ToId = P.Id
> AND P.Who = 'Albert Einstein';
>
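The query above leans on a hypothetical `XQuery()` function. One way to play with the idea today, assuming only Python's standard `sqlite3` and `xml.etree.ElementTree` modules, is to register an ordinary Python function as a SQL function and let it stand in for the XQuery predicate. The function name `mentions_place_and_1908_date`, the `FromId`/`ToId` column names and the sample rows are all made up for illustration, and since SQLite has no XML column type the markup is stored as TEXT.

```python
import sqlite3
import xml.etree.ElementTree as ET


def mentions_place_and_1908_date(text: str) -> int:
    """Hypothetical stand-in for the post's XQuery() predicate.

    Returns 1 if the marked-up text has at least one <PlaceName> element and
    at least one <Date> element whose text ends in "1908", otherwise 0.
    """
    root = ET.fromstring(f"<doc>{text}</doc>")  # wrap the fragment in one root
    has_place = root.find(".//PlaceName") is not None
    has_1908 = any((d.text or "").strip().endswith("1908") for d in root.iter("Date"))
    return int(has_place and has_1908)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name, taking one argument.
conn.create_function("mentions_place_and_1908_date", 1, mentions_place_and_1908_date)
conn.executescript("""
    CREATE TABLE People (Id INTEGER PRIMARY KEY, Who TEXT NOT NULL);
    CREATE TABLE EMail (
        ID     INTEGER PRIMARY KEY,
        FromId INTEGER NOT NULL REFERENCES People (Id),
        ToId   INTEGER NOT NULL REFERENCES People (Id),
        Text   TEXT NOT NULL  -- no XML type in SQLite; markup kept as text
    );
""")
conn.executemany("INSERT INTO People VALUES (?, ?)",
                 [(1, "Max Planck"), (2, "Albert Einstein")])
# Made-up sample rows for illustration only.
conn.executemany("INSERT INTO EMail VALUES (?, ?, ?, ?)", [
    (1, 1, 2, "I'm in <PlaceName>Berlin</PlaceName> on <Date>10/5/1908</Date>."),
    (2, 1, 2, "See you in <PlaceName>Bern</PlaceName> on <Date>3/2/1907</Date>."),
    (3, 2, 1, "Back in <PlaceName>Bern</PlaceName> on <Date>1/1/1908</Date>."),
])

rows = conn.execute("""
    SELECT E.ID
    FROM EMail E JOIN People P ON E.ToId = P.Id
    WHERE P.Who = 'Albert Einstein'
      AND mentions_place_and_1908_date(E.Text) = 1
""").fetchall()
print(rows)  # [(1,)]
```

As written, SQLite treats the function as opaque and calls it on each candidate row, re-parsing the markup every time, which is one reason the storage and retrieval questions below are hard.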
> Now, organizing the XQuery() thingie and embedding it within the SQL
> framework, how do we store and efficiently retrieve stuff, etc. ....
> these are interesting questions. They touch on relational theory
> questions (what portions of XQuery can be re-written as relational
> operations) but they also cover a bunch of meta-questions (can we make
> closed-world assumptions? what kinds of integrity guarantees are
> meaningful? To what extent is logical independence possible (order
> counts in XML)?) As a rule, for every tuple in a SQL DBMS somewhere,
> there are 10 e-mails, powerpoint slides, wiki pages, all containing
> important and valuable information about the tuple.
>
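One partial answer to "what portions of XQuery can be re-written as relational operations" is to shred the markup into a plain table at load time, so that a path step like `//PlaceName` becomes an ordinary predicate on a tag column. A minimal sketch, assuming a hypothetical `Entity(EMailId, Pos, Tag, Value)` table and `shred` helper; the `Pos` column keeps document order, since order counts in XML.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(email_id: int, text: str):
    """Yield (email_id, position, tag, value) for each tagged element.

    Position is the element's index in document order, so the order that
    XML cares about survives as an ordinary column.
    """
    root = ET.fromstring(f"<doc>{text}</doc>")
    for pos, elem in enumerate(root.iter()):
        if elem is not root:  # skip the synthetic wrapper element
            yield (email_id, pos, elem.tag, (elem.text or "").strip())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Entity (EMailId INTEGER, Pos INTEGER, Tag TEXT, Value TEXT)")
sample = {  # made-up marked-up e-mails, keyed by a hypothetical e-mail ID
    1: "In <PlaceName>Berlin</PlaceName> on <Date>10/5/1908</Date>.",
    2: "In <PlaceName>Bern</PlaceName> on <Date>3/2/1907</Date>.",
}
for eid, text in sample.items():
    conn.executemany("INSERT INTO Entity VALUES (?, ?, ?, ?)", shred(eid, text))

# "//PlaceName" plus "//Date in 1908" becomes a self-join on the Entity table.
rows = conn.execute("""
    SELECT DISTINCT p.EMailId, p.Value, d.Value
    FROM Entity p JOIN Entity d ON p.EMailId = d.EMailId
    WHERE p.Tag = 'PlaceName'
      AND d.Tag = 'Date' AND d.Value LIKE '%/1908'
    ORDER BY p.EMailId
""").fetchall()
print(rows)  # [(1, 'Berlin', '10/5/1908')]
```

The trade-off: an ordinary, indexable self-join replaces the XML parser at query time, but only the tagged values survive shredding. The running text between the tags, and any nesting, is thrown away unless it is modelled explicitly.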
> Anyway - the people working on getting XML into SQL DBMS products are
> not the shambling crowd of ignorant, drooling, herd-following zombies
> that many folk in the other thread seem to believe. Some of 'em are friends
> of mine, and they're even more familiar with the drawbacks and
> limitations of XML and XQuery than you are. (Did you know indexing XML
> is next to impossible because the type system is query-time
> determined? Or that order doesn't apply to attributes, only to
> elements, making 'deep equality' a nightmare? And don't get 'em
> started on updates!)
>
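To make the deep-equality point concrete: the sketch below is loosely modelled on XPath's `fn:deep-equal` but is not a faithful implementation of it. It compares attributes as an unordered mapping and child elements (plus the text around them) in document order. The `deep_equal` helper and the sample elements are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def deep_equal(a: ET.Element, b: ET.Element) -> bool:
    """Simplified deep equality for two elements (hypothetical helper).

    Attributes are compared as an unordered dict; child elements, and the
    text around them, are compared in document order.
    """
    if a.tag != b.tag or (a.text or "") != (b.text or ""):
        return False
    if a.attrib != b.attrib:  # dict equality ignores attribute order
        return False
    if len(a) != len(b):  # len() of an Element is its number of children
        return False
    return all(
        deep_equal(x, y) and (x.tail or "") == (y.tail or "")
        for x, y in zip(a, b)
    )


x = ET.fromstring('<Invitation pr="0.2" lang="de"><Where>Berlin</Where><When>1908</When></Invitation>')
y = ET.fromstring('<Invitation lang="de" pr="0.2"><Where>Berlin</Where><When>1908</When></Invitation>')
z = ET.fromstring('<Invitation pr="0.2" lang="de"><When>1908</When><Where>Berlin</Where></Invitation>')
print(deep_equal(x, y))  # True: only the attribute order differs
print(deep_equal(x, z))  # False: the child element order differs
```

Even this toy version has to pick a side on whitespace, comments and mixed content, which is where the real nightmare starts.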
Just because a few intelligent people are doing intelligent things with XML doesn't refute the arguments made here. What most of us are condemning is letting the thundering herd believe that using XML is a good alternative way of managing data.

If you've ever had to clean up the mess, you'll understand the vehemence expressed by some of our regulars.

Received on Thu Mar 29 2007 - 21:24:49 CEST
