Newsgroup: comp.databases.theory

Re: One cheer for XML ...

From: JOG <jog_at_cs.nott.ac.uk>
Date: 29 Mar 2007 16:39:45 -0700
Message-ID: <1175211585.540983.245000@y80g2000hsf.googlegroups.com>


On Mar 29, 6:15 pm, "DBMS_Plumber" <paul_geoffrey_br..._at_yahoo.com> wrote:
> Over in a neighboring thread, the prospect of XML inside the DBMS is
> receiving a fairly well-deserved bashing. Here, I'm gonna point out
> one fairly obvious way in which XML in the DBMS makes sense.
>
> We live in a time where information systems create lots and lots of
> 'unstructured' data content: e-mails, chat logs, recorded phone
> conversations, etc. I'm using the word 'unstructured' here with some
> precision. This stuff is not 'semi-structured' or some other wankery.
> It's just a bunch of words. For example:
>
> --
> From: Max Planck
> To: Einstein, Albert
> Subject: Relatively speaking ....
>
> Hey Bert!
>
> Caught your powerpoint slides at the Bern conference! Kewl stuff dewd!
> I was chatting with Marie Curie and she was impressed too, but we're
> having a hard time figuring out how to, you know, falsify it.
>
> BTW: Dinner next week? I'm in Berlin on 10/5/1908. Call me
> (321)304-2945. We'll do lunch! Mebbe some beer too, eh?
>
> Much love to the wife! More love to the girlfriend! <grin>
>
> Yours - Max
>
> --
>
> Modern software allows us to process this kind of text and to 'mark it
> up' with tags that convey semantic information to a reader unfamiliar
> with the context / history etc. The idea is that this is software that
> can pick out names, dates, phone numbers, etc. and inline tags into
> the text to indicate what things are. Yeah, it's all a bit wobbly, but
> it's actually rather useful.
>
> This software might render the lil e-mail above in the following way.
>
> --
> From: <PersonName>Max Planck</PersonName>
> To: <PersonName>Einstein, Albert</PersonName>
> Subject: Relatively speaking ....
>
> Hey Bert!
>
> Caught your <ProductName>powerpoint</ProductName> slides at the
> <PlaceName>Bern</PlaceName> conference! <Pos>Kewl stuff dewd!</Pos> I
> was chatting with <PersonName>Marie Curie</PersonName> and she was
> impressed too, but we're having a hard time figuring out how to, you
> know, falsify it.
>
> <Acronym>BTW</Acronym>: <Invitation pr="0.2">Dinner next week? I'm in
> <PlaceName>Berlin</PlaceName> on <Date>10/5/1908</Date>. </Invitation>
> Call me <PhoneNumber>(321)304-2945</PhoneNumber>. We'll do lunch!
> Mebbe some beer too, eh?
>
> Much love to the wife! More love to the girlfriend! <grin>
>
> Yours - Max
> --
> 'Semantics' here simply means that a non-English speaker, or someone
> unfamiliar with European geography, would be able to look at this mess
> and say 'Oh! Placenames! Dates!' without actually understanding what
> those tokens meant. This gets really handy when you're dealing with
> lots and lots and lots of placenames, peoplenames, idiomatic
> references, acronyms, etc.
>
> There's usually a mess of bookkeeping stuff in this 'converted'
> e-mail too. But you get the idea.
>
> Anyway, this makes it possible to envision the following kind of DDL /
> DML.
>
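> CREATE TABLE EMail (
>     ID     INTEGER NOT NULL PRIMARY KEY,
>     -- FROM and TO are reserved words, so they need delimited identifiers
>     "From" INTEGER NOT NULL REFERENCES People ( Id ),
>     "To"   INTEGER NOT NULL REFERENCES People ( Id ),
>     Text   XML     NOT NULL
> );
>
> Q: Notes To Albert mentioning places and dates during 1908?
>
> SELECT E.ID
> FROM EMail E, People P
> -- XMLEXISTS (SQL/XML) is true when the XQuery returns a non-empty result;
> -- $t is bound to each row's XML document, dates assumed to be d/m/yyyy
> WHERE XMLEXISTS(
>         'for $d in $t//Date, $p in $t//PlaceName
>          where ends-with(normalize-space($d), "/1908")
>          return <Invitation><Where>{ $p }</Where><When>{ $d }</When></Invitation>'
>         PASSING E.Text AS "t" )
>   AND E."To" = P.Id
>   AND P.Who = 'Albert Einstein';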
>
> Now, how to organize the XQuery() thingie and embed it within the SQL
> framework, how to store and efficiently retrieve stuff, and so on:
> these are interesting questions. They touch on relational theory
> questions (what portions of XQuery can be re-written as relational
> operations) but they also cover a bunch of meta-questions (can we make
> closed-world assumptions? what kinds of integrity guarantees are
> meaningful? To what extent is logical independence possible (order
> counts in XML)?) As a rough rule of thumb, for every tuple in a SQL
> DBMS somewhere there are 10 e-mails, PowerPoint slides, or wiki pages,
> all containing important and valuable information about the tuple.
>
> Anyway - the people working on getting XML into SQL DBMS products are
> not the shambling crowd of ignorant, drooling, herd-following zombies
> many folk in the other thread seem to believe. Some of 'em are friends
> of mine, and they're even more familiar with the drawbacks and
> limitations of XML and XQuery than you are. (Did you know indexing XML
> is next to impossible because the type system is query-time
> determined? Or that order doesn't apply to attributes, only to
> elements, making 'deep equality' a nightmare? And don't get 'em
> started on updates!)
>
> Hope this helps a little
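A minimal sketch of the quoted EMail/People query, assuming SQLite through Python's standard sqlite3 module. SQLite has no XML column type and no XQuery, so this stores the markup as TEXT and registers a hypothetical Python function, `mentions_place_in_1908`, as a SQL predicate standing in for the XQuery()/XMLEXISTS call; it parses each document with `xml.etree.ElementTree`. The `People.Who` column comes from the quoted query, the d/m/yyyy date format from the example e-mail, and the `<EMail>` root wrapper is an assumption needed to make the markup well-formed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Abridged tagged e-mail from the example, wrapped in a hypothetical <EMail>
# root element so it is well-formed XML (the stray <grin> is left out).
EMAIL_XML = """<EMail>From: <PersonName>Max Planck</PersonName>
To: <PersonName>Einstein, Albert</PersonName>
Caught your <ProductName>powerpoint</ProductName> slides at the
<PlaceName>Bern</PlaceName> conference!
<Acronym>BTW</Acronym>: <Invitation pr="0.2">Dinner next week? I'm in
<PlaceName>Berlin</PlaceName> on <Date>10/5/1908</Date>. </Invitation>
Call me <PhoneNumber>(321)304-2945</PhoneNumber>.</EMail>"""


def mentions_place_in_1908(text: str) -> int:
    """1 if the document has a <PlaceName> and a <Date> ending in '/1908', else 0.

    Assumes dates are tagged as d/m/yyyy strings, as in the example.
    """
    root = ET.fromstring(text)
    has_place = root.find(".//PlaceName") is not None
    has_1908 = any((d.text or "").strip().endswith("/1908") for d in root.iter("Date"))
    return int(has_place and has_1908)


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Register the Python predicate so SQL can call it per row, in place of
    # the XQuery call a DBMS with native XML support would evaluate.
    conn.create_function("mentions_place_in_1908", 1, mentions_place_in_1908)
    conn.executescript("""
        CREATE TABLE People (Id INTEGER PRIMARY KEY, Who TEXT NOT NULL);
        CREATE TABLE EMail (
            ID     INTEGER NOT NULL PRIMARY KEY,
            "From" INTEGER NOT NULL REFERENCES People (Id),
            "To"   INTEGER NOT NULL REFERENCES People (Id),
            Text   TEXT    NOT NULL  -- no XML type in SQLite: markup kept as text
        );
    """)
    conn.executemany("INSERT INTO People VALUES (?, ?)",
                     [(1, "Max Planck"), (2, "Albert Einstein")])
    conn.executemany(
        "INSERT INTO EMail VALUES (?, ?, ?, ?)",
        [
            (1, 1, 2, EMAIL_XML),  # Max -> Albert: a place and a 1908 date
            (2, 1, 2, "<EMail>See you in <PlaceName>Bern</PlaceName>.</EMail>"),  # no date
            (3, 2, 1, EMAIL_XML),  # Albert -> Max: wrong recipient
        ],
    )
    return conn


def notes_to_albert_1908(conn: sqlite3.Connection) -> list:
    rows = conn.execute("""
        SELECT E.ID
        FROM EMail E JOIN People P ON E."To" = P.Id
        WHERE mentions_place_in_1908(E.Text)
          AND P.Who = 'Albert Einstein'
        ORDER BY E.ID
    """).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(notes_to_albert_1908(build_db()))  # → [1]
```

Because the predicate is opaque to the SQL engine, it cannot use an ordinary column index here and every candidate row's XML is reparsed on each query, a small-scale version of the storage and retrieval questions raised above.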

If I had a load of emails that I wanted to store, I could certainly pull out enough propositions with commonalities among them to produce some sort of usable structure. That alone is enough to suggest that applying database fundamentals would be beneficial.
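The "propositions" idea above can be sketched concretely, as one hypothetical decomposition rather than anything proposed in the thread: each tagged leaf span becomes a row (EmailId, Tag, Value) in an assumed `Mention` table, and the quoted "places and dates during 1908" question turns into a plain self-join. Python's sqlite3 and `xml.etree.ElementTree` are used only for convenience; the table, the `shred` function, and the `<EMail>` root wrapper are all made up for illustration, and the recipient filter is left out to keep it short.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Abridged tagged e-mail from the quoted example, wrapped in a hypothetical
# <EMail> root element so that it is well-formed XML.
EMAIL_XML = """<EMail>From: <PersonName>Max Planck</PersonName>
To: <PersonName>Einstein, Albert</PersonName>
Caught your slides at the <PlaceName>Bern</PlaceName> conference!
<Acronym>BTW</Acronym>: <Invitation pr="0.2">Dinner next week? I'm in
<PlaceName>Berlin</PlaceName> on <Date>10/5/1908</Date>. </Invitation>
Call me <PhoneNumber>(321)304-2945</PhoneNumber>.</EMail>"""


def shred(email_id: int, text: str) -> list:
    """Turn each leaf element into an (EmailId, Tag, Value) proposition.

    Container elements such as <Invitation> are skipped; once the rows are
    stored in the table below, they carry no document order.
    """
    root = ET.fromstring(text)
    return [
        (email_id, elem.tag, (elem.text or "").strip())
        for elem in root.iter()
        if elem is not root and len(elem) == 0
    ]


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; the composite key makes it a relation (no duplicate rows).
    conn.execute("""
        CREATE TABLE Mention (
            EmailId INTEGER NOT NULL,
            Tag     TEXT    NOT NULL,
            Value   TEXT    NOT NULL,
            PRIMARY KEY (EmailId, Tag, Value)
        )""")
    conn.executemany("INSERT OR IGNORE INTO Mention VALUES (?, ?, ?)",
                     shred(1, EMAIL_XML))
    return conn


# "Notes mentioning places and dates during 1908", as ordinary SQL.
QUERY = """
    SELECT DISTINCT p.EmailId
    FROM Mention p JOIN Mention d ON d.EmailId = p.EmailId
    WHERE p.Tag = 'PlaceName'
      AND d.Tag = 'Date' AND d.Value LIKE '%/1908'
"""

if __name__ == "__main__":
    print(build_db().execute(QUERY).fetchall())  # → [(1,)]
```

What this decomposition throws away is document order and nesting (the `<Invitation>` wrapper), which is the trade-off behind the quoted question of how far logical independence is possible when order counts in XML.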

However, on top of this, what distinction between <Acronym>, <Date>, etc., and <Title> is sufficient to justify treating the former differently from the latter?

Received on Thu Mar 29 2007 - 18:39:45 CDT
