Re: What is the logic of storing XML in a Database?

From: Aloha Kakuikanu <aloha.kakuikanu_at_yahoo.com>
Date: 28 Mar 2007 09:45:59 -0700
Message-ID: <1175100359.546720.300980_at_n59g2000hsh.googlegroups.com>


On Mar 28, 8:45 am, "Daniel" <danielapar..._at_gmail.com> wrote:
> But nobody can write and sell (or give away as open source) a CSV
> validator that validates an arbitrary CSV file against a standard
> schema describing that CSV file, for the simple reason that no such
> standard schema exists. No tools exist in this category, in contrast,
> many such tools exist in the XML space.

Why? A CSV grammar validator is quite trivial. Here is a CFG for a three-column file:

// lexer
line_end: '\n' ;
comma: ',' ;
value: ([a-z]|[A-Z]|[0-9])+;
// parser
line3: value comma value comma value line_end;
csv_file: line3 | csv_file line3;
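
For anyone who wants to try it, the grammar above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names VALUE, LINE3 and is_valid_csv are made up here, and a regular expression stands in for a generated lexer and parser, which works because this particular grammar happens to be regular.

```python
import re

# Mirrors the lexer rule: value: ([a-z]|[A-Z]|[0-9])+
VALUE = r"[a-zA-Z0-9]+"

# Mirrors the parser rule: line3: value comma value comma value line_end
LINE3 = re.compile(rf"{VALUE},{VALUE},{VALUE}\n")


def is_valid_csv(text: str) -> bool:
    """Return True if text matches csv_file: line3 | csv_file line3."""
    pos = 0
    lines_seen = 0
    while pos < len(text):
        match = LINE3.match(text, pos)
        if match is None:
            # Some line is not exactly three alphanumeric values plus '\n'.
            return False
        pos = match.end()
        lines_seen += 1
    # csv_file requires at least one line3.
    return lines_seen >= 1
```

Calling is_valid_csv("a,b,c\n1,2,3\n") accepts the input, while a line with two fields, an empty field, or a missing final newline is rejected, exactly as the grammar demands.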

> The validation
> referred to applies to the production and consumption of the messages,
> or some place in the middle.

Validation is parsing. Parsing is based on language theory, not on tags.

> As an example, an XML Schema may declare that an element named
> dayCountFraction is of type DayCountFraction.
>
> <xsd:element name="dayCountFraction" type="DayCountFraction">
>
> The type DayCountFraction may restrict the values that the field
> dayCountFraction can take to a specific list, e.g. "ACT/ACT",
> "30/360", "30/365".

Those are fairly easy to express with a grammar as well.
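
As a rough sketch, the restriction is just an alternation of three literals, something like day_count_fraction: 'ACT/ACT' | '30/360' | '30/365'. In Python that could look as follows; the names DAY_COUNT_FRACTION and is_day_count_fraction are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical token rule: an alternation of the three literals quoted above.
DAY_COUNT_FRACTION = re.compile(r"ACT/ACT|30/360|30/365")


def is_day_count_fraction(value: str) -> bool:
    """Accept the value only if it is exactly one of the listed literals."""
    return DAY_COUNT_FRACTION.fullmatch(value) is not None
```

Using fullmatch rather than match means a value such as "30/3650" is rejected instead of being accepted on its prefix.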

> A schema validator tool will take an XML document, apply the schema,
> and will reject the message if the value is other than one in the
> list.

Once again, if validation is effectively parsing, what are those silly tags required for?

The parallel between databases and language theory is quite illuminating. One of the database cornerstones is Relational Algebra, whose most important operators are join and union. A centerpiece of parsing theory is an algebra of two operators as well, and remarkably they are also join and union. The join in language theory (concatenation) is very different from the join in relational algebra, of course: it is no longer idempotent, nor commutative. It is distributive over the union, however. Next, arguably the most important parsing method is the CYK algorithm. It is essentially the transitive closure...

Received on Wed Mar 28 2007 - 18:45:59 CEST
