Re: What is the logic of storing XML in a Database?

From: Cimode <>
Date: 28 Mar 2007 09:15:27 -0700
Message-ID: <>

On Mar 28, 4:17 pm, Bernard Peek <> wrote:
> On 2007-03-28, Cimode <> wrote:
> > On Mar 28, 1:21 pm, Bernard Peek <> wrote:
> >> On 2007-03-28, Cimode <> wrote:
> >> > What advantage of XML as a transport format do you see over let's say
> >> > a CSV file with an integrated header?
> >> There's nothing that you can do with XML that can't be done in other ways.
> >> But of course there's very little that you can do with a computer that you
> >> can't eventually do with pencil and paper. Not all of the advantages of XML
> >> are immediately obvious.
> >> From my experience of replacing legacy formats with XML the main benefits
> >> are:
> >> Data can be validated before it's transmitted.
> > Does that mean that CSV data with a header can't be validated? How do
> > you validate data?
> No it doesn't. As I said, there's nothing that you can do with XML that you
> can't do with CSV. You can validate the data by checking that it obeys the
> constraints defined in the schema.
Validate in what perspective? to send it? What if you data is already validated at table level...(supposing the right constraints are in place)?

> >> Validation against a schema
> >> will trap most major errors. It will trap most of the minor errors that
> >> would normally require action by an expensive and extremely bored human being.
> > In what a header does constitute a schema.
> Not really. A schema is an external standard that the sender and receiver
> agree on. In theory you could put a copy of the file's syntax in
> machine-readable form in each data file. I haven't seen that done anywhere.
So a schema is the standard structure by which the user sending a file gets in agreement with the receiver right? Can't they do that even for CSV file? I mean what is the real added value in using XML as opposed to CSV?

> >> Therefore it reduces processing costs and staff turnover.
> >> Errors are rejected by a machine. That usually makes it the sender's
> >> responsibility to check and correct the data. Making that unambiguous saves
> > In what does that differ from a CSV with a header?
> It doesn't necessarily. As before, it's theoretically possible but I've
> never heard of anyone doing it. In essence an XML file is a delimited file
> with a header, so if someone set out to design your hypothetical file
> structure they could easily end up with XML.
So do you agree that a CSV file with a header can perfectly replace an XML file with same usage?

> >> a lot of time and endless arguments between business partners.
> >> Code to handle XML is standardised and therefore doesn't need to be
> >> rewritten for each individual application. This makes it more reliable and
> >> cheaper to develop and maintain.
> > How is standardized? What is a standard for coding XML?
> It's standardised in that the code is delivered as part of the operating
> system or the development environment. Because everyone is using the same
> code it gets more thoroughly tested. If you decided instead to produce an
> open standard for CSV files with headers everyone could provide standard
> libraries for that too. But they haven't, and don't need to because we
> already have XML as specified in the W3C standards.
I could see how the structure is validated by W3C standards. But what about *correctness* of data? Besides I still have hard time to see that it would be easier that a hierarchical stucture would be easier to validate than a table structure?

> >> It is difficult to extend CSV systems boyond the simple flat-file system
> >> with a single record type. Traditionally, at least in the systems I've
> >> worked with, the solution is to denormalise the data from more than one
> >> table. Therefore CSV is usually more verbose than XML and can take up much
> > So what you are saying is that an XML file takes less space (less
> > verbose) than a flat CSV file?
But you said the opposite. Just trying to understand what you are saying...

> Besides could you explain what you
> > mean by *denormalize data from more than one table*.

> One CSV file that I replaced included a 20 character field for the name of
> the company sending it in every line. It was always identical in every line
> because it was created by joining one recod in a name table with multiple
> records in another. The worst case situation is that a flat file might be
> created from the cartesian product of more than one table.
> You can create a structured file that has multiple record structures in it.
> So for instance a line with a 0 as the first character represents an order
> header, and a line with a 1 as the first character is an order detail. I
> have seen files structured this way. But each file type requires its own
> schema and processing to write an to read. You could create a generic syntax
> and provide standards libraries to process it, and it might look a lot like
> XML.
I do not quite see what grouping and query correctness as well as cartesian product explains how XML is superior to CSV...

Regards... Received on Wed Mar 28 2007 - 18:15:27 CEST

Original text of this message