Re: What is the logic of storing XML in a Database?

From: Bernard Peek <>
Date: 28 Mar 2007 15:17:07 GMT
Message-ID: <>

On 2007-03-28, Cimode <> wrote:
> On Mar 28, 1:21 pm, Bernard Peek <> wrote:
>> On 2007-03-28, Cimode <> wrote:
>> > What advantage of XML as a transport format do you see over let's say
>> > a CSV file with an integrated header?
>> There's nothing that you can do with XML that can't be done in other ways.
>> But of course there's very little that you can do with a computer that you
>> can't eventually do with pencil and paper. Not all of the advantages of XML
>> are immediately obvious.
>> From my experience of replacing legacy formats with XML the main benefits
>> are:
>> Data can be validated before it's transmitted.
> Does that mean that CSV data with a header can't be validated? How do
> you validate data?

No it doesn't. As I said, there's nothing that you can do with XML that you can't do with CSV. You can validate the data by checking that it obeys the constraints defined in the schema.

>> Validation against a schema
>> will trap most major errors. It will trap most of the minor errors that
>> would normally require action by an expensive and extremely bored human being.
> In what a header does constitute a schema.

Not really. A schema is an external standard that the sender and receiver agree on. In theory you could put a copy of the file's syntax in machine-readable form in each data file. I haven't seen that done anywhere.

>> Therefore it reduces processing costs and staff turnover.
>> Errors are rejected by a machine. That usually makes it the sender's
>> responsibility to check and correct the data. Making that unambiguous saves
> In what does that differ from a CSV with a header?

It doesn't necessarily. As before, it's theoretically possible but I've never heard of anyone doing it. In essence an XML file is a delimited file with a header, so if someone set out to design your hypothetical file structure they could easily end up with XML.

>> a lot of time and endless arguments between business partners.
>> Code to handle XML is standardised and therefore doesn't need to be
>> rewritten for each individual application. This makes it more reliable and
>> cheaper to develop and maintain.
> How is standardized? What is a standard for coding XML?

It's standardised in that the code is delivered as part of the operating system or the development environment. Because everyone is using the same code it gets more thoroughly tested. If you decided instead to produce an open standard for CSV files with headers everyone could provide standard libraries for that too. But they haven't, and don't need to because we already have XML as specified in the W3C standards.

>> It is difficult to extend CSV systems boyond the simple flat-file system
>> with a single record type. Traditionally, at least in the systems I've
>> worked with, the solution is to denormalise the data from more than one
>> table. Therefore CSV is usually more verbose than XML and can take up much
> So what you are saying is that an XML file takes less space (less
> verbose) than a flat CSV file? Besides could you explain what you
> mean by *denormalize data from more than one table*.

One CSV file that I replaced included a 20 character field for the name of the company sending it in every line. It was always identical in every line because it was created by joining one recod in a name table with multiple records in another. The worst case situation is that a flat file might be created from the cartesian product of more than one table.

You can create a structured file that has multiple record structures in it. So for instance a line with a 0 as the first character represents an order header, and a line with a 1 as the first character is an order detail. I have seen files structured this way. But each file type requires its own schema and processing to write an to read. You could create a generic syntax and provide standards libraries to process it, and it might look a lot like XML.

In search of cognoscenti
Received on Wed Mar 28 2007 - 17:17:07 CEST

Original text of this message