Re: What is the logic of storing XML in a Database?

From: Bernard Peek <bap_at_alpha.shrdlu.com>
Date: 28 Mar 2007 15:44:46 GMT
Message-ID: <slrnf0l30e.lk1.bap_at_alpha.shrdlu.com>


On 2007-03-28, David Cressey <cressey73_at_verizon.net> wrote:

> It depends on what you include as a "legacy format". Some XML evangelizers
> are calling SQL tables a "legacy format". I don't believe them for an
> instant, but it's worth making this explicit.

Noted.

>
>
>> Data can be validated before it's transmitted. Validation against a schema
>> will trap most major errors. It will trap most of the minor errors that
>> would normally require action by an expensive and extremely bored human
>> being. Therefore it reduces processing costs and staff turnover.
>>
>
>> Errors are rejected by a machine. That usually makes it the sender's
>> responsibility to check and correct the data. Making that unambiguous
>> saves a lot of time and endless arguments between business partners.
>>
> and a DBMS doesn't do this?

It can do the same job if the sender and receiver have identical databases. If they don't then there is potentially confusion and argument about which one is the definitive version. It can get very political, and I don't want my people getting caught in a turf war.

When I replaced a legacy format with XML I saved about 0.5 FTE for each company in the industry.
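
To make the "validate before transmission" point concrete, here is a minimal sketch in Python. It is not a real schema validator: the standard library has no XSD support, so the required-field check below is a hand-rolled stand-in for what a schema would enforce. The `order` record layout, the field names and `check_order` are all hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; a real exchange would publish this as a schema.
REQUIRED_FIELDS = ("id", "amount", "currency")


def check_order(xml_text):
    """Return a list of problems; an empty list means the message may be sent."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Not even well-formed XML: reject before anything else is checked.
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != "order":
        problems.append(f"unexpected root element <{root.tag}>")
    for name in REQUIRED_FIELDS:
        child = root.find(name)
        if child is None or not (child.text or "").strip():
            problems.append(f"missing or empty <{name}>")

    # Simple type check, standing in for a schema's decimal type.
    amount = (root.findtext("amount") or "").strip()
    if amount:
        try:
            float(amount)
        except ValueError:
            problems.append(f"<amount> is not a number: {amount!r}")
    return problems


good = "<order><id>17</id><amount>12.50</amount><currency>GBP</currency></order>"
bad = "<order><id>17</id><amount>twelve</amount></order>"

good_problems = check_order(good)
bad_problems = check_order(bad)
```

If the sender runs a check like this and the receiver runs the same one, a rejected file comes with a machine-generated list of reasons, which is what keeps the argument about whose fault it is short.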

>
>> Code to handle XML is standardised and therefore doesn't need to be
>> rewritten for each individual application. This makes it more reliable and
>> cheaper to develop and maintain.
>>
>
> and a DBMS interface language doesn't do this?

It can do. In fact one does: XML. There isn't any other universal standard that's architecture and supplier independent. Even the supposedly standard SQL doesn't necessarily work across products; I suppose the subset in ODBC is as close as it gets.
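
As a rough illustration of "standardised code", the sketch below reads a document holding two record types (customers and their orders) using nothing but the stock XML parser in Python's standard library. The element and attribute names and the `orders_by_customer` helper are made up for the example; the point is only that no format-specific parsing code has to be written.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two record types nested in one file.
DOC = """
<customers>
  <customer name="Acme">
    <order id="1" total="10.00"/>
    <order id="2" total="5.50"/>
  </customer>
  <customer name="Bolt">
    <order id="3" total="7.25"/>
  </customer>
</customers>
"""


def orders_by_customer(xml_text):
    """Map each customer name to the list of that customer's order ids."""
    root = ET.fromstring(xml_text)
    return {
        customer.get("name"): [order.get("id") for order in customer.findall("order")]
        for customer in root.findall("customer")
    }


result = orders_by_customer(DOC)
```

The equivalent flat file would either repeat the customer details on every order row or need a home-grown convention for mixing record types.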

>
>> It is difficult to extend CSV systems beyond the simple flat-file system
>> with a single record type. Traditionally, at least in the systems I've
>> worked with, the solution is to denormalise the data from more than one
>> table. Therefore CSV is usually more verbose than XML and can take up much
>> more storage space. (The storage space argument isn't one I usually have a
>> lot of time for - it's not usually worth bothering with.)
>>
>
> CSV is useful (among other things) for unloading an SQL table into a text
> file, transporting the text file into a completely different environment,
> and loading the data into another SQL table. There are better ways, but CSV
> sometimes works where the better ways are unavailable.
>
> You can deal with normalization problems by simply unloading separate tables
> to separate CSV files. Your resulting CSV files will typically be smaller
> than XML files.

True, but that's only one use. If I needed to do that job I might well use that technique. I might also use the database's utilities to dump the data as an SQL command file that will re-create the database. That gives me a highly structured file that uses a standard syntax that the database is designed to understand. XML also fits that description.
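
For anyone who hasn't used the SQL-dump approach, here is a small sketch. It assumes SQLite purely because Python ships a driver for it; the table and rows are invented. `Connection.iterdump()` yields the SQL statements needed to rebuild the database, and other products have their own dump utilities that do the same job.

```python
import sqlite3

# Source database with a hypothetical table and two rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Bolt")],
)
source.commit()

# Dump the whole database as a text file of SQL commands.
dump_sql = "\n".join(source.iterdump())

# Re-create the database elsewhere by replaying the dump.
target = sqlite3.connect(":memory:")
target.executescript(dump_sql)
rows = target.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```

The dump is plain text in a syntax the database already understands, so nothing has to be written to load it.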

>> XML data is not generally manually edited, this is a huge advantage.
>> Fixing manually prepared data files soaks up vast amounts of time and
>> effort. It's more likely that XML files will be generated and read by
>> automated systems than by someone typing data. That makes XML data much
>> more reliable than CSV.
>>
>
> You can do the same thing with CSV. I've done it.

In the past I've been tempted to deliberately obfuscate CSV files so that people aren't tempted to edit them. XML looks scary to people who look at the raw text unless they understand it. I'd rather data files weren't edited by people who don't understand them.

-- 
bap_at_shrdlu.com
In search of cognoscenti
Received on Wed Mar 28 2007 - 17:44:46 CEST
