Re: What is the logic of storing XML in a Database?

From: Bernard Peek <bap_at_alpha.shrdlu.com>
Date: 28 Mar 2007 16:16:08 GMT
Message-ID: <slrnf0l4r9.nrv.bap_at_alpha.shrdlu.com>


On 2007-03-28, Bob Badour <bbadour_at_pei.sympatico.ca> wrote:

> To amplify David's response, I would observe that having a formal
> specification to describe the content has benefits. Bernard, though,
> presents the above as an absolute advantage of XML.

I did add a comment that pretty much everything that XML does could in theory be done with CSV files.

>
> Certainly, one can validate XML. The XML folks have, in fact, invented
> several such formal specification languages. My question is: What
> advantage do any of the DTDs and XML Schemas have over the predecessors,
> including COBOL copy books, regular expressions, BNF grammars, SQL
> schemas, AWK, Perl, etc.?
>
> Compared to what came before, the XML alternatives seem less functional
> and extremely bloated.

The one single advantage they have is that they have been adopted. Whether there is a rational reason for adopting them is a separate argument.

>
>
>>>Errors are rejected by a machine. That usually makes it the sender's
>>>responsibility to check and correct the data. Making that unambiguous
>>>saves a lot of time and endless arguments between business partners.
>
> If the receiver fails to check the data, the receiver is an idiot.
>
>
>> and a DBMS doesn't do this?
>
> Neither cures stupidity.
>
>
>>>Code to handle XML is standardised and therefore doesn't need to be
>>>rewritten for each individual application. This makes it more reliable and
>>>cheaper to develop and maintain.
>>
>> and a DBMS interface language doesn't do this?
>
> BNF didn't do this? Regular expressions didn't do this? COBOL copy books
> didn't do this? AWK didn't do this? Perl didn't do this? XML invented
> several new "standards" forcing folks to rewrite everything multiple
> times in any case.

I haven't come across COBOL copy books; presumably they don't depend on choosing COBOL. I've come across all of the others, and none of them could replace XML, although they could be used to build tools that do. But so could Z80 assembler.

> "Oh, you use DTD? We use XML Schema."

That isn't a problem that's likely to occur, because it doesn't matter whether a team normally uses schemas or DTDs. They need to agree on a specific schema or DTD; what they use in other projects doesn't matter.

>
>
>>>It is difficult to extend CSV systems beyond the simple flat-file system
>>>with a single record type.
>
> Horseshit. I have seen no shortage of CSV files with multiple record types.

It's possible, but doing it right every time is difficult. It must be difficult, or the people sending me CSV files wouldn't keep sending them in so many subtly different formats.
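
For illustration only, here is a minimal Python sketch of one way a CSV file can carry more than one record type: a tag in the first field selects the layout for the rest of the line. The tags, column names and sample data are all hypothetical and not taken from any system discussed in this thread. The point is that the layout table has to be agreed and checked somewhere; the CSV format itself does not carry it.

```python
import csv
import io

# Hypothetical layouts: the first field of each line is a record-type tag
# that selects the column names for the remaining fields.
LAYOUTS = {
    "H": ["order_id", "customer"],
    "D": ["order_id", "item", "quantity"],
}


def parse_records(text):
    """Return (records, errors) for a CSV stream with tagged record types."""
    records, errors = [], []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row:
            continue  # skip blank lines
        tag, fields = row[0], row[1:]
        layout = LAYOUTS.get(tag)
        if layout is None:
            errors.append((line_no, f"unknown record type {tag!r}"))
        elif len(fields) != len(layout):
            errors.append(
                (line_no, f"expected {len(layout)} fields, got {len(fields)}")
            )
        else:
            records.append((tag, dict(zip(layout, fields))))
    return records, errors


# Made-up sample: a header, a good detail line, a short detail line and an
# unknown record type.
sample = "H,1001,Acme\nD,1001,widget,3\nD,1001,gadget\nX,oops\n"
records, errors = parse_records(sample)
```

Lines that match no agreed layout are collected with their line numbers rather than guessed at, so they can be sent back to whoever prepared the file.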

>
>
>>>Traditionally, at least in the systems I've
>>>worked with, the solution is to denormalise the data from more than one
>>>table. Therefore CSV is usually more verbose than XML and can take up much
>>>more storage space. (The storage space argument isn't one I usually have a
>>>lot of time for - it's not usually worth bothering with.)
>>
>> CSV is useful (among other things) for unloading an SQL table into a text
>> file, transporting the text file into a completely different environment,
>> and loading the data into another SQL table. There are better ways, but CSV
>> sometimes works where the better ways are unavailable.
>>
>> You can deal with normalization problems by simply unloading separate tables
>> to separate CSV files. Your resulting CSV files will typically be smaller
>> than XML files.
>
> UUENCODED CSV files will typically be smaller than XML files.

In a well-designed system it should be possible to make that so.
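
As a rough illustration of the size point, the following Python sketch serialises the same rows once as CSV and once as XML with one element per field, then compares the byte counts. The rows, column names and element names are made up, and the XML layout is only one possible choice (an attribute-based layout would be more compact), so this is a sketch under those assumptions and not a benchmark.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up table contents, purely for illustration.
COLUMNS = ["id", "name", "city"]
ROWS = [
    ["1", "Alice", "Leeds"],
    ["2", "Bob", "York"],
    ["3", "Carol", "Hull"],
]


def to_csv(columns, rows):
    """Serialise the rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(columns, rows):
    """Serialise the rows as XML, one child element per field."""
    root = ET.Element("rows")
    for row in rows:
        rec = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            ET.SubElement(rec, col).text = value
    return ET.tostring(root, encoding="unicode")


csv_size = len(to_csv(COLUMNS, ROWS).encode("utf-8"))
xml_size = len(to_xml(COLUMNS, ROWS).encode("utf-8"))
print(csv_size, xml_size)
```

With this layout the XML repeats every column name twice per row, which is where most of the extra bytes come from.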

>
>
>>>XML data is not generally manually edited, this is a huge advantage. Fixing
>>>manually prepared data files soaks up vast amounts of time and effort. It's
>>>more likely that XML files will be generated and read by automated systems
>>>than by someone typing data. That makes XML data much more reliable than
>>>CSV.
>
> Wasn't the fact that XML is plain text that humans can understand even
> in a text editor one of the early marketing pitches? I could swear it was.
>
>
>> You can do the same thing with CSV. I've done it.
>
> One can automate just about everything from playing video games to
> driving tractors to clustering articles by relevance.

Yes, you can. But you may not be allowed to. I lost a developer who got bored with spending half his time massaging manually produced CSV files that arrived in subtly different formats depending on who had prepared the file that week. The companies sending the data insisted on manual preparation to avoid mistakes, which is why they made so many mistakes. Sometimes the problems you have to work around are political more than technical. But they are no less a problem for that.

-- 
bap_at_shrdlu.com
In search of cognoscenti
Received on Wed Mar 28 2007 - 18:16:08 CEST
