Re: What is the logic of storing XML in a Database?

From: Bernard Peek <bap_at_alpha.shrdlu.com>
Date: 28 Mar 2007 18:33:01 GMT
Message-ID: <slrnf0lcrt.ssr.bap_at_alpha.shrdlu.com>


On 2007-03-28, Cimode <cimode_at_hotmail.com> wrote:

>> No it doesn't. As I said, there's nothing that you can do with XML that you
>> can't do with CSV. You can validate the data by checking that it obeys the
>> constraints defined in the schema.
> Validate from what perspective? To send it? What if your data is already
> validated at table level (supposing the right constraints are in
> place)?

I don't know or care about what constraints they have in their database. I only care about the data they send me.

>
>> >> Validation against a schema
>> >> will trap most major errors. It will trap most of the minor errors that
>> >> would normally require action by an expensive and extremely bored human being.
>> > In what way does a header constitute a schema?
>>
>> Not really. A schema is an external standard that the sender and receiver
>> agree on. In theory you could put a copy of the file's syntax in
>> machine-readable form in each data file. I haven't seen that done anywhere.
> So a schema is the standard structure on which the user sending a file
> and the receiver agree, right? Can't they do that for a CSV file
> too? I mean, what is the real added value in using XML as
> opposed to CSV?

Yes, and in practice CSV formats are usually negotiated between sender and receiver, in much the same way that XML standards are. But in practice those standards are rarely defined strictly. For instance, a lot of the specs that I've seen haven't specified whether the file uses DOS, Mac or UNIX line endings.
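Line endings are exactly the kind of detail a loose spec leaves open, and a receiver can defend against it by normalising them before parsing. A minimal Python sketch, assuming the file arrives as text and that any of the three conventions may appear; `read_csv_any_newline` is a hypothetical helper name, not part of any library:

```python
from __future__ import annotations

import csv
import io


def read_csv_any_newline(text: str) -> list[list[str]]:
    """Parse CSV text whose lines end in DOS (CRLF), old Mac (CR) or UNIX (LF) style."""
    # Normalise CRLF first, then any remaining bare CR, so only LF is left.
    # Caveat: a CRLF inside a quoted field is also turned into LF.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    # The csv module handles delimiters and quoting; we have only fixed the line endings.
    return list(csv.reader(io.StringIO(normalised)))


dos = "id,name\r\n1,Acme\r\n"
mac = "id,name\r1,Acme\r"
unix = "id,name\n1,Acme\n"
for sample in (dos, mac, unix):
    print(read_csv_any_newline(sample))
```

All three samples parse to the same rows, which is the point: the receiver copes, but only because someone wrote that normalisation step rather than the spec settling it.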

>
>
>> >> Therefore it reduces processing costs and staff turnover.
>>
>> >> Errors are rejected by a machine. That usually makes it the sender's
>> >> responsibility to check and correct the data. Making that unambiguous saves
>> > How does that differ from a CSV with a header?
>>
>> It doesn't necessarily. As before, it's theoretically possible but I've
>> never heard of anyone doing it. In essence an XML file is a delimited file
>> with a header, so if someone set out to design your hypothetical file
>> structure they could easily end up with XML.
> So do you agree that a CSV file with a header can perfectly replace an
> XML file with same usage?

In theory, yes, but the work to make that happen would have had to be done before XML was adopted.
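To make the "CSV file with a header" idea concrete: the schema can be an agreed list of column names and types, and the receiver's machine rejects any file that doesn't match, producing messages that can go straight back to the sender. A minimal sketch; the column names and the `SCHEMA` dictionary are hypothetical, and a real agreement would also have to pin down encoding, delimiter, quoting and line endings:

```python
from __future__ import annotations

import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical agreed schema: column name -> converter that raises on a bad value.
SCHEMA = {"invoice_id": int, "customer": str, "amount": Decimal}


def validate_csv(text: str, schema: dict) -> list[str]:
    """Return human-readable errors; an empty list means the file passed."""
    errors: list[str] = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != list(schema):
        return [f"header {header!r} does not match expected {list(schema)!r}"]
    for row in reader:
        line_no = reader.line_num  # physical line of the record just read
        if len(row) != len(schema):
            errors.append(f"line {line_no}: expected {len(schema)} fields, got {len(row)}")
            continue
        for (name, convert), value in zip(schema.items(), row):
            try:
                convert(value)
            except (ValueError, InvalidOperation):
                errors.append(f"line {line_no}: bad {name} {value!r}")
    return errors


good = "invoice_id,customer,amount\n1,Acme,10.50\n"
bad = "invoice_id,customer,amount\nX,Acme,10.50\n2,Acme\n"
print(validate_csv(good, SCHEMA))
print(validate_csv(bad, SCHEMA))
```

Nothing here is hard, but every sender/receiver pair would need to agree on it and write it, which is the work that XML Schema plus standard tooling already covers.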

>
>
>> >> a lot of time and endless arguments between business partners.
>>
>> >> Code to handle XML is standardised and therefore doesn't need to be
>> >> rewritten for each individual application. This makes it more reliable and
>> >> cheaper to develop and maintain.
>> > How is it standardized? What is the standard for coding XML?
>>
>> It's standardised in that the code is delivered as part of the operating
>> system or the development environment. Because everyone is using the same
>> code it gets more thoroughly tested. If you decided instead to produce an
>> open standard for CSV files with headers, everyone could provide standard
>> libraries for that too. But they haven't, and don't need to because we
>> already have XML as specified in the W3C standards.
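As an illustration of parsing code "delivered as part of the development environment": Python ships an XML parser in its standard library, so a receiver can reject a malformed file without writing any parsing code at all. A minimal sketch; note that it only checks well-formedness, not conformance to an agreed schema (full XML Schema validation needs an additional library, for example lxml's `etree.XMLSchema` class):

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<order><line qty='2'/></order>"))  # True
print(is_well_formed("<order><line qty='2'></order>"))   # False: <line> is never closed
```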
> I can see how the structure is validated against W3C standards. But what
> about *correctness* of data? Besides, I still have a hard time seeing
> why a hierarchical structure would be easier to validate than a table
> structure.

It's very difficult to test for correctness of the data, but it is possible. You can, for instance, send both the value of an invoice and the value of each line item. I'm not sure that XML alone can check that the sums match. Most systems can be made more reliable by adding the appropriate redundancy.

XML validation can't detect every error, but it can find most.
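The invoice-total check is the kind of rule that sits on top of structural validation. A minimal sketch, assuming a hypothetical invoice format with a `total` attribute on the invoice and an `amount` attribute on each `line` element; the comparison is ordinary application code, not something the XML parser does for you:

```python
from decimal import Decimal
import xml.etree.ElementTree as ET


def totals_match(document: str) -> bool:
    """Compare the declared invoice total with the sum of its line items."""
    invoice = ET.fromstring(document)
    declared = Decimal(invoice.get("total"))
    # Decimal avoids binary floating-point rounding when adding money amounts.
    computed = sum((Decimal(line.get("amount")) for line in invoice.iter("line")), Decimal("0"))
    return declared == computed


ok = '<invoice total="15.50"><line amount="10.00"/><line amount="5.50"/></invoice>'
wrong = '<invoice total="16.00"><line amount="10.00"/><line amount="5.50"/></invoice>'
print(totals_match(ok), totals_match(wrong))  # True False
```

The redundancy (sending both the total and the items) is what makes the error detectable; the format only makes the numbers easy to get at.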

>
>> >> It is difficult to extend CSV systems beyond the simple flat-file system
>> >> with a single record type. Traditionally, at least in the systems I've
>> >> worked with, the solution is to denormalise the data from more than one
>> >> table. Therefore CSV is usually more verbose than XML and can take up much
>> > So what you are saying is that an XML file takes less space (less
>> > verbose) than a flat CSV file?
> But you said the opposite. Just trying to understand what you are
> saying...

Usually the XML file is bigger, but I have seen situations where a flat file repeats a lot of data and is therefore bloated.
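To show where the repetition comes from: a flat file produced by joining one company record to its order lines carries the company name on every line, while a nested document states it once. A minimal sketch with made-up data; the element and field names are hypothetical:

```python
import csv
import io
import xml.etree.ElementTree as ET

company = "Example Trading Ltd"  # hypothetical sender name
orders = [("A1", 3), ("A2", 5), ("A3", 1)]

# Flat file: the joined result repeats the company name on every line.
flat = io.StringIO()
writer = csv.writer(flat, lineterminator="\n")
for order_id, qty in orders:
    writer.writerow([company, order_id, qty])
flat_text = flat.getvalue()

# Nested XML: the company is stated once and the orders are its children.
root = ET.Element("company", name=company)
for order_id, qty in orders:
    ET.SubElement(root, "order", id=order_id, qty=str(qty))
xml_text = ET.tostring(root, encoding="unicode")

print(flat_text.count(company), xml_text.count(company))  # 3 1
```

In this tiny example the XML text is still longer overall, because of the tag and attribute names; the saving only shows up when the repeated fields are long or the rows are many.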

>
>> > Besides, could you explain what you
>> > mean by *denormalize data from more than one table*.
>
>> One CSV file that I replaced included a 20-character field for the name of
>> the company sending it in every line. It was always identical in every line
>> because it was created by joining one record in a name table with multiple
>> records in another. The worst case situation is that a flat file might be
>> created from the cartesian product of more than one table.
>> You can create a structured file that has multiple record structures in it.
>> So for instance a line with a 0 as the first character represents an order
>> header, and a line with a 1 as the first character is an order detail. I
>> have seen files structured this way. But each file type requires its own
>> schema and processing to write and to read. You could create a generic syntax
>> and provide standard libraries to process it, and it might look a lot like
>> XML.
> I do not quite see how grouping, query correctness or the
> cartesian product explain why XML is superior to CSV...

XML can transmit the data from two tables in the same file, without having to repeat data values on each line. You can do that with CSV files, but it can get messy.
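For comparison, the multi-record-type CSV layout mentioned earlier (a leading 0 for an order header, 1 for an order detail) can carry two tables without repeating values, but every such format needs its own reader. A minimal sketch of one, assuming hypothetical field layouts `0,order_id,customer` and `1,product,qty`, where each detail belongs to the most recent header:

```python
from __future__ import annotations

import csv
import io


def read_orders(text: str) -> list[dict]:
    """Group detail records (type 1) under the preceding header record (type 0)."""
    orders: list[dict] = []
    for row in csv.reader(io.StringIO(text)):
        record_type, *fields = row
        if record_type == "0":
            order_id, customer = fields
            orders.append({"order_id": order_id, "customer": customer, "lines": []})
        elif record_type == "1":
            if not orders:
                raise ValueError("detail record before any header")
            product, qty = fields
            orders[-1]["lines"].append({"product": product, "qty": int(qty)})
        else:
            raise ValueError(f"unknown record type {record_type!r}")
    return orders


sample = "0,1001,Acme\n1,widget,4\n1,gadget,2\n0,1002,Initech\n1,widget,1\n"
for order in read_orders(sample):
    print(order["order_id"], order["customer"], len(order["lines"]))
```

The ordering rule ("details follow their header") lives only in this code and in whatever document describes the format; with XML the same grouping is carried by the nesting itself and read by a generic parser.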

-- 
bap_at_shrdlu.com
In search of cognoscenti
Received on Wed Mar 28 2007 - 20:33:01 CEST
