Re: XML databases [ was: S.O.D.A. database Query API - call for comments ]

From: Philip Lijnzaad <lijnzaad_at_ebi.ac.uk>
Date: Sat, 21 Jul 2001 23:24:42 GMT
Message-ID: <u7pud2oq9h.fsf_at_sol6.ebi.ac.uk>


On Mon, 21 May 2001 12:47:42 +0200,
"Carl" == Carl Rosenberger <carl_at_db4o.com> wrote:

Carl> Personally I dislike XML since:
Carl> - it needs to be parsed.
Carl> - repetitive tags produce unnecessary overhead.

>>
>> yup. But surely we'll soon have some dedicated encoding which can cut this
>> down considerably (and gzip already routinely compresses XML down to less
>> than 30% of the original).
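The compression claim in the quoted paragraph is easy to check against your own data. Below is a minimal Python sketch using the standard `gzip` module. The document it measures is synthetic and hypothetical: `synthetic_xml` and its record layout are made up for illustration. The ratio depends entirely on the input, so the number it prints is not a general figure for XML.

```python
import gzip


def synthetic_xml(n_records: int) -> bytes:
    """Build a hypothetical XML document whose records repeat the same tags."""
    rows = "".join(
        f"<record><id>{i}</id><name>widget {i % 10}</name>"
        "<category>hardware</category><inStock>true</inStock></record>\n"
        for i in range(n_records)
    )
    return f'<?xml version="1.0"?>\n<records>\n{rows}</records>\n'.encode("utf-8")


def gzip_ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    doc = synthetic_xml(1000)
    print(f"{len(doc)} bytes of XML; gzipped to {gzip_ratio(doc):.1%} of that")
```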

Carl> Zipping XML only makes the situation worse:
Carl> - You need to unzip the entire document to retrieve content.

Not necessarily; gunzip decompresses things by blocks (e.g. you can concatenate gzipped files using cat(1)). And a dedicated encoding could be even more intelligent than this.
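Here is a minimal Python sketch of the cat(1) point, using only the standard `gzip` and `zlib` modules. Each `gzip.compress` call produces one gzip member. Concatenating members gives a valid multi-member stream, and a reader can stop after any member without decompressing the rest. The helper name `gzip_members` is made up for illustration. Within a single member, deflate data must still be decompressed sequentially from that member's start, so this gives access at member granularity rather than arbitrary random access.

```python
import gzip
import zlib


def gzip_members(stream: bytes):
    """Yield the decompressed payload of each gzip member, one at a time.

    The caller can stop iterating after any member; later members are
    never decompressed.
    """
    while stream:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)
        payload = d.decompress(stream) + d.flush()
        if not d.eof:
            raise ValueError("truncated gzip member")
        yield payload
        # Whatever follows this member's trailer is the next member.
        stream = d.unused_data


if __name__ == "__main__":
    a = gzip.compress(b"<a>first</a>\n")
    b = gzip.compress(b"<b>second</b>\n")
    combined = a + b  # the same bytes `cat a.gz b.gz > c.gz` would produce
    print(list(gzip_members(combined)))  # [b'<a>first</a>\n', b'<b>second</b>\n']
    print(gzip.decompress(combined))     # b'<a>first</a>\n<b>second</b>\n'
```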

Carl> - You lose the only little advantage that XML has: read- and editability
Carl>   with widespread text editors.

That's why I called it an encoding: it should be transparent to human readers and to the applications that cater to them. (Incidentally, I routinely open gzipped files in emacs for reading, and things get gzipped (and/or tarred and/or ftp-ed!) again upon writing.)
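To illustrate the "transparent encoding" idea, here is a small Python sketch of a reader that checks for the two gzip magic bytes (0x1f 0x8b) and decompresses on the fly only when they are present. `open_text` is a hypothetical helper, not a standard function. Sniffing magic bytes is just one possible way to detect the encoding; a file-name suffix would be another.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def open_text(path: str, encoding: str = "utf-8"):
    """Open `path` for reading as text, gunzipping on the fly if needed."""
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)


if __name__ == "__main__":
    text = "<doc><p>hello</p></doc>\n"
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "doc.xml")
        packed = os.path.join(tmp, "doc.xml.gz")
        with open(plain, "w", encoding="utf-8") as f:
            f.write(text)
        with gzip.open(packed, "wt", encoding="utf-8") as f:
            f.write(text)
        # The caller sees the same text either way.
        for path in (plain, packed):
            with open_text(path) as f:
                print(path, f.read() == text)
```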

Carl> A more effective format would
Carl> - not be text-editor-readable from the start

yes.

Carl> - try to avoid redundancies by system

yes.

Carl> - allow queries for points of interest without parsing the entire
Carl>   document

Impossible if the entire document is a complex interconnected graph.

Carl> - use up exactly the amount of information that the original data contains:
Carl>   - Integer, Float = 4 bytes

What byte-sex?

Carl>   - Long, Double = 8 bytes
Carl>   - Unicode would be possible
Carl>   - no tags

Exactly all of this and more is now already offered by CORBA's IIOP and CDR :-)
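To make the byte-sex question concrete, here is a Python sketch of the fixed-width layout Carl lists (4-byte integer and float, 8-byte long and double), prefixed with a one-byte byte-order flag. The writer uses its native order and marks it, and the reader swaps if necessary. That is roughly the spirit of CDR, but this is not CDR: it ignores CDR's alignment rules and everything else. `encode`, `decode` and the flag values are hypothetical choices for illustration.

```python
import struct
import sys

# Fixed-width record from Carl's list: int32, float32, int64, float64.
# struct's "<" / ">" prefixes mean standard sizes and no padding.
RECORD = "ifqd"


def encode(values, little_endian: bool = sys.byteorder == "little") -> bytes:
    """Write a one-byte order flag (1 = little, 0 = big), then the fields."""
    prefix = "<" if little_endian else ">"
    return bytes([1 if little_endian else 0]) + struct.pack(prefix + RECORD, *values)


def decode(data: bytes):
    """Read the flag and unpack the fields in whatever order the writer used."""
    prefix = "<" if data[0] == 1 else ">"
    return struct.unpack(prefix + RECORD, data[1:])


if __name__ == "__main__":
    values = (42, 1.5, 2**40, 3.25)
    for le in (True, False):
        blob = encode(values, little_endian=le)
        print(len(blob), blob.hex(), decode(blob) == values)
```

Both byte orders produce 25 bytes (1 flag + 4 + 4 + 8 + 8) with different contents, and either one decodes to the same values. Unicode strings would need an extra length-prefixed field, which this sketch leaves out.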

Carl> - internal links would simply be pointers within the file
Carl>   = internal file offset

I have the feeling this is brittle, but I guess it can be made to work. BTW, these things are hardly new; Lisp has been portably representing complex graphs for twenty or so years. And it is far more powerful and far less crufty :-)
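Here is a Python sketch of Carl's "internal links are file offsets" idea, and of one reason it can be brittle. Nodes are fixed-size records holding a value and the absolute byte offset of the node they link to. That encodes cycles without trouble, but any bytes inserted before the nodes (here, a hypothetical 4-byte header) leave every stored offset pointing at the wrong place. The record layout and the names `write_graph` and `follow` are assumptions made for illustration.

```python
import struct

# Hypothetical node record: int32 value, int32 absolute byte offset of the
# node it links to (-1 = no link). Little-endian, no padding: 8 bytes.
NODE = struct.Struct("<ii")


def write_graph(values, links) -> bytes:
    """Lay nodes out back to back; links[i] is the index of node i's target."""
    offsets = [i * NODE.size for i in range(len(values))]
    buf = bytearray()
    for value, target in zip(values, links):
        buf += NODE.pack(value, -1 if target is None else offsets[target])
    return bytes(buf)


def follow(buf: bytes, offset: int, steps: int):
    """Walk up to `steps` nodes along the links, collecting their values."""
    seen = []
    for _ in range(steps):
        value, nxt = NODE.unpack_from(buf, offset)
        seen.append(value)
        if nxt < 0:
            break
        offset = nxt
    return seen


if __name__ == "__main__":
    buf = write_graph([10, 20, 30], [1, 2, 0])  # a cycle: 10 -> 20 -> 30 -> 10
    print(follow(buf, 0, 5))                    # [10, 20, 30, 10, 20]
    # Prepend a 4-byte header: the stored offsets are now stale.
    shifted = b"HDR!" + buf
    print(follow(shifted, 4, 2))                # [10, 8]: second value is garbage
```

Storing offsets relative to the start of the node area instead of the file would survive a prepended header. It would not survive insertions among the nodes themselves.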

                                                                       Philip
-- 
If you have a procedure with 10 parameters, you probably missed some. (Kraulis)
-----------------------------------------------------------------------------
Philip Lijnzaad, lijnzaad_at_ebi.ac.uk \ European Bioinformatics Institute,rm A2-08
+44 (0)1223 49 4639                 / Wellcome Trust Genome Campus, Hinxton
+44 (0)1223 49 4468 (fax)           \ Cambridgeshire CB10 1SD,  GREAT BRITAIN
PGP fingerprint: E1 03 BF 80 94 61 B6 FC  50 3D 1F 64 40 75 FB 53
Received on Sun Jul 22 2001 - 01:24:42 CEST
