Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> LOB Compression

LOB Compression

From: Kevin Crosbie <caoimhinocrosbai_at_at_yahoo.com>
Date: Tue, 04 Jul 2006 10:12:46 +0200
Message-ID: <44aa2292$0$26416$88260bb3@news-taz.teranews.com>


Hello,

Does anybody know a reason why UTL_COMPRESS would compress to an almost 2:1 ratio?

I'm looking into ways to compress LOBS in either 9i or 10g, the purpose being to calculate compression rates for different sets of text documents as a research project.

I was trying the UTL_COMPRESS package on 10g as a method of Limpel-Ziv compression and found that the compression rates that it offered seemed quite poor. I compared this to another compression technique, that of using a Deflater class in Java and found this much much better, regardless of the quality parameters I supplied.

An implementation of the Java Compression functions that I used can be found in a posting by Piotr Jarmuz in this AskTom article: http://asktom.oracle.com/pls/ask/f?p=4950:8:9988369109695830428::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:12980645053487

I used the following code to compare:

DECLARE
   rawtext raw(32767);

   uncompressed   blob;
   compressed1    blob;
   compressed2    blob;
   uncompressed1  blob;
   uncompressed2  blob;

BEGIN
   dbms_lob.createtemporary(uncompressed, TRUE);
   dbms_lob.open(uncompressed, dbms_lob.lob_readwrite);
   dbms_lob.writeappend(uncompressed, utl_raw.length(rawtext), rawtext);

   dbms_output.put_line('Uncompressed Length: '

|| dbms_lob.getlength(uncompressed));

   compressed1 := compressor.blob_compress(uncompressed);

   dbms_output.put_line('Java Compressed Length: '

|| dbms_lob.getlength(compressed1));

   compressed2 := utl_compress.lz_compress(uncompressed, 1);

   dbms_output.put_line('UTL Compressed Length: '

|| dbms_lob.getlength(compressed2));

   uncompressed1 := compressor.blob_decompress(compressed1);

   dbms_output.put_line('Java Decompressed Length: '

|| dbms_lob.getlength(uncompressed1));

   uncompressed2 := utl_compress.lz_uncompress(compressed2);

   dbms_output.put_line('UTL Uncompressed Length: '

|| dbms_lob.getlength(uncompressed2));

   dbms_lob.freetemporary(uncompressed1);    dbms_lob.freetemporary(uncompressed2);

   BEGIN
      uncompressed1 := compressor.blob_decompress(compressed2);

      dbms_output.put_line('UTL Decompressed by Java Length: '
                        || dbms_lob.getlength(uncompressed1));

      dbms_lob.freetemporary(uncompressed2);

      EXCEPTION
        WHEN OTHERS THEN
           dbms_output.put_line('Error encountered decompressing UTL '
                             || 'Compressed BLOB with Java Decompress ('
                             || SQLCODE || '): ' || SQLERRM);
   END;    BEGIN
      uncompressed2 := utl_compress.lz_uncompress(compressed1);
      dbms_output.put_line('Java Uncompressed by UTL Length: '
                        || dbms_lob.getlength(uncompressed2));

      dbms_lob.freetemporary(uncompressed1);

      EXCEPTION
        WHEN OTHERS THEN
           dbms_output.put_line('Error encountered decompressing Java '
                             || 'Compressed BLOB with UTL Uncompress ('
                             || SQLCODE || '): ' || SQLERRM);
   END;    dbms_lob.freetemporary(compressed1);
   dbms_lob.freetemporary(compressed2);

   dbms_lob.close(uncompressed);
   dbms_lob.freetemporary(uncompressed);

END; The output that I got was:

Uncompressed Length: 104
Java Compressed Length: 88
UTL Compressed Length: 100
Java Decompressed Length: 104
UTL Uncompressed Length: 104
Error encountered decompressing UTL Compressed BLOB with Java Decompress (-29532): ORA-29532: Java call terminated by uncaught Java exception: java.util.zip.ZipException: unknown compression method Error encountered decompressing Java Compressed BLOB with UTL Uncompress (-29294): ORA-29294: A data error occurred during compression or uncompression.

PL/SQL procedure successfully completed.

That was a small piece of text, just to provide an example, but as you can see the Java compression technique is quite a lot better than that of the UTL Compress method, as I said, I've tried different quality parameters to try to improve it. Also, neither method is able to decompress something output by the other.

In fact, on a large text document of say 3Mb, I found that UTL Compress would compress it to roughly half the size, whilst I would get a ratio of around 8:1 with the Java Deflater class.

Both techniques say that they implement Limpel-Ziv as in the zlib library, and both are apparently compatible with gzip.

I tried a further test, outputing the compressed BLOBS to files and trying to gunzip them (with -n option), but to no avail. The UTL Compressed file wasn't recognized at all and the Java one seemed to be missing information. Now, this is probably due to missing or incompatible headers, as zlib headers are not the same as gzip headers.   I also gziped the original uncompressed text, and it came out very similar in size to that of the file compressed by Java.

So, let's repeat my question!

Does anybody know a reason why UTL_COMPRESS would compress to an almost 2:1 ratio?

One might explain it by saying that gzip compresses better than, for instance, 'compress', the function provided by the zlib library, but for a comparison of techniques, the following page shows that it doesn't explain the large difference I'm seeing here: http://www.cs.rit.edu/~std3246/thesis/node36.html Received on Tue Jul 04 2006 - 03:12:46 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US