Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: LOB Compression

Re: LOB Compression

From: DA Morgan <damorgan_at_psoug.org>
Date: Tue, 04 Jul 2006 08:52:32 -0700
Message-ID: <1152028356.645151@bubbleator.drizzle.com>


Kevin Crosbie wrote:
> Hello,
>
> Does anybody know a reason why UTL_COMPRESS would compress to an almost
> 2:1 ratio?
>
> I'm looking into ways to compress LOBS in either 9i or 10g, the purpose
> being to calculate compression rates for different sets of text
> documents as a research project.
>
> I was trying the UTL_COMPRESS package on 10g as a method of Limpel-Ziv
> compression and found that the compression rates that it offered seemed
> quite poor. I compared this to another compression technique, that of
> using a Deflater class in Java and found this much much better,
> regardless of the quality parameters I supplied.
>
> An implementation of the Java Compression functions that I used can be
> found in a posting by Piotr Jarmuz in this AskTom article:
> http://asktom.oracle.com/pls/ask/f?p=4950:8:9988369109695830428::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:12980645053487
>
> I used the following code to compare:
>
> DECLARE
> rawtext raw(32767);
> uncompressed blob;
> compressed1 blob;
> compressed2 blob;
> uncompressed1 blob;
> uncompressed2 blob;
> BEGIN
>
> -- Set up BLOB
> rawtext := utl_raw.cast_to_raw(
> 'The quick brown fox jumps over the lazy dog, '
> || 'Jack Sprat could eat not fat and his wife could eat no
> lean');
>
> dbms_lob.createtemporary(uncompressed, TRUE);
> dbms_lob.open(uncompressed, dbms_lob.lob_readwrite);
> dbms_lob.writeappend(uncompressed, utl_raw.length(rawtext), rawtext);
>
> dbms_output.put_line('Uncompressed Length: '
> || dbms_lob.getlength(uncompressed));
>
> compressed1 := compressor.blob_compress(uncompressed);
>
> dbms_output.put_line('Java Compressed Length: '
> || dbms_lob.getlength(compressed1));
>
> compressed2 := utl_compress.lz_compress(uncompressed, 1);
>
> dbms_output.put_line('UTL Compressed Length: '
> || dbms_lob.getlength(compressed2));
>
> uncompressed1 := compressor.blob_decompress(compressed1);
>
> dbms_output.put_line('Java Decompressed Length: '
> || dbms_lob.getlength(uncompressed1));
>
> uncompressed2 := utl_compress.lz_uncompress(compressed2);
>
> dbms_output.put_line('UTL Uncompressed Length: '
> || dbms_lob.getlength(uncompressed2));
>
> dbms_lob.freetemporary(uncompressed1);
> dbms_lob.freetemporary(uncompressed2);
>
> BEGIN
> uncompressed1 := compressor.blob_decompress(compressed2);
>
> dbms_output.put_line('UTL Decompressed by Java Length: '
> || dbms_lob.getlength(uncompressed1));
>
> dbms_lob.freetemporary(uncompressed2);
>
> EXCEPTION
> WHEN OTHERS THEN
> dbms_output.put_line('Error encountered decompressing UTL '
> || 'Compressed BLOB with Java Decompress ('
> || SQLCODE || '): ' || SQLERRM);
> END;
>
>
> BEGIN
> uncompressed2 := utl_compress.lz_uncompress(compressed1);
>
> dbms_output.put_line('Java Uncompressed by UTL Length: '
> || dbms_lob.getlength(uncompressed2));
>
> dbms_lob.freetemporary(uncompressed1);
>
> EXCEPTION
> WHEN OTHERS THEN
> dbms_output.put_line('Error encountered decompressing Java '
> || 'Compressed BLOB with UTL Uncompress ('
> || SQLCODE || '): ' || SQLERRM);
> END;
>
> dbms_lob.freetemporary(compressed1);
> dbms_lob.freetemporary(compressed2);
>
> dbms_lob.close(uncompressed);
> dbms_lob.freetemporary(uncompressed);
>
> END;
>
>
> The output that I got was:
>
> Uncompressed Length: 104
> Java Compressed Length: 88
> UTL Compressed Length: 100
> Java Decompressed Length: 104
> UTL Uncompressed Length: 104
> Error encountered decompressing UTL Compressed BLOB with Java Decompress
> (-29532): ORA-29532: Java call terminated by uncaught Java exception:
> java.util.zip.ZipException: unknown compression method
> Error encountered decompressing Java Compressed BLOB with UTL Uncompress
> (-29294): ORA-29294: A data error occurred during compression or
> uncompression.
>
> PL/SQL procedure successfully completed.
>
> That was a small piece of text, just to provide an example, but as you
> can see the Java compression technique is quite a lot better than that
> of the UTL Compress method, as I said, I've tried different quality
> parameters to try to improve it. Also, neither method is able to
> decompress something output by the other.
>
> In fact, on a large text document of say 3Mb, I found that UTL Compress
> would compress it to roughly half the size, whilst I would get a ratio
> of around 8:1 with the Java Deflater class.
>
> Both techniques say that they implement Limpel-Ziv as in the zlib
> library, and both are apparently compatible with gzip.
>
> I tried a further test, outputing the compressed BLOBS to files and
> trying to gunzip them (with -n option), but to no avail. The UTL
> Compressed file wasn't recognized at all and the Java one seemed to be
> missing information. Now, this is probably due to missing or
> incompatible headers, as zlib headers are not the same as gzip headers.
> I also gziped the original uncompressed text, and it came out very
> similar in size to that of the file compressed by Java.
>
> So, let's repeat my question!
>
> Does anybody know a reason why UTL_COMPRESS would compress to an almost
> 2:1 ratio?
>
> One might explain it by saying that gzip compresses better than, for
> instance, 'compress', the function provided by the zlib library, but for
> a comparison of techniques, the following page shows that it doesn't
> explain the large difference I'm seeing here:
> http://www.cs.rit.edu/~std3246/thesis/node36.html

The first and only thing that occurs to me while reading your post is your use of a syntax that produces the lowest level of compression.

utl_compress.lz_compress(uncompressed, 1);

Had you not used "1" the default level would be 6.

Here's what the doc says:
quality is an optional compression tuning value. It allows the UTL_COMPRESS user to choose between speed and compression quality,

meaning the percentage of reduction in size. A faster compression speed 
will result in less compression of the data. A slower compression speed 
will result in more compression of the data. Valid values are [1..9], 
with 1=fastest and 9=slowest. The default 'quality' value is 6.

Try the default and 9 and let us know what happens.

-- 
Daniel A. Morgan
University of Washington
damorgan_at_x.washington.edu
(replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Received on Tue Jul 04 2006 - 10:52:32 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US