Re: Whatīs the algorithm that compresses a 20 digit big int, into 8 bytes ?

From: Doug <Doug.McMahon_at_oracle.com>
Date: Fri, 16 Apr 2010 12:43:42 -0700 (PDT)
Message-ID: <ab7f7206-c3b2-447d-b8ca-7b597b7efb74_at_x3g2000yqd.googlegroups.com>


On Apr 9, 1:41 pm, Rafael Anschau <rafael.ansc..._at_gmail.com> wrote:
> On Apr 9, 5:21 pm, Keith H Duggar <dug..._at_alum.mit.edu> wrote:
>
> > You might also want to google the "pigeon hole principle" for
> > some interesting related discussions.
>
> Hi Keith, I just read about it, and I am honored to have an answer
> from a MIT student.
>
> The insight I had was this(probably someone has refuted it already,
> but itīs still in my mind).
>
> Itīs possible to compress, using from the 256 possibilities of a byte
> 255 for values and one for concatenation. Assuming the concatenation
> byte as 11111111
>
> 1010110101111000111010111100010110101100011000011111111111111111111
>
> becomes
>
> 10101101011110001110101111000101101011000110000(11111111)(meaning
> +)10100(20 in binary)*(assuming anything that comes after the byte of
> frequency is the repeated digit)1.
>
> Of course, that would require making numbers without the byte of
> 11111111 which would result in a new way for representing binary
> numbers.But in this method, I donīt see any problems. It would also
> mean repetition is frequent enough in this universe, which I am not
> sure. It would also mean
> that the frequency is high enough so all 20 digits numbers could be
> placed there. Which seems very unlikely, so I accept it as proved
> based on the unlikeliness of it all.
>
> But getting back to earth, I guess databases just use 9 bytes for the
> last 82% of the numbers that eventually need to be stored.
>
> Thanks,

No possible compression could work if you have N possible input values, but only M possible output values, with M < N. This is obvious just by thinking about reversing the compression: with at most M distinct values in the compressed space, you can have at most M uncompressed values, meaning that some values from the original N can't be represented. Received on Fri Apr 16 2010 - 21:43:42 CEST

Original text of this message