Optimal indexing of database costs only 2.33 bits/element (linear not n log(n) )

From: <dudajar_at_gmail.com>
Date: Wed, 4 Jul 2012 08:41:11 -0700 (PDT)
Message-ID: <c1e7c049-f53a-4690-900b-e14ec90e32a8_at_googlegroups.com>



We would think that the amount of information required to distinguish elements by some unique identifiers grows with n log(n) ... but it turns out that it can be compressed using linearly growing amount of information. The trick is that we don't need the information about the order of these identifiers, saving log(n!)~ nlog(n) bits of information. Specifically, to encode minimal prefix tree required to distinguish hash values of elements, there is asymptotically required 2.77 bits/elements and it can further reduced to 2.33 bits/element: http://arxiv.org/abs/1206.4555

Where such extremely thrifty encoding might be useful? Received on Wed Jul 04 2012 - 17:41:11 CEST

Original text of this message