Re: Hashing for DISTINCT or GROUP BY in SQL

From: Clifford Heath <no_at_spam.please.net>
Date: Wed, 13 Oct 2010 18:07:53 +1000
Message-ID: <4cb4dc1b$0$316$afc38c87_at_news.optusnet.com.au>


-CELKO- wrote:
> But two different values can have the same
> hash for any one hashing function.

So use the hash to reduce the number of key values you have to compare normally.

> Does there exist a set of hashing functions, H1(), H2(), .., Hn()
> which will produce at least one different result for any pair of data
> values?

A set of such hash functions, if you concatenate their values, would be just one hash function. Unless the result contains all the entropy that was in the parameters, you'll get collisions. If it does contain all the entropy, there's no advantage in hashing.

I'm sure I've seen query plans back as far as MS SQL Server 2000 that use a hash for grouping/distinct, but my memory may have failed me.

Clifford Heath. Received on Wed Oct 13 2010 - 10:07:53 CEST

Original text of this message