Hashing for DISTINCT or GROUP BY in SQL

From: -CELKO- <jcelko212_at_earthlink.net>
Date: Tue, 12 Oct 2010 09:35:54 -0700 (PDT)
Message-ID: <ecfb8461-259a-4ad5-b4e9-8daf012ae3b8_at_g17g2000yqo.googlegroups.com>

In the old days, a DISTINCT or GROUP BY in an SQL engine was done by a sort. Back then we built early SQLs on top of existing file systems, and we had pretty good sort procedures in the library. How many kids today have ever seen a polyphase merge sort on tape drives?

But today, with parallel hardware and good hashing algorithms, would it be faster to use hashing to cluster equivalence classes of data together?
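To make the two strategies concrete, here is a minimal Python sketch, not how any particular SQL engine implements them. The sample rows and the function names (group_by_sort, group_by_hash, distinct_by_hash) are made up for illustration; Python's dict and set stand in for an engine's hash table.

```python
from itertools import groupby

# Hypothetical sample data: (state, amount) rows.
rows = [("NY", 10), ("CA", 5), ("NY", 7), ("TX", 1), ("CA", 2)]

def group_by_sort(rows):
    """Sort-based GROUP BY: sort on the key, then sweep runs of equal keys."""
    ordered = sorted(rows, key=lambda r: r[0])
    return {key: sum(amount for _, amount in run)
            for key, run in groupby(ordered, key=lambda r: r[0])}

def group_by_hash(rows):
    """Hash-based GROUP BY: a single pass, one hash-table probe per row."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals

def distinct_by_hash(values):
    """Hash-based DISTINCT: keep a value the first time it is seen."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result
```

Both grouping functions return the same totals for the same input. The sort version costs O(n log n) comparisons, while the hash version does an expected constant amount of work per row, provided the table of groups fits in memory.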

Obviously, if two data values are equal, they will have the same hash for all hashing functions. But two different values can have the same hash for any one hashing function.
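The collision point can be shown with a deliberately weak hash. The sketch below is illustrative only: weak_hash and distinct_with_buckets are hypothetical names, and the hash (sum of byte values modulo a small bucket count) was picked because it collides easily. It shows why a hash-based DISTINCT still has to compare the actual values inside each bucket.

```python
def weak_hash(text, buckets=8):
    """Deliberately weak hash: sum of the UTF-8 byte values, modulo `buckets`."""
    return sum(text.encode("utf-8")) % buckets

def distinct_with_buckets(values, hash_fn=weak_hash):
    """Group values by hash, then use real equality within each bucket.

    Equal values always land in the same bucket, so a duplicate can only
    be found there. Different values may share a bucket, so the hash
    alone cannot decide equality.
    """
    buckets = {}
    for value in values:
        chain = buckets.setdefault(hash_fn(value), [])
        if value not in chain:  # equality test on the values themselves
            chain.append(value)
    return buckets

values = ["ab", "ba", "ab", "cd"]
result = distinct_with_buckets(values)
```

Here "ab" and "ba" hash to the same bucket even though they are different strings, and the equality check inside the bucket keeps them apart while still dropping the second "ab".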

Does there exist a set of hashing functions, H1(), H2(), ..., Hn(), which will produce at least one different result for any pair of distinct data values?

Received on Tue Oct 12 2010 - 18:35:54 CEST
