Re: Opinions on SOUNDEX function PLEASE !!!

From: Karl Brandner <brandner_at_rzm.med.uni-muenchen.de>
Date: 1996/01/19
Message-ID: <4do4uf$63n_at_sparcserver.lrz-muenchen.de>#1/1


Hello Steve,

The SOUNDEX function is based on an algorithm that was developed in 1918. It is explained in "The Art of Computer Programming, Volume 3: Sorting and Searching" by Donald E. Knuth, chapter 6, pages 389-393.

The algorithm:

  1. Retain the first letter of the name, and drop all occurrences of a,e,h,i,o,u,w,y in other positions.
  2. Assign the following numbers to the remaining letters after the first:
       b,f,p,v         --> 1
       c,g,j,k,q,s,x,z --> 2
       d,t             --> 3
       l               --> 4
       m,n             --> 5
       r               --> 6
  3. If two or more letters with the same code were adjacent in the original name (before step 1), omit all but the first.
  4. Convert to the form 'letter,digit,digit,digit' by adding trailing zeros (if there are fewer than three digits), or by dropping rightmost digits (if there are more than three).
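The four steps can be sketched in Python roughly as follows. This is only a minimal illustration of the rules as listed here, not the code behind any database's SOUNDEX. The names CODES and soundex are my own, and I assume that only the ASCII letters a-z count (dots, blanks and umlauts are simply skipped).

```python
# Step 2: letter -> digit table. Letters not listed (a,e,h,i,o,u,w,y) have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character code for name (illustrative sketch only)."""
    # Assumption: keep only ASCII letters, so "St. Clair" becomes "stclair".
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()      # step 1: retain the first letter
    prev = CODES.get(letters[0])     # code of the previous original letter
    for c in letters[1:]:
        code = CODES.get(c)          # None for the dropped letters
        # Step 3: skip a letter whose code equals that of the letter
        # directly before it in the original name.
        if code is not None and code != prev:
            result += code
        prev = code
    # Step 4: pad with zeros or cut to 'letter,digit,digit,digit'.
    return (result + "000")[:4]


for n in ("Euler", "Ellery", "Rogers", "Rodgers"):
    print(n, soundex(n))
```

A built-in SOUNDEX in a database may differ in details (for example in how h and w are handled), so do not expect identical results for every name.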

For example :

  EULER   --> E460 (and ELLERY    --> E460 !!!)
  HILBERT --> H416 (and HEILBRONN --> H416 !!!)
  Lloyd   --> L300 (and Ladd      --> L300 !!!)

But: Rogers (R262) and Rodgers (R326), or Sinclair (S524) and St. Clair (S324), remain separate!

If you have a large database and you search only by SOUNDEX code, with no other selection criteria, you will have a problem: many different names share the same code, so you get a lot of wrong matches.

If you have other conditions in your SQL, like birthday, you can try it.

Be aware: every phonetic encoding depends on the language it was developed for. We use our own "SOUNDEX" function because the SOUNDEX method isn't very good for the German language.

Hope it helps,

Bye Karl