Soundex Algorithm
The Soundex algorithm is used to find misspelled names. When searching for a person's name, we often make spelling mistakes. For instance, the name Jurafsky can be misspelled as Jarofsky, Jarovsky, or Jarovski. Soundex gives names that sound alike the same code, so searching by code finds all of these variants of Jurafsky. The algorithm is mostly used in libraries, particularly for English names.
Steps for Soundex Algorithm
Step 1: Keep the initial letter as it is. Drop the following letters wherever they occur after the first letter:
a, e, i, o, u, y, w, h
Step 2: Replace the remaining letters with digits as follows, keeping the first letter unchanged.
b, f, p, v → 1
c, g, j, k, q, s, x, z → 2
d, t → 3
l → 4
m, n → 5
r → 6
Step 3: If the same digit is repeated consecutively, keep just one occurrence and delete the rest.
Step 4: The usual format for Soundex is Letter Digit Digit Digit (i.e. the first letter followed by three digits). If the Soundex code has more than three digits, we delete the extra digits; if it has fewer than three, we add trailing zeros.
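The four steps above can be sketched in Python as follows. This is a minimal illustration of the steps exactly as listed here, not a reference implementation; the names CODES and soundex are chosen for this example, and ignoring non-letter characters is an assumption.

```python
# Step 2 table: letter -> digit. Letters missing from this table
# (a, e, i, o, u, y, w, h) are the ones dropped in Step 1.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the Letter-Digit-Digit-Digit code built by Steps 1 to 4."""
    # Assumption: anything that is not a letter is ignored.
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first, rest = letters[0], letters[1:]

    # Steps 1 and 2: after the first letter, drop vowels and y, w, h,
    # and map every remaining letter to its digit.
    digits = [CODES[ch] for ch in rest if ch in CODES]

    # Step 3: keep one digit from each run of identical digits.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)

    # Step 4: cut to three digits, or pad with trailing zeros.
    return first.upper() + "".join(collapsed)[:3].ljust(3, "0")


print(soundex("Jurafsky"))  # J612
print(soundex("Bill"))      # B400
```

Because the sketch follows this simplified description, its output may differ from other Soundex implementations for some names.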
Example:
Name: Jurafsky
Step 1: Jrfsk
Step 2: J6122 // mapping from letter to digits
Step 3: J612 // deleting the consecutively repeated digits
Name: Jarofsky
Step 1: Jrfsk
Step 2: J6122 // mapping from letter to digits
Step 3: J612 // deleting the consecutively repeated digits
Name: Jarovsky
Step 1: Jrvsk
Step 2: J6122 // mapping from letter to digits
Step 3: J612 // deleting the consecutively repeated digits
Name: Jarovski
Step 1: Jrvsk
Step 2: J6122 // mapping from letter to digits
Step 3: J612 // deleting the consecutively repeated digits
Name: Bill
Step 1: Bll // removing vowels and y,w,h
Step 2: B44 // mapping from letters to digits
Step 3: B4 // removing consecutive repetition
Step 4: B400 // Adding trailing zeros to get the format LetterDigitDigitDigit
Name: Clinton
Step 1: Clntn // removing vowels and y,w,h
Step 2: C4535 // mapping from letters to digits
Step 3: C4535 // no consecutively repeated digits, so nothing changes
Step 4: C453 // removing the extra digit to get the format LetterDigitDigitDigit
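The worked examples can also be checked mechanically, and the same codes can be used to group spelling variants. The sketch below is one possible way to do this; TABLE, soundex, and group_by_code are names made up for this illustration, and it assumes each name is non-empty and contains only ASCII letters.

```python
from collections import defaultdict
from itertools import groupby

# Step 2 mapping as a translation table. The third argument lists the
# letters that Step 1 drops, so translate() deletes them.
TABLE = str.maketrans(
    "bfpv" + "cgjkqsxz" + "dt" + "l" + "mn" + "r",
    "1" * 4 + "2" * 8 + "3" * 2 + "4" + "5" * 2 + "6",
    "aeiouywh",
)


def soundex(name):
    # Assumption: name is non-empty and made of ASCII letters only.
    name = name.lower()
    digits = name[1:].translate(TABLE)                     # Steps 1 and 2
    collapsed = "".join(key for key, _ in groupby(digits)) # Step 3
    return name[0].upper() + collapsed[:3].ljust(3, "0")   # Step 4


def group_by_code(names):
    """Collect names that share a code, keeping their input order."""
    groups = defaultdict(list)
    for n in names:
        groups[soundex(n)].append(n)
    return dict(groups)


names = ["Jurafsky", "Jarofsky", "Jarovsky", "Jarovski", "Bill", "Clinton"]
for code, members in group_by_code(names).items():
    print(code, members)
```

All four spellings of Jurafsky land in the J612 group, while Bill and Clinton each get their own code.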
Usage in Databases
The algorithm has been implemented in database servers such as MySQL and SQL Server. You can compute and compare Soundex codes with the built-in SOUNDEX function. Here is a sample MySQL statement that returns the Soundex code for a string.
SELECT SOUNDEX('Clinton');