org.apache.commons.codec.language

Class RefinedSoundex

Implemented Interfaces:
Encoder, StringEncoder

public class RefinedSoundex
extends Object
implements StringEncoder

Encodes a string into a Refined Soundex value. A refined soundex code is optimized for spell checking words. The Soundex method was originally developed by Margaret Odell and Robert Russell.

Version:
$Id: RefinedSoundex.java,v 1.21 2004/06/05 18:32:04 ggregory Exp $

Author:
Apache Software Foundation

Field Summary

static RefinedSoundex
US_ENGLISH
This static variable contains an instance of RefinedSoundex that uses the US English mapping.
static char[]
US_ENGLISH_MAPPING
RefinedSoundex is *refined* for a number of reasons, one being that the mappings have been altered.
private char[]
soundexMapping
Every letter of the alphabet is "mapped" to a numerical value.

Constructor Summary

RefinedSoundex()
Creates an instance of the RefinedSoundex object using the default US English mapping.
RefinedSoundex(char[] mapping)
Creates a refined soundex instance using a custom mapping.

Method Summary

int
difference(java.lang.String s1, java.lang.String s2)
Returns the number of characters in the two encoded Strings that are the same.
Object
encode(Object pObject)
Encodes an Object using the refined soundex algorithm.
String
encode(java.lang.String pString)
Encodes a String using the refined soundex algorithm.
(package private) char
getMappingCode(char c)
Returns the mapping code for a given character.
String
soundex(java.lang.String str)
Retrieves the Refined Soundex code for a given String object.

Field Details

US_ENGLISH

public static final RefinedSoundex US_ENGLISH
This static variable contains an instance of RefinedSoundex that uses the US English mapping.


US_ENGLISH_MAPPING

public static final char[] US_ENGLISH_MAPPING
RefinedSoundex is *refined* for a number of reasons, one being that the mappings have been altered. This implementation contains default mappings for US English.
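The following sketch shows what the default table looks like. It assumes the US English mapping is the 26-character string `01360240043788015936020505`, with one code digit per letter from A to Z. This is the value we recall from the commons-codec source, so check it against the version you depend on. The class `UsEnglishMappingSketch` and its helper `lettersFor` are hypothetical. They exist only to show which letters share a code.

```java
class UsEnglishMappingSketch {
    // Assumed default table: index 0 holds the code for 'A', index 25 the code for 'Z'.
    static final char[] MAPPING = "01360240043788015936020505".toCharArray();

    // Hypothetical helper: returns, in alphabetical order, the letters mapped to the given code digit.
    static String lettersFor(char code) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < MAPPING.length; i++) {
            if (MAPPING[i] == code) {
                sb.append((char) ('A' + i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print one line per code digit, e.g. "1: BP".
        for (char code = '0'; code <= '9'; code++) {
            System.out.println(code + ": " + lettersFor(code));
        }
    }
}
```

Under this assumed table, `main` lists groups such as `1: BP`, `3: CKS` and `0: AEHIOUWY`. In other words, the vowels share code 0 together with H, W and Y.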


soundexMapping

private char[] soundexMapping
Every letter of the alphabet is "mapped" to a numerical value. This char array holds the values to which each letter is mapped. This implementation contains a default map for US_ENGLISH.

Constructor Details

RefinedSoundex

public RefinedSoundex()
Creates an instance of the RefinedSoundex object using the default US English mapping.


RefinedSoundex

public RefinedSoundex(char[] mapping)
Creates a refined soundex instance using a custom mapping. This constructor can be used to customize the mapping, and/or possibly provide an internationalized mapping for a non-Western character set.

Parameters:
mapping - Mapping array to use when finding the corresponding code for a given character
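Here is a minimal sketch of how the two constructors might relate. It assumes the mapping is a 26-entry array indexed by letter, with A at index 0. The following are choices of this sketch and not documented behavior of RefinedSoundex:

- the names `CustomMappingSketch` and `codeFor`,
- the length check,
- the defensive copy.

In this sketch, only the letters A to Z have entries.

```java
class CustomMappingSketch {
    // Assumed default table (US English), one code digit per letter A..Z.
    static final char[] US_ENGLISH_MAPPING = "01360240043788015936020505".toCharArray();

    private final char[] soundexMapping;

    // No-argument constructor: falls back to the assumed US English table.
    CustomMappingSketch() {
        this(US_ENGLISH_MAPPING);
    }

    // Custom table. The length check and the defensive copy are choices of this sketch.
    CustomMappingSketch(char[] mapping) {
        if (mapping == null || mapping.length != 26) {
            throw new IllegalArgumentException("mapping must have 26 entries, one per letter A..Z");
        }
        this.soundexMapping = mapping.clone();
    }

    // Returns the code for a letter A..Z (either case), or char 0 for anything else.
    char codeFor(char c) {
        char upper = Character.toUpperCase(c);
        if (upper < 'A' || upper > 'Z') {
            return 0;
        }
        return soundexMapping[upper - 'A'];
    }
}
```

As a usage example, copy the default table and set the entry at index `'Z' - 'A'` to `'3'`. Z then encodes like S, so `new CustomMappingSketch(variant).codeFor('z')` returns `'3'` instead of the default `'5'`.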

Method Details

difference

public int difference(java.lang.String s1,
                      java.lang.String s2)
            throws EncoderException
Returns the number of characters in the two encoded Strings that are the same. This return value ranges from 0 to the length of the shorter encoded String. A value of 0 indicates little or no similarity, and 4 out of 4 (for example) indicates strong similarity or identical values. For refined Soundex, the return value can be greater than 4, because refined codes are not limited to four characters.

Parameters:
s1 - A String that will be encoded and compared.
s2 - A String that will be encoded and compared.

Returns:
The number of characters in the two encoded Strings that are the same, from 0 to the length of the shorter encoded String.

Throws:
EncoderException - if an error occurs encoding one of the strings

Since:
1.3

See Also:
SoundexUtils.difference(StringEncoder,String,String), MS T-SQL DIFFERENCE
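The comparison can be sketched as a position-by-position count over the two encoded values, stopping at the end of the shorter one. The position-wise reading is an assumption based on the description above and on SoundexUtils.difference. `DifferenceSketch.differenceEncoded` is a hypothetical name, and it takes already-encoded strings rather than calling an encoder.

```java
class DifferenceSketch {
    // Counts the positions, up to the shorter length, where both encoded strings hold the same character.
    static int differenceEncoded(String es1, String es2) {
        if (es1 == null || es2 == null) {
            return 0; // sketch choice: nothing to compare
        }
        int lengthToMatch = Math.min(es1.length(), es2.length());
        int diff = 0;
        for (int i = 0; i < lengthToMatch; i++) {
            if (es1.charAt(i) == es2.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }
}
```

For instance, the codes `B1908` and `B1905` agree in their first four positions, so the sketch returns 4. A 3-character code compared with a longer one can score at most 3.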


encode

public Object encode(Object pObject)
            throws EncoderException
Encodes an Object using the refined soundex algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type java.lang.String.
Specified by:
encode in interface Encoder

Parameters:
pObject - Object to encode

Returns:
An object (of type java.lang.String) containing the refined soundex code that corresponds to the String supplied.

Throws:
EncoderException - if the parameter supplied is not of type java.lang.String
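The following minimal sketch illustrates the type check that the Object overload performs. EncoderException belongs to commons-codec, so the sketch declares a hypothetical stand-in, `EncoderExceptionSketch`. It also delegates String input to an injected `Function<String, String>` instead of the real refined soundex encoder. `ObjectEncodeSketch` is likewise a hypothetical name.

```java
import java.util.function.Function;

// Hypothetical stand-in for org.apache.commons.codec.EncoderException, so the sketch compiles on its own.
class EncoderExceptionSketch extends Exception {
    EncoderExceptionSketch(String message) {
        super(message);
    }
}

class ObjectEncodeSketch {
    private final Function<String, String> stringEncoder;

    ObjectEncodeSketch(Function<String, String> stringEncoder) {
        this.stringEncoder = stringEncoder;
    }

    // Object overload: accepts only Strings (null included in the rejection) and forwards them.
    Object encode(Object pObject) throws EncoderExceptionSketch {
        if (!(pObject instanceof String)) {
            throw new EncoderExceptionSketch("Parameter supplied is not of type java.lang.String");
        }
        return encode((String) pObject);
    }

    // String overload: delegates to the injected encoder.
    String encode(String pString) {
        return stringEncoder.apply(pString);
    }
}
```

Java chooses an overload from the argument's static type at compile time. An argument typed as String therefore binds to the String overload, and the Object overload with its type check runs only when the static type is Object.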


encode

public String encode(java.lang.String pString)
Encodes a String using the refined soundex algorithm.
Specified by:
encode in interface StringEncoder

Parameters:
pString - A String object to encode

Returns:
A refined soundex code corresponding to the String supplied


getMappingCode

(package private) char getMappingCode(char c)
Returns the mapping code for a given character. The mapping codes are maintained in an internal char array named soundexMapping, which defaults to the US English mappings.

Parameters:
c - char to get mapping for

Returns:
The mapped code character (a numeral) for the given char
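The lookup can be sketched under two assumptions. First, the default table is indexed by the upper-cased letter minus 'A'. Second, non-letters map to the NUL character (char 0), not to the digit '0'. `MappingCodeSketch` is hypothetical. The extra A to Z bounds check is an addition of this sketch, so that letters outside the basic Latin range do not index past the 26-entry table.

```java
class MappingCodeSketch {
    // Assumed US English default table, one code digit per letter A..Z.
    private final char[] soundexMapping = "01360240043788015936020505".toCharArray();

    // Returns the code digit for a letter, or the NUL char (char 0) for anything else.
    char getMappingCode(char c) {
        if (!Character.isLetter(c)) {
            return 0;
        }
        char upper = Character.toUpperCase(c);
        if (upper < 'A' || upper > 'Z') {
            return 0; // sketch choice: no entry for letters outside A..Z
        }
        return soundexMapping[upper - 'A'];
    }
}
```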


soundex

public String soundex(java.lang.String str)
Retrieves the Refined Soundex code for a given String object.

Parameters:
str - String to encode using the Refined Soundex algorithm

Returns:
A refined soundex code for the String supplied
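Below is a standalone sketch of the encoding steps, as we read them from the commons-codec 1.3 source:

1. Clean the input down to upper-cased letters.
2. Keep the first letter.
3. Append each letter's code, including the first letter's, while skipping consecutive duplicate codes.

The assumed US English table and the class name `RefinedSoundexSketch` are not part of the library. For real use, call RefinedSoundex.soundex or encode.

```java
class RefinedSoundexSketch {
    // Assumed US English table, one code digit per letter A..Z.
    private static final char[] MAPPING = "01360240043788015936020505".toCharArray();

    static String soundex(String str) {
        if (str == null) {
            return null;
        }
        // Cleaning step (assumed): keep only the letters A..Z, upper-cased.
        StringBuilder clean = new StringBuilder();
        for (char c : str.toCharArray()) {
            char upper = Character.toUpperCase(c);
            if (upper >= 'A' && upper <= 'Z') {
                clean.append(upper);
            }
        }
        if (clean.length() == 0) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        out.append(clean.charAt(0)); // the first letter is kept as a letter
        char last = '*';             // sentinel that matches no code digit
        for (int i = 0; i < clean.length(); i++) {
            // The first letter is coded too, so its digit follows it (e.g. "T6...").
            char current = MAPPING[clean.charAt(i) - 'A'];
            if (current == last) {
                continue;            // collapse adjacent letters that share a code
            }
            out.append(current);     // '0' codes are kept, unlike classic Soundex
            last = current;
        }
        return out.toString();
    }
}
```

With this sketch, "testing" and "TESTING" both encode to `T6036084`, and "brown" encodes to `B1908`. The result is neither truncated nor padded to four characters. This is why refined codes can score above 4 in difference.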


commons-codec version 1.3 - Copyright © 2002-2004 - Apache Software Foundation