
Normalizer Class Reference

C++ API: Unicode Normalization.

#include <normlzr.h>


Public Types

enum  { DONE = 0xffff }
 If DONE is returned from an iteration function that returns a code point, then there are no more normalization results available.
enum  { COMPAT_BIT = 1, DECOMP_BIT = 2, COMPOSE_BIT = 4, FCD_BIT = 8 }
 This tells us what the bits in the "mode" mean.
enum  EMode {
  NO_OP = 0, COMPOSE = COMPOSE_BIT, COMPOSE_COMPAT = COMPOSE_BIT | COMPAT_BIT, DECOMP = DECOMP_BIT,
  DECOMP_COMPAT = DECOMP_BIT | COMPAT_BIT, FCD = FCD_BIT
}
 The mode of a Normalizer object.
enum  { IGNORE_HANGUL = 0x001 }
 The options for a Normalizer object.

Public Member Functions

 Normalizer (const UnicodeString &str, UNormalizationMode mode)
 Creates a new Normalizer object for iterating over the normalized form of a given string.
 Normalizer (const UChar *str, int32_t length, UNormalizationMode mode)
 Creates a new Normalizer object for iterating over the normalized form of a given string.
 Normalizer (const CharacterIterator &iter, UNormalizationMode mode)
 Creates a new Normalizer object for iterating over the normalized form of the given text.
 Normalizer (const Normalizer &copy)
 Copy constructor.
 ~Normalizer ()
 Destructor.
UChar32 current (void)
 Return the current character in the normalized text.
UChar32 first (void)
 Return the first character in the normalized text.
UChar32 last (void)
 Return the last character in the normalized text.
UChar32 next (void)
 Return the next character in the normalized text.
UChar32 previous (void)
 Return the previous character in the normalized text.
UChar32 setIndex (int32_t index)
 Set the iteration position in the input text that is being normalized and return the first normalized character at that position.
void setIndexOnly (int32_t index)
 Set the iteration position in the input text that is being normalized, without any immediate normalization.
void reset (void)
 Reset the index to the beginning of the text.
int32_t getIndex (void) const
 Retrieve the current iteration position in the input text that is being normalized.
int32_t startIndex (void) const
 Retrieve the index of the start of the input text.
int32_t endIndex (void) const
 Retrieve the index of the end of the input text.
UBool operator== (const Normalizer &that) const
 Returns TRUE when both iterators refer to the same character in the same input text.
UBool operator!= (const Normalizer &that) const
 Returns FALSE when both iterators refer to the same character in the same input text.
Normalizer * clone (void) const
 Returns a pointer to a new Normalizer that is a clone of this one.
int32_t hashCode (void) const
 Generates a hash code for this iterator.
void setMode (UNormalizationMode newMode)
 Set the normalization mode for this object.
UNormalizationMode getUMode (void) const
 Return the normalization mode for this object.
void setOption (int32_t option, UBool value)
 Set options that affect this Normalizer's operation.
UBool getOption (int32_t option) const
 Determine whether an option is turned on or off.
void setText (const UnicodeString &newText, UErrorCode &status)
 Set the input text over which this Normalizer will iterate.
void setText (const CharacterIterator &newText, UErrorCode &status)
 Set the input text over which this Normalizer will iterate.
void setText (const UChar *newText, int32_t length, UErrorCode &status)
 Set the input text over which this Normalizer will iterate.
void getText (UnicodeString &result)
 Copies the input text into the UnicodeString argument.
 Normalizer (const UnicodeString &str, EMode mode)
 Creates a new Normalizer object for iterating over the normalized form of a given string.
 Normalizer (const UnicodeString &str, EMode mode, int32_t opt)
 Creates a new Normalizer object for iterating over the normalized form of a given string.
 Normalizer (const UChar *str, int32_t length, EMode mode)
 Creates a new Normalizer object for iterating over the normalized form of a given UChar string.
 Normalizer (const UChar *str, int32_t length, EMode mode, int32_t option)
 Creates a new Normalizer object for iterating over the normalized form of a given UChar string.
 Normalizer (const CharacterIterator &iter, EMode mode)
 Creates a new Normalizer object for iterating over the normalized form of the given text.
 Normalizer (const CharacterIterator &iter, EMode mode, int32_t opt)
 Creates a new Normalizer object for iterating over the normalized form of the given text.
void setMode (EMode newMode)
 Set the normalization mode for this object.
EMode getMode (void) const
 Return the basic operation performed by this Normalizer.

Static Public Member Functions

static void normalize (const UnicodeString &source, UNormalizationMode mode, int32_t options, UnicodeString &result, UErrorCode &status)
 Normalizes a UnicodeString according to the specified normalization mode.
static void compose (const UnicodeString &source, UBool compat, int32_t options, UnicodeString &result, UErrorCode &status)
 Compose a UnicodeString.
static void decompose (const UnicodeString &source, UBool compat, int32_t options, UnicodeString &result, UErrorCode &status)
 Static method to decompose a UnicodeString.
static UNormalizationCheckResult quickCheck (const UnicodeString &source, UNormalizationMode mode, UErrorCode &status)
 Performs a quick check on a string to determine whether it is in a particular normalization format.
static UnicodeString & concatenate (UnicodeString &left, UnicodeString &right, UnicodeString &result, UNormalizationMode mode, int32_t options, UErrorCode &errorCode)
static void normalize (const UnicodeString &source, EMode mode, int32_t options, UnicodeString &result, UErrorCode &status)
 Normalizes a UnicodeString using the given normalization operation.
static UNormalizationCheckResult quickCheck (const UnicodeString &source, EMode mode, UErrorCode &status)
 Performs a quick check on a string to determine whether it is in a particular normalization format.
static UNormalizationMode getUNormalizationMode (EMode mode, UErrorCode &status)
 Converts C++'s Normalizer::EMode to UNormalizationMode.
static EMode getNormalizerEMode (UNormalizationMode mode, UErrorCode &status)
 Converts C's UNormalizationMode to Normalizer::EMode.


Detailed Description

C++ API: Unicode Normalization.

The Normalizer class consists of two parts:

The static functions are basically wrappers around the C implementation, using UnicodeString instead of UChar*. For basic information about normalization forms and details about the C API please see the documentation in unorm.h.

The iterator API, comprising the Normalizer constructors and the non-static functions, uses a CharacterIterator as input. It is possible to pass a string, which is then internally wrapped in a CharacterIterator. The input text is not normalized all at once, but incrementally where needed (providing efficient random access). This allows you to pass in a large text but spend only a small amount of time normalizing a small part of that text. However, if the entire text is normalized, then the iterator will be slower than normalizing the entire text at once and iterating over the result. Another possible use of the Normalizer iterator is to report an index into the original text that is close to where the normalized characters come from.

Important: The iterator API was cleaned up significantly for ICU 2.0. The earlier implementation reported the getIndex() inconsistently, and previous() could not be used after setIndex(), next(), first(), and current().

Normalizer allows you to start normalizing from anywhere in the input text by calling setIndexOnly(), setIndex(), first(), or last(). Without calling any of these, the iterator will start at the beginning of the text.

At any time, next() returns the next normalized code point (UChar32), with post-increment semantics (like CharacterIterator::next32PostInc()). previous() returns the previous normalized code point (UChar32), with pre-decrement semantics (like CharacterIterator::previous32()).

current() and setIndex() return the current code point (respectively the one at the newly set index) without moving the getIndex(). Note that if the text at the current position needs to be normalized, then these functions will do that. (This is why current() is not const.) If you call setIndex() and then previous() then you normalize a piece of text (and get a code point from setIndex()) that you probably do not need. It is more efficient to call setIndexOnly() instead, which does not normalize.

getIndex() always refers to the position in the input text where the normalized code points are returned from. It does not always change with each returned code point. The code point that is returned from any of the functions corresponds to text at or after getIndex(), according to the function's iteration semantics (post-increment or pre-decrement).

next() returns a code point from at or after the getIndex() from before the next() call. After the next() call, the getIndex() might have moved to where the next code point will be returned from (from a next() or current() call). This is semantically equivalent to array access with array[index++] (post-increment semantics).

previous() returns a code point from at or after the getIndex() from after the previous() call. This is semantically equivalent to array access with array[--index] (pre-decrement semantics).
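
The array analogy above can be made concrete with a small standalone model. The sketch below is not ICU code: `Cursor` and `kDone` are hypothetical names, and a plain vector stands in for text that has already been normalized. It only illustrates the post-increment and pre-decrement semantics that next() and previous() are described as following.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Standalone model of the iteration semantics; this is not ICU code.
// kDone mirrors the value of Normalizer::DONE (0xffff) listed in this reference.
constexpr std::int32_t kDone = 0xffff;

struct Cursor {
    std::vector<std::int32_t> text;  // stand-in for normalized code points
    std::size_t index = 0;

    explicit Cursor(std::vector<std::int32_t> t) : text(std::move(t)) {}

    // Post-increment, like array[index++]: read at index, then advance.
    std::int32_t next() {
        return index < text.size() ? text[index++] : kDone;
    }

    // Pre-decrement, like array[--index]: step back, then read.
    std::int32_t previous() {
        return index > 0 ? text[--index] : kDone;
    }
};
```

In this model, calling next() and then previous() returns the same code point twice and leaves the index where it started.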

Internally, the Normalizer iterator normalizes a small piece of text starting at the getIndex() and ending at a following "safe" index. The normalized result is stored in an internal string buffer, and the code points are iterated from there. With multiple iteration calls, this is repeated until the next piece of text needs to be normalized, and the getIndex() needs to be moved.

The following "safe" index, the internal buffer, and the secondary iteration index into that buffer are not exposed on the API. This also means that it is currently not practical to return to a particular, arbitrary position in the text because one would need to know, and be able to set, in addition to the getIndex(), at least also the current index into the internal buffer. It is currently only possible to observe when getIndex() changes (with careful consideration of the iteration semantics), at which time the internal index will be 0. For example, if getIndex() is different after next() than before it, then the internal index is 0 and one can return to this getIndex() later with setIndexOnly().
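
The technique of watching getIndex() for changes can be sketched with a standalone model. Nothing below is ICU code: `Segment`, `ChunkedIterator` and `resumePoints` are hypothetical names, and the segments are supplied by hand instead of being computed by a real normalizer. The model assumes that getIndex() moves to the start of the next piece as soon as the buffered code points of the current piece are used up.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Standalone model, not ICU code. All names here are hypothetical.
constexpr std::int32_t kDone = 0xffff;  // same value as Normalizer::DONE

// One internally normalized piece: where it starts in the input text and
// the code points it produced (assumed non-empty).
struct Segment {
    std::int32_t inputStart;
    std::vector<std::int32_t> output;
};

class ChunkedIterator {
public:
    ChunkedIterator(std::vector<Segment> segments, std::int32_t endIndex)
        : segments_(std::move(segments)), endIndex_(endIndex) {}

    // Input position of the piece that the next code point will come from.
    std::int32_t getIndex() const {
        return seg_ < segments_.size() ? segments_[seg_].inputStart : endIndex_;
    }

    std::int32_t next() {
        if (seg_ >= segments_.size()) return kDone;
        std::int32_t c = segments_[seg_].output[pos_++];
        if (pos_ == segments_[seg_].output.size()) {  // buffer exhausted
            ++seg_;
            pos_ = 0;  // the internal index is 0 again
        }
        return c;
    }

private:
    std::vector<Segment> segments_;
    std::int32_t endIndex_;
    std::size_t seg_ = 0;
    std::size_t pos_ = 0;
};

// Records each getIndex() value seen right after it changed across a next()
// call; in this model those are positions one could return to later.
inline std::vector<std::int32_t> resumePoints(ChunkedIterator it) {
    std::vector<std::int32_t> points;
    std::int32_t before = it.getIndex();
    while (it.next() != kDone) {
        std::int32_t after = it.getIndex();
        if (after != before) {
            points.push_back(after);
            before = after;
        }
    }
    return points;
}
```

When one input character produces two code points, getIndex() in this model stays put after the first next() and moves only after the second, so only the later position is recorded.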

Author:
Laura Werner, Mark Davis, Markus Scherer ICU 2.0


Member Enumeration Documentation

anonymous enum
 

If DONE is returned from an iteration function that returns a code point, then there are no more normalization results available.

anonymous enum
 

This tells us what the bits in the "mode" mean.

Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.
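
How the EMode values are built from these bits can be checked with a short standalone sketch. The constants repeat the values listed in this reference (declared as plain constants here rather than as an anonymous enum); the helper `hasBit` is hypothetical and not part of ICU.

```cpp
#include <cstdint>

// Bit values as listed in this reference.
constexpr std::int32_t COMPAT_BIT = 1;
constexpr std::int32_t DECOMP_BIT = 2;
constexpr std::int32_t COMPOSE_BIT = 4;
constexpr std::int32_t FCD_BIT = 8;

// Mode values as listed in this reference, composed from the bits above.
enum EMode : std::int32_t {
    NO_OP = 0,
    COMPOSE = COMPOSE_BIT,
    COMPOSE_COMPAT = COMPOSE_BIT | COMPAT_BIT,
    DECOMP = DECOMP_BIT,
    DECOMP_COMPAT = DECOMP_BIT | COMPAT_BIT,
    FCD = FCD_BIT
};

// Hypothetical helper: tests whether a mode includes a given bit.
constexpr bool hasBit(EMode mode, std::int32_t bit) {
    return (static_cast<std::int32_t>(mode) & bit) != 0;
}
```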

anonymous enum
 

The options for a Normalizer object.

Enumeration values:
IGNORE_HANGUL  Option to disable Hangul/Jamo composition and decomposition.

This option applies to Korean text, which can be represented either in the Jamo alphabet or in Hangul characters, which are really just two or three Jamo combined into one visual glyph. Since Jamo takes up more storage space than Hangul, applications that process only Hangul text may wish to turn this option on when decomposing text.

The Unicode standard treats Hangul to Jamo conversion as a canonical decomposition, so this option must be turned off if you wish to transform strings into one of the standard Unicode Normalization Forms.

See also:
setOption
Deprecated:
To be removed (or moved to private for documentation) after 2002-aug-31.
Obsolete option.

enum Normalizer::EMode
 

The mode of a Normalizer object.

Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.
Enumeration values:
NO_OP  Null operation for use with the constructors and the static normalize method.

This value tells the Normalizer to do nothing but return unprocessed characters from the underlying UnicodeString or CharacterIterator. If you have code which requires raw text at some times and normalized text at others, you can use NO_OP for the cases where you want raw text, rather than having a separate code path that bypasses Normalizer altogether.

See also:
setMode
Deprecated:
To be removed after 2002-sep-30.
Use UNORM_NONE from UNormalizationMode.
COMPOSE  Canonical decomposition followed by canonical composition.

Used with the constructors and the static normalize method to determine the operation to be performed.

If all optional features (e.g. IGNORE_HANGUL) are turned off, this operation produces output that is in Unicode Canonical Form C.

See also:
setMode
Deprecated:
To be removed after 2002-sep-30.
Use UNORM_NFC from UNormalizationMode.
COMPOSE_COMPAT  Compatibility decomposition followed by canonical composition.

Used with the constructors and the static normalize method to determine the operation to be performed.

If all optional features (e.g. IGNORE_HANGUL) are turned off, this operation produces output that is in Unicode Canonical Form KC.

See also:
setMode
Deprecated:
To be removed after 2002-sep-30.
Use UNORM_NFKC from UNormalizationMode.
DECOMP  Canonical decomposition.

This value is passed to the constructors and the static normalize method to determine the operation to be performed.

If all optional features (e.g. IGNORE_HANGUL) are turned off, this operation produces output that is in Unicode Canonical Form D.

See also:
setMode
Deprecated:
To be removed after 2002-sep-30.
Use UNORM_NFD from UNormalizationMode.
DECOMP_COMPAT  Compatibility decomposition.

This value is passed to the constructors and the static normalize method to determine the operation to be performed.

If all optional features (e.g. IGNORE_HANGUL) are turned off, this operation produces output that is in Unicode Canonical Form KD.

See also:
setMode
Deprecated:
To be removed after 2002-sep-30.
Use UNORM_NFKD from UNormalizationMode.
FCD 
Deprecated:
To be removed after 2002-sep-30.
Use UNORM_FCD from UNormalizationMode.


Constructor & Destructor Documentation

Normalizer::Normalizer (const UnicodeString &str, UNormalizationMode mode)
 

Creates a new Normalizer object for iterating over the normalized form of a given string.

Parameters:
str The string to be normalized. The normalization will start at the beginning of the string.
mode The normalization mode. ICU 2.0

Normalizer::Normalizer (const UChar *str, int32_t length, UNormalizationMode mode)
 

Creates a new Normalizer object for iterating over the normalized form of a given string.

Parameters:
str The string to be normalized. The normalization will start at the beginning of the string.
length Length of the string, or -1 if NUL-terminated.
mode The normalization mode. ICU 2.0
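
The length convention (an explicit length, or -1 for NUL-terminated text) can be illustrated without ICU. In the sketch below, `char16_t` stands in for the 16-bit UChar type and `resolveLength` is a hypothetical helper, not an ICU function.

```cpp
#include <cstdint>

// Hypothetical helper, not part of ICU: returns the number of 16-bit units
// in str, counting up to the terminating NUL when length is -1.
inline std::int32_t resolveLength(const char16_t* str, std::int32_t length) {
    if (length >= 0) return length;  // explicit length: use as given
    std::int32_t n = 0;
    while (str[n] != 0) ++n;         // -1: scan for the NUL terminator
    return n;
}
```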

Normalizer::Normalizer (const CharacterIterator &iter, UNormalizationMode mode)
 

Creates a new Normalizer object for iterating over the normalized form of the given text.

Parameters:
iter The input text to be normalized. The normalization will start at the beginning of the string.
mode The normalization mode. ICU 2.0

Normalizer::Normalizer (const Normalizer &copy)
 

Copy constructor.

Normalizer::Normalizer (const UnicodeString &str, EMode mode)
 

Creates a new Normalizer object for iterating over the normalized form of a given string.

Parameters:
str The string to be normalized. The normalization will start at the beginning of the string.
mode The normalization mode.
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

Normalizer::Normalizer (const UnicodeString &str, EMode mode, int32_t opt)
 

Creates a new Normalizer object for iterating over the normalized form of a given string.

The options parameter specifies which optional Normalizer features are to be enabled for this object.

Parameters:
str The string to be normalized. The normalization will start at the beginning of the string.
mode The normalization mode.
opt Any optional features to be enabled. Currently the only available option is IGNORE_HANGUL. If you want the default behavior corresponding to one of the standard Unicode Normalization Forms, use 0 for this argument.
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

Normalizer::Normalizer (const UChar *str, int32_t length, EMode mode)
 

Creates a new Normalizer object for iterating over the normalized form of a given UChar string.

Parameters:
str The string to be normalized. The normalization will start at the beginning of the string.
length Length of the string
mode The normalization mode.
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

Normalizer::Normalizer (const UChar *str, int32_t length, EMode mode, int32_t option)
 

Creates a new Normalizer object for iterating over the normalized form of a given UChar string.

Parameters:
str The string to be normalized. The normalization will start at the beginning of the string.
length Length of the string
mode The normalization mode.
option Any optional features to be enabled. Currently the only available option is IGNORE_HANGUL. If you want the default behavior corresponding to one of the standard Unicode Normalization Forms, use 0 for this argument.

Normalizer::Normalizer (const CharacterIterator &iter, EMode mode)
 

Creates a new Normalizer object for iterating over the normalized form of the given text.

Parameters:
iter The input text to be normalized. The normalization will start at the beginning of the string.
mode The normalization mode.
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

Normalizer::Normalizer (const CharacterIterator &iter, EMode mode, int32_t opt)
 

Creates a new Normalizer object for iterating over the normalized form of the given text.

Parameters:
iter The input text to be normalized. The normalization will start at the beginning of the string.
mode The normalization mode.
opt Any optional features to be enabled. Currently the only available option is IGNORE_HANGUL. If you want the default behavior corresponding to one of the standard Unicode Normalization Forms, use 0 for this argument.
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.


Member Function Documentation

Normalizer* Normalizer::clone (void) const
 

Returns a pointer to a new Normalizer that is a clone of this one.

The caller is responsible for deleting the new clone.
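
Because the caller owns the returned pointer, one common way to avoid leaks is to hand it to a smart pointer immediately. The sketch below uses a hypothetical `Iter` type in place of Normalizer so that it compiles without ICU; `cloneOwned` is likewise an illustrative helper, not an ICU function.

```cpp
#include <memory>

// Hypothetical stand-in for a class whose clone() returns an owning raw pointer.
struct Iter {
    int index = 0;
    Iter* clone() const { return new Iter(*this); }  // caller must delete
};

// Wraps the raw clone in std::unique_ptr so it is deleted automatically.
inline std::unique_ptr<Iter> cloneOwned(const Iter& it) {
    return std::unique_ptr<Iter>(it.clone());
}
```

The clone is independent of the original: changing one does not affect the other.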

static void Normalizer::compose (const UnicodeString &source, UBool compat, int32_t options, UnicodeString &result, UErrorCode &status) [static]
 

Compose a UnicodeString.

This is equivalent to normalize() with mode UNORM_NFC or UNORM_NFKC. This is a wrapper for unorm_normalize(), using UnicodeString objects.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is deprecated. If you want the default behavior corresponding to Unicode Normalization Form C or KC, use 0 for this argument.

Parameters:
source the string to be composed.
compat Perform compatibility decomposition before composition. If this argument is FALSE, only canonical decomposition will be performed.
options the optional features to be enabled (0 for no options)
result The composed string (on output).
status The error code.

UChar32 Normalizer::current (void)
 

Return the current character in the normalized text.

current() may need to normalize some text at getIndex(). The getIndex() is not changed.

Returns:
the current normalized code point ICU 2.0

static void Normalizer::decompose (const UnicodeString &source, UBool compat, int32_t options, UnicodeString &result, UErrorCode &status) [static]
 

Static method to decompose a UnicodeString.

This is equivalent to normalize() with mode UNORM_NFD or UNORM_NFKD. This is a wrapper for unorm_normalize(), using UnicodeString objects.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is deprecated. The desired options should be OR'ed together to determine the value of this argument. If you want the default behavior corresponding to Unicode Normalization Form D or KD, use 0 for this argument.

Parameters:
source the string to be decomposed.
compat Perform compatibility decomposition. If this argument is FALSE, only canonical decomposition will be performed.
options the optional features to be enabled (0 for no options)
result The decomposed string (on output).
status The error code.

int32_t Normalizer::endIndex (void) const
 

Retrieve the index of the end of the input text.

This is the end index of the CharacterIterator or the length of the string over which this Normalizer is iterating. This end index is exclusive, i.e., the Normalizer operates only on characters before this index.

Returns:
the first index in the input text where the Normalizer does not operate

UChar32 Normalizer::first (void)
 

Return the first character in the normalized text.

This is equivalent to setIndexOnly(startIndex()) followed by next(). (Post-increment semantics.)

Returns:
the first normalized code point ICU 2.0

int32_t Normalizer::getIndex (void) const
 

Retrieve the current iteration position in the input text that is being normalized.

A following call to next() will return a normalized code point from the input text at or after this index.

After a call to previous(), getIndex() will point at or before the position in the input text where the normalized code point was returned from with previous().

Returns:
the current index in the input text

Normalizer::EMode Normalizer::getMode (void) const [inline]
 

Return the basic operation performed by this Normalizer.

See also:
setMode
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

Normalizer::EMode Normalizer::getNormalizerEMode (UNormalizationMode mode, UErrorCode &status) [inline, static]
 

Converts C's UNormalizationMode to Normalizer::EMode.

Parameters:
mode member of the enum UNormalizationMode
status the error code
Returns:
Normalizer::EMode equivalent of UNormalizationMode
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

UBool Normalizer::getOption (int32_t option) const
 

Determine whether an option is turned on or off.

If multiple options are specified, then the result is TRUE if any of them are set.

Parameters:
option the option(s) that are to be checked
Returns:
TRUE if any of the option(s) are set
See also:
setOption
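
The "TRUE if any of them are set" rule is ordinary bitmask logic. The sketch below is a standalone model rather than ICU code: `OptionSet` and `kHypotheticalOption` are invented for illustration, while the value of IGNORE_HANGUL is taken from this reference.

```cpp
#include <cstdint>

// Standalone model of the option semantics described above; not ICU code.
constexpr std::int32_t IGNORE_HANGUL = 0x001;        // value from this reference
constexpr std::int32_t kHypotheticalOption = 0x002;  // invented for illustration

struct OptionSet {
    std::int32_t bits = 0;

    // Turns all given option bits on (value == true) or off (value == false).
    void setOption(std::int32_t option, bool value) {
        if (value) bits |= option;
        else       bits &= ~option;
    }

    // True if any of the given option bits is currently set.
    bool getOption(std::int32_t option) const {
        return (bits & option) != 0;
    }
};
```

With only IGNORE_HANGUL turned on, querying `IGNORE_HANGUL | kHypotheticalOption` yields true in this model, while querying `kHypotheticalOption` alone yields false.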

void Normalizer::getText (UnicodeString &result)
 

Copies the input text into the UnicodeString argument.

Parameters:
result Receives a copy of the text under iteration.

UNormalizationMode Normalizer::getUMode (void) const
 

Return the normalization mode for this object.

This is an unusual name because there used to be a getMode() that returned a different type.

Returns:
the mode for this Normalizer
See also:
setMode ICU 2.0

UNormalizationMode Normalizer::getUNormalizationMode (EMode mode, UErrorCode &status) [inline, static]
 

Converts C++'s Normalizer::EMode to UNormalizationMode.

Parameters:
mode member of the enum Normalizer::EMode
status the error code
Returns:
UNormalizationMode equivalent of Normalizer::EMode
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

int32_t Normalizer::hashCode (void) const
 

Generates a hash code for this iterator.

Returns:
the hash code

UChar32 Normalizer::last (void)
 

Return the last character in the normalized text.

This is equivalent to setIndexOnly(endIndex()) followed by previous(). (Pre-decrement semantics.)

Returns:
the last normalized code point ICU 2.0

UChar32 Normalizer::next (void)
 

Return the next character in the normalized text.

(Post-increment semantics.) If the end of the text has already been reached, DONE is returned.

Returns:
the next normalized code point ICU 2.0

void Normalizer::normalize (const UnicodeString &source, EMode mode, int32_t options, UnicodeString &result, UErrorCode &status) [inline, static]
 

Normalizes a UnicodeString using the given normalization operation.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is IGNORE_HANGUL. If you want the default behavior corresponding to one of the standard Unicode Normalization Forms, use 0 for this argument.

Parameters:
source the input string to be normalized.
mode the normalization mode
options the optional features to be enabled.
result The normalized string (on output).
status The error code.
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

static void Normalizer::normalize (const UnicodeString &source, UNormalizationMode mode, int32_t options, UnicodeString &result, UErrorCode &status) [static]
 

Normalizes a UnicodeString according to the specified normalization mode.

This is a wrapper for unorm_normalize(), using UnicodeString objects.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is deprecated. If you want the default behavior corresponding to one of the standard Unicode Normalization Forms, use 0 for this argument.

Parameters:
source the input string to be normalized.
mode the normalization mode
options the optional features to be enabled (0 for no options)
result The normalized string (on output).
status The error code. ICU 2.0

UBool Normalizer::operator!= (const Normalizer &that) const [inline]
 

Returns FALSE when both iterators refer to the same character in the same input text.

Parameters:
that a Normalizer object to compare this one to
Returns:
comparison result

UBool Normalizer::operator== (const Normalizer &that) const
 

Returns TRUE when both iterators refer to the same character in the same input text.

Parameters:
that a Normalizer object to compare this one to
Returns:
comparison result

UChar32 Normalizer::previous (void)
 

Return the previous character in the normalized text.

(Pre-decrement semantics.) If the beginning of the text has already been reached, DONE is returned.

Returns:
the previous normalized code point ICU 2.0

UNormalizationCheckResult Normalizer::quickCheck (const UnicodeString &source, EMode mode, UErrorCode &status) [inline, static]
 

Performs a quick check on a string to determine whether it is in a particular normalization format.

Three types of result can be returned: UNORM_YES, UNORM_NO or UNORM_MAYBE. UNORM_YES indicates that the argument string is in the desired normalized format; UNORM_NO indicates that it is not. UNORM_MAYBE indicates that a more thorough check is required; the caller may have to put the string into its normalized form and compare the results.

Parameters:
source string for determining if it is in a normalized format
mode normalization format
status a UErrorCode to receive any errors
Returns:
UNORM_YES, UNORM_NO or UNORM_MAYBE
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

static UNormalizationCheckResult Normalizer::quickCheck (const UnicodeString &source, UNormalizationMode mode, UErrorCode &status) [static]
 

Performs a quick check on a string to determine whether it is in a particular normalization format.

This is a wrapper for unorm_quickCheck(), using a UnicodeString.

Three types of result can be returned: UNORM_YES, UNORM_NO or UNORM_MAYBE. UNORM_YES indicates that the argument string is in the desired normalized format; UNORM_NO indicates that it is not. UNORM_MAYBE indicates that a more thorough check is required; the caller may have to put the string into its normalized form and compare the results.

Parameters:
source string for determining if it is in a normalized format
mode normalization format
status a UErrorCode to receive any errors
Returns:
UNORM_YES, UNORM_NO or UNORM_MAYBE ICU 2.0
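
The three-way result suggests a simple calling pattern: trust a definite answer, and fall back to normalizing and comparing only when the answer is inconclusive. The sketch below shows that pattern without ICU. `Check` is a stand-in for UNormalizationCheckResult, `isNormalized` is a hypothetical helper, and the two callables are supplied by the caller in place of the real quick-check and normalize functions.

```cpp
#include <string>

// Stand-in for UNormalizationCheckResult; not the ICU type.
enum class Check { No, Yes, Maybe };

// quickCheck and normalize are caller-supplied callables (hypothetical
// stand-ins for the real functions). Only a Maybe result pays for a full
// normalization and comparison.
template <typename QuickCheck, typename Normalize>
bool isNormalized(const std::u16string& s, QuickCheck quickCheck, Normalize normalize) {
    switch (quickCheck(s)) {
        case Check::Yes:   return true;
        case Check::No:    return false;
        case Check::Maybe: break;
    }
    return normalize(s) == s;
}
```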

void Normalizer::reset (void)
 

Reset the index to the beginning of the text.

This is equivalent to setIndexOnly(startIndex()).

UChar32 Normalizer::setIndex (int32_t index)
 

Set the iteration position in the input text that is being normalized and return the first normalized character at that position.

This is equivalent to setIndexOnly() followed by current(). After setIndex(), getIndex() will return the same index that is specified here.

Note that setIndex() normalizes some text starting at the specified index and returns the first code point from that normalization. If the next call is to previous() then this piece of text probably did not need to be normalized.

This function is deprecated. It is recommended to use setIndexOnly() instead of setIndex().

Parameters:
index the desired index in the input text.
Returns:
the normalized character from the text at index
Deprecated:
To be removed after 2002-aug-31.
Use setIndexOnly().

void Normalizer::setIndexOnly (int32_t index)
 

Set the iteration position in the input text that is being normalized, without any immediate normalization.

After setIndexOnly(), getIndex() will return the same index that is specified here.

Parameters:
index the desired index in the input text. ICU 2.0

void Normalizer::setMode (EMode newMode) [inline]
 

Set the normalization mode for this object.

Note: If the normalization mode is changed while iterating over a string, calls to next() and previous() may return previously buffered characters in the old normalization mode until the iteration is able to re-sync at the next base character. It is safest to call setText(), first(), last(), etc. after calling setMode().

Parameters:
newMode the new mode for this Normalizer. The supported modes are the Normalizer::EMode values.
See also:
getMode
Deprecated:
To be removed after 2002-sep-30.
Use UNormalizationMode.

void Normalizer::setMode (UNormalizationMode newMode)
 

Set the normalization mode for this object.

Note: If the normalization mode is changed while iterating over a string, calls to next() and previous() may return previously buffered characters in the old normalization mode until the iteration is able to re-sync at the next base character. It is safest to call setIndexOnly(), reset(), setText(), first(), last(), etc. after calling setMode().

Parameters:
newMode the new mode for this Normalizer.
See also:
getUMode

void Normalizer::setOption (int32_t option, UBool value)
 

Set options that affect this Normalizer's operation.

Options do not change the basic composition or decomposition operation that is being performed, but they control whether certain optional portions of the operation are done. Currently the only available option is deprecated.

It is possible to specify multiple options that are all turned on or off.

Parameters:
option the option(s) whose value is/are to be set.
value the new setting for the option. Use TRUE to turn the option(s) on and FALSE to turn it/them off.
See also:
getOption

void Normalizer::setText (const UChar *newText, int32_t length, UErrorCode &status)
 

Set the input text over which this Normalizer will iterate.

The iteration position is set to the beginning.

Parameters:
newText a string that replaces the current input text
length the length of the string, or -1 if NUL-terminated
status a UErrorCode

void Normalizer::setText (const CharacterIterator &newText, UErrorCode &status)
 

Set the input text over which this Normalizer will iterate.

The iteration position is set to the beginning.

Parameters:
newText a CharacterIterator object that replaces the current input text
status a UErrorCode

void Normalizer::setText (const UnicodeString &newText, UErrorCode &status)
 

Set the input text over which this Normalizer will iterate.

The iteration position is set to the beginning.

Parameters:
newText a string that replaces the current input text
status a UErrorCode

int32_t Normalizer::startIndex (void) const
 

Retrieve the index of the start of the input text.

This is the begin index of the CharacterIterator or the start (i.e. index 0) of the string over which this Normalizer is iterating.

Returns:
the smallest index in the input text where the Normalizer operates


The documentation for this class was generated from the following file:
Generated on Mon May 23 00:57:32 2005 for ICU 2.1 by  doxygen 1.4.2