ICU 65.1
icu::FilteredNormalizer2 Class Reference

Normalization filtered by a UnicodeSet.

#include <normalizer2.h>

Inheritance diagram for icu::FilteredNormalizer2:
icu::FilteredNormalizer2 → icu::Normalizer2 → icu::UObject → icu::UMemory

Public Member Functions

 FilteredNormalizer2 (const Normalizer2 &n2, const UnicodeSet &filterSet)
 Constructs a filtered normalizer wrapping any Normalizer2 instance and a filter set.

 ~FilteredNormalizer2 ()
 Destructor.

virtual UnicodeString & normalize (const UnicodeString &src, UnicodeString &dest, UErrorCode &errorCode) const U_OVERRIDE
 Writes the normalized form of the source string to the destination string (replacing its contents) and returns the destination string.

virtual void normalizeUTF8 (uint32_t options, StringPiece src, ByteSink &sink, Edits *edits, UErrorCode &errorCode) const U_OVERRIDE
 Normalizes a UTF-8 string and optionally records how source substrings relate to changed and unchanged result substrings.

virtual UnicodeString & normalizeSecondAndAppend (UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode) const U_OVERRIDE
 Appends the normalized form of the second string to the first string (merging them at the boundary) and returns the first string.

virtual UnicodeString & append (UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode) const U_OVERRIDE
 Appends the second string to the first string (merging them at the boundary) and returns the first string.

virtual UBool getDecomposition (UChar32 c, UnicodeString &decomposition) const U_OVERRIDE
 Gets the decomposition mapping of c.

virtual UBool getRawDecomposition (UChar32 c, UnicodeString &decomposition) const U_OVERRIDE
 Gets the raw decomposition mapping of c.

virtual UChar32 composePair (UChar32 a, UChar32 b) const U_OVERRIDE
 Performs pairwise composition of a & b and returns the composite if there is one.

virtual uint8_t getCombiningClass (UChar32 c) const U_OVERRIDE
 Gets the combining class of c.

virtual UBool isNormalized (const UnicodeString &s, UErrorCode &errorCode) const U_OVERRIDE
 Tests if the string is normalized.

virtual UBool isNormalizedUTF8 (StringPiece s, UErrorCode &errorCode) const U_OVERRIDE
 Tests if the UTF-8 string is normalized.

virtual UNormalizationCheckResult quickCheck (const UnicodeString &s, UErrorCode &errorCode) const U_OVERRIDE
 Tests if the string is normalized.

virtual int32_t spanQuickCheckYes (const UnicodeString &s, UErrorCode &errorCode) const U_OVERRIDE
 Returns the end of the normalized substring of the input string.

virtual UBool hasBoundaryBefore (UChar32 c) const U_OVERRIDE
 Tests if the character always has a normalization boundary before it, regardless of context.

virtual UBool hasBoundaryAfter (UChar32 c) const U_OVERRIDE
 Tests if the character always has a normalization boundary after it, regardless of context.

virtual UBool isInert (UChar32 c) const U_OVERRIDE
 Tests if the character is normalization-inert.
 
Public Member Functions inherited from icu::Normalizer2
 ~Normalizer2 ()
 Destructor.

UnicodeString normalize (const UnicodeString &src, UErrorCode &errorCode) const
 Returns the normalized form of the source string.

Public Member Functions inherited from icu::UObject
virtual ~UObject ()
 Destructor.

virtual UClassID getDynamicClassID () const
 ICU4C "poor man's RTTI", returns a UClassID for the actual ICU class.
 

Additional Inherited Members

Static Public Member Functions inherited from icu::Normalizer2
static const Normalizer2 * getNFCInstance (UErrorCode &errorCode)
 Returns a Normalizer2 instance for Unicode NFC normalization.

static const Normalizer2 * getNFDInstance (UErrorCode &errorCode)
 Returns a Normalizer2 instance for Unicode NFD normalization.

static const Normalizer2 * getNFKCInstance (UErrorCode &errorCode)
 Returns a Normalizer2 instance for Unicode NFKC normalization.

static const Normalizer2 * getNFKDInstance (UErrorCode &errorCode)
 Returns a Normalizer2 instance for Unicode NFKD normalization.

static const Normalizer2 * getNFKCCasefoldInstance (UErrorCode &errorCode)
 Returns a Normalizer2 instance for Unicode NFKC_Casefold normalization.

static const Normalizer2 * getInstance (const char *packageName, const char *name, UNormalization2Mode mode, UErrorCode &errorCode)
 Returns a Normalizer2 instance which uses the specified data file (packageName/name similar to ucnv_openPackage() and ures_open()/ResourceBundle) and which composes or decomposes text according to the specified mode.
 

Detailed Description

Normalization filtered by a UnicodeSet.

Normalizes portions of the text contained in the filter set and leaves portions not contained in the filter set unchanged. Filtering is done via UnicodeSet::span(..., USET_SPAN_SIMPLE). Not-in-the-filter text is treated as "is normalized" and "quick check yes". This class implements all of (and only) the Normalizer2 API. An instance of this class is unmodifiable/immutable but is constructed and must be destructed by the owner.
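
As a rough illustration of the span-based filtering described above, here is a toy model written in standard C++ only. It does not use ICU: CodePointSet, ToyNormalizer, spanRun, filteredNormalize, and toyCompose are hypothetical stand-ins for UnicodeSet, Normalizer2, UnicodeSet::span(), normalize(), and a composing normalizer, and the real implementation may differ in detail.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <string>

// Toy stand-ins, NOT ICU types: the "filter set" is a std::set of code points
// and the "normalizer" is any function from string to string.
using CodePointSet = std::set<char32_t>;
using ToyNormalizer = std::function<std::u32string(const std::u32string &)>;

constexpr char32_t kAcute = 0x0301;   // COMBINING ACUTE ACCENT
constexpr char32_t kEAcute = 0x00E9;  // LATIN SMALL LETTER E WITH ACUTE

// Returns the end index of the run starting at `start` in which every code
// point has the requested membership (in the filter, or not in the filter).
std::size_t spanRun(const std::u32string &s, std::size_t start,
                    const CodePointSet &filter, bool inSet) {
    std::size_t i = start;
    while (i < s.size() && (filter.count(s[i]) != 0) == inSet) {
        ++i;
    }
    return i;
}

// Normalizes only the runs whose code points are all in `filter`;
// runs outside the filter are copied through unchanged.
std::u32string filteredNormalize(const std::u32string &src,
                                 const CodePointSet &filter,
                                 const ToyNormalizer &normalizeRun) {
    std::u32string dest;
    std::size_t pos = 0;
    bool inSet = true;  // Alternate between in-filter and not-in-filter runs.
    while (pos < src.size()) {
        std::size_t end = spanRun(src, pos, filter, inSet);  // may be empty
        std::u32string run = src.substr(pos, end - pos);
        dest += inSet ? normalizeRun(run) : run;
        pos = end;
        inSet = !inSet;
    }
    return dest;
}

// Hypothetical stand-in "normalizer": composes 'e' + U+0301 into U+00E9
// and leaves everything else alone.
std::u32string toyCompose(const std::u32string &s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == U'e' && i + 1 < s.size() && s[i + 1] == kAcute) {
            out += kEAcute;
            ++i;  // Skip the combining mark that was just consumed.
        } else {
            out += s[i];
        }
    }
    return out;
}
```

In this model, filtering the sequence e, U+0301, '-', e, U+0301 with the set {e, U+0301} composes both pairs, while filtering with {e} alone returns the input unchanged, because the accent falls outside the filtered run and the stand-in normalizer never sees it next to its base letter. This hints at why the contents of the filter set matter at run boundaries.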

Stable:
ICU 4.4

Definition at line 503 of file normalizer2.h.

Constructor & Destructor Documentation

◆ FilteredNormalizer2()

icu::FilteredNormalizer2::FilteredNormalizer2 (const Normalizer2 &n2, const UnicodeSet &filterSet)
inline

Constructs a filtered normalizer wrapping any Normalizer2 instance and a filter set.

Both are aliased and must not be modified or deleted while this object is used. The filter set should be frozen; otherwise the performance will suffer greatly.

Parameters
  n2         wrapped Normalizer2 instance
  filterSet  UnicodeSet which determines the characters to be normalized
Stable:
ICU 4.4

Definition at line 515 of file normalizer2.h.

References icu::Normalizer2::append(), icu::Normalizer2::composePair(), icu::Normalizer2::getCombiningClass(), icu::Normalizer2::getDecomposition(), icu::Normalizer2::getRawDecomposition(), icu::Normalizer2::hasBoundaryAfter(), icu::Normalizer2::hasBoundaryBefore(), icu::Normalizer2::isInert(), icu::Normalizer2::isNormalized(), icu::Normalizer2::isNormalizedUTF8(), icu::Normalizer2::normalize(), icu::Normalizer2::normalizeSecondAndAppend(), icu::Normalizer2::normalizeUTF8(), icu::Normalizer2::quickCheck(), icu::Normalizer2::spanQuickCheckYes(), and U_OVERRIDE.

◆ ~FilteredNormalizer2()

icu::FilteredNormalizer2::~FilteredNormalizer2 ()

Destructor.

Stable:
ICU 4.4

Member Function Documentation

◆ append()

virtual UnicodeString& icu::FilteredNormalizer2::append (UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode) const
virtual

Appends the second string to the first string (merging them at the boundary) and returns the first string.

The result is normalized if both the strings were normalized. The first and second strings must be different objects.

Parameters
  first      string, should be normalized
  second     string, should be normalized
  errorCode  Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Returns
first
Stable:
ICU 4.4

Implements icu::Normalizer2.
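
The errorCode contract and the "returns first" convention can be modeled in plain C++. The sketch below does not use ICU: ToyErrorCode, toySuccess, toyFailure, and toyAppend are hypothetical stand-ins for UErrorCode, U_SUCCESS(), U_FAILURE(), and append(). Reporting an illegal-argument error when first and second are the same object is an assumption of this sketch; the text above only states that they must be different objects.

```cpp
#include <string>

// Toy stand-in for the UErrorCode convention, NOT the real enum:
// warnings are negative, success is zero, errors are positive.
enum ToyErrorCode {
    TOY_WARNING = -1,
    TOY_ZERO_ERROR = 0,
    TOY_ILLEGAL_ARGUMENT_ERROR = 1
};

inline bool toySuccess(ToyErrorCode code) { return code <= TOY_ZERO_ERROR; }
inline bool toyFailure(ToyErrorCode code) { return code > TOY_ZERO_ERROR; }

// Models the shape of append(first, second, errorCode): it returns `first`
// so that calls can be chained, and does nothing on an incoming failure.
std::string &toyAppend(std::string &first, const std::string &second,
                       ToyErrorCode &errorCode) {
    if (toyFailure(errorCode)) {
        return first;  // A previous call failed: return immediately.
    }
    if (&first == &second) {
        // Assumption of this sketch: aliasing is reported as an error.
        errorCode = TOY_ILLEGAL_ARGUMENT_ERROR;
        return first;
    }
    first += second;
    return first;
}
```

Because a failing code short-circuits every later call, a caller can chain several calls, for example toyAppend(toyAppend(s, a, ec), b, ec), and test the error code once at the end.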

◆ composePair()

virtual UChar32 icu::FilteredNormalizer2::composePair (UChar32 a, UChar32 b) const
virtual

Performs pairwise composition of a & b and returns the composite if there is one.

For details see the base class documentation.

This function is independent of the mode of the Normalizer2.

Parameters
  a  A (normalization starter) code point.
  b  Another code point.
Returns
The non-negative composite code point if there is one; otherwise a negative value.
Stable:
ICU 49

Reimplemented from icu::Normalizer2.

◆ getCombiningClass()

virtual uint8_t icu::FilteredNormalizer2::getCombiningClass (UChar32 c) const
virtual

Gets the combining class of c.

The default implementation returns 0 but all standard implementations return the Unicode Canonical_Combining_Class value.

Parameters
  c  code point
Returns
c's combining class
Stable:
ICU 49

Reimplemented from icu::Normalizer2.

◆ getDecomposition()

virtual UBool icu::FilteredNormalizer2::getDecomposition (UChar32 c, UnicodeString &decomposition) const
virtual

Gets the decomposition mapping of c.

For details see the base class documentation.

This function is independent of the mode of the Normalizer2.

Parameters
  c              code point
  decomposition  String object which will be set to c's decomposition mapping, if there is one.
Returns
TRUE if c has a decomposition, otherwise FALSE
Stable:
ICU 4.6

Implements icu::Normalizer2.

◆ getRawDecomposition()

virtual UBool icu::FilteredNormalizer2::getRawDecomposition (UChar32 c, UnicodeString &decomposition) const
virtual

Gets the raw decomposition mapping of c.

For details see the base class documentation.

This function is independent of the mode of the Normalizer2.

Parameters
  c              code point
  decomposition  String object which will be set to c's raw decomposition mapping, if there is one.
Returns
TRUE if c has a decomposition, otherwise FALSE
Stable:
ICU 49

Reimplemented from icu::Normalizer2.

◆ hasBoundaryAfter()

virtual UBool icu::FilteredNormalizer2::hasBoundaryAfter (UChar32 c) const
virtual

Tests if the character always has a normalization boundary after it, regardless of context.

For details see the Normalizer2 base class documentation.

Parameters
  c  character to test
Returns
TRUE if c has a normalization boundary after it
Stable:
ICU 4.4

Implements icu::Normalizer2.

◆ hasBoundaryBefore()

virtual UBool icu::FilteredNormalizer2::hasBoundaryBefore (UChar32 c) const
virtual

Tests if the character always has a normalization boundary before it, regardless of context.

For details see the Normalizer2 base class documentation.

Parameters
  c  character to test
Returns
TRUE if c has a normalization boundary before it
Stable:
ICU 4.4

Implements icu::Normalizer2.

◆ isInert()

virtual UBool icu::FilteredNormalizer2::isInert (UChar32 c) const
virtual

Tests if the character is normalization-inert.

For details see the Normalizer2 base class documentation.

Parameters
  c  character to test
Returns
TRUE if c is normalization-inert
Stable:
ICU 4.4

Implements icu::Normalizer2.

◆ isNormalized()

virtual UBool icu::FilteredNormalizer2::isNormalized (const UnicodeString &s, UErrorCode &errorCode) const
virtual

Tests if the string is normalized.

For details see the Normalizer2 base class documentation.

Parameters
  s          input string
  errorCode  Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Returns
TRUE if s is normalized
Stable:
ICU 4.4

Implements icu::Normalizer2.

◆ isNormalizedUTF8()

virtual UBool icu::FilteredNormalizer2::isNormalizedUTF8 (StringPiece s, UErrorCode &errorCode) const
virtual

Tests if the UTF-8 string is normalized.

Internally, in cases where the quickCheck() method would return "maybe" (which is only possible for the two COMPOSE modes) this method resolves to "yes" or "no" to provide a definitive result, at the cost of doing more work in those cases.

This works for all normalization modes, but it is currently optimized for UTF-8 only for "compose" modes, such as for NFC, NFKC, and NFKC_Casefold (UNORM2_COMPOSE and UNORM2_COMPOSE_CONTIGUOUS). For other modes it currently converts to UTF-16 and calls isNormalized().

Parameters
  s          UTF-8 input string
  errorCode  Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Returns
TRUE if s is normalized
Stable:
ICU 60

Reimplemented from icu::Normalizer2.
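
The way a "maybe" quick-check result is resolved into a definitive answer can be sketched with a toy model in standard C++. Nothing here is ICU code: ToyCheckResult mirrors the three outcomes of UNormalizationCheckResult (UNORM_NO, UNORM_YES, UNORM_MAYBE), and toyCompose, toyQuickCheck, and toyIsNormalized are hypothetical stand-ins whose only composition rule is 'e' + U+0301 to U+00E9.

```cpp
#include <cstddef>
#include <string>

// Toy three-way result, NOT the ICU enum.
enum ToyCheckResult { TOY_NO, TOY_YES, TOY_MAYBE };

constexpr char32_t kAcute = 0x0301;   // COMBINING ACUTE ACCENT
constexpr char32_t kEAcute = 0x00E9;  // LATIN SMALL LETTER E WITH ACUTE

// Hypothetical composing "normalizer" with a single rule.
std::u32string toyCompose(const std::u32string &s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == U'e' && i + 1 < s.size() && s[i + 1] == kAcute) {
            out += kEAcute;
            ++i;  // Skip the combining mark that was just consumed.
        } else {
            out += s[i];
        }
    }
    return out;
}

// Cheap per-code-point scan. A combining acute may or may not compose with
// the code point before it, so the scan can only answer "maybe" for it.
ToyCheckResult toyQuickCheck(const std::u32string &s) {
    for (char32_t c : s) {
        if (c == kAcute) {
            return TOY_MAYBE;
        }
    }
    return TOY_YES;
}

// Definitive test: trusts a clear quick-check answer and otherwise does the
// extra work of normalizing and comparing.
bool toyIsNormalized(const std::u32string &s) {
    ToyCheckResult quick = toyQuickCheck(s);
    if (quick != TOY_MAYBE) {
        return quick == TOY_YES;
    }
    return toyCompose(s) == s;
}
```

In this model both e + U+0301 and x + U+0301 get "maybe" from the quick check, but only the second is reported as normalized by the definitive test, since the first would change under composition.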

◆ normalize()

virtual UnicodeString& icu::FilteredNormalizer2::normalize (const UnicodeString &src, UnicodeString &dest, UErrorCode &errorCode) const
virtual

Writes the normalized form of the source string to the destination string (replacing its contents) and returns the destination string.

The source and destination strings must be different objects.

Parameters
  src        source string
  dest       destination string; its contents are replaced with normalized src
  errorCode  Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Returns
dest
Stable:
ICU 4.4

Implements icu::Normalizer2.

◆ normalizeSecondAndAppend()

virtual UnicodeString& icu::FilteredNormalizer2::normalizeSecondAndAppend (UnicodeString &first, const UnicodeString &second, UErrorCode &errorCode) const
virtual

Appends the normalized form of the second string to the first string (merging them at the boundary) and returns the first string.

The result is normalized if the first string was normalized. The first and second strings must be different objects.

Parameters
  first      string, should be normalized
  second     string, will be normalized
  errorCode  Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Returns
first
Stable:
ICU 4.4

Implements icu::Normalizer2.

◆ normalizeUTF8()

virtual void icu::FilteredNormalizer2::normalizeUTF8 (uint32_t options, StringPiece src, ByteSink &sink, Edits *edits, UErrorCode &errorCode) const
) const
virtual

Normalizes a UTF-8 string and optionally records how source substrings relate to changed and unchanged result substrings.

Currently implemented completely only for "compose" modes, such as for NFC, NFKC, and NFKC_Casefold (UNORM2_COMPOSE and UNORM2_COMPOSE_CONTIGUOUS). Otherwise currently converts to & from UTF-16 and does not support edits.

Parameters
  options    Options bit set, usually 0. See U_OMIT_UNCHANGED_TEXT and U_EDITS_NO_RESET.
  src        Source UTF-8 string.
  sink       A ByteSink to which the normalized UTF-8 result string is written. sink.Flush() is called at the end.
  edits      Records edits for index mapping, working with styled text, and getting only changes (if any). The Edits contents are undefined if any error occurs. This function calls edits->reset() first unless options includes U_EDITS_NO_RESET. edits can be nullptr.
  errorCode  Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Stable:
ICU 60

Reimplemented from icu::Normalizer2.

◆ quickCheck()

virtual UNormalizationCheckResult icu::FilteredNormalizer2::quickCheck (const UnicodeString &s, UErrorCode &errorCode) const
virtual

Tests if the string is normalized.

For details see the Normalizer2 base class documentation.

Parameters
  s          input string
  errorCode  Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Returns
UNormalizationCheckResult
Stable:
ICU 4.4

Implements icu::Normalizer2.

◆ spanQuickCheckYes()

virtual int32_t icu::FilteredNormalizer2::spanQuickCheckYes (const UnicodeString &s, UErrorCode &errorCode) const
virtual

Returns the end of the normalized substring of the input string.

For details see the Normalizer2 base class documentation.

Parameters
  s          input string
  errorCode  Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
Returns
"yes" span end index
Stable:
ICU 4.4

Implements icu::Normalizer2.


The documentation for this class was generated from the following file: