ICU 66.0.1
icu::StringSearch Class Reference (final)

StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.

#include <stsearch.h>

Inheritance diagram for icu::StringSearch:
icu::SearchIterator icu::UObject icu::UMemory

Public Member Functions

 StringSearch (const UnicodeString &pattern, const UnicodeString &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance that uses the language rule set of the given locale.

 StringSearch (const UnicodeString &pattern, const UnicodeString &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance that uses the language rule set of the given collator.

 StringSearch (const UnicodeString &pattern, CharacterIterator &text, const Locale &locale, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance that uses the language rule set of the given locale.

 StringSearch (const UnicodeString &pattern, CharacterIterator &text, RuleBasedCollator *coll, BreakIterator *breakiter, UErrorCode &status)
 Creates a StringSearch instance that uses the language rule set of the given collator.

 StringSearch (const StringSearch &that)
 Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text.

virtual ~StringSearch (void)
 Destructor.

StringSearch * clone () const
 Clone this object.

StringSearch & operator= (const StringSearch &that)
 Assignment operator.

virtual UBool operator== (const SearchIterator &that) const
 Equality operator.

virtual void setOffset (int32_t position, UErrorCode &status)
 Sets the index to point to the given position, and clears any state that's affected.

virtual int32_t getOffset (void) const
 Return the current index in the text being searched.

virtual void setText (const UnicodeString &text, UErrorCode &status)
 Set the target text to be searched.

virtual void setText (CharacterIterator &text, UErrorCode &status)
 Set the target text to be searched.

RuleBasedCollator * getCollator () const
 Gets the collator used for the language rules.

void setCollator (RuleBasedCollator *coll, UErrorCode &status)
 Sets the collator used for the language rules.

void setPattern (const UnicodeString &pattern, UErrorCode &status)
 Sets the pattern used for matching.

const UnicodeString & getPattern () const
 Gets the search pattern.

virtual void reset ()
 Reset the iteration.

virtual StringSearch * safeClone () const
 Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one.

virtual UClassID getDynamicClassID () const
 ICU "poor man's RTTI", returns a UClassID for the actual class.
 
- Public Member Functions inherited from icu::SearchIterator
 SearchIterator (const SearchIterator &other)
 Copy constructor that creates a SearchIterator instance with the same behavior, and iterating over the same text.

virtual ~SearchIterator ()
 Destructor.

void setAttribute (USearchAttribute attribute, USearchAttributeValue value, UErrorCode &status)
 Sets the text searching attributes located in the enum USearchAttribute with values from the enum USearchAttributeValue.

USearchAttributeValue getAttribute (USearchAttribute attribute) const
 Gets the text searching attributes.

int32_t getMatchedStart (void) const
 Returns the index to the match in the text string that was searched.

int32_t getMatchedLength (void) const
 Returns the length of text in the string which matches the search pattern.

void getMatchedText (UnicodeString &result) const
 Returns the text that was matched by the most recent call to first, next, previous, or last.

void setBreakIterator (BreakIterator *breakiter, UErrorCode &status)
 Set the BreakIterator that will be used to restrict the points at which matches are detected.

const BreakIterator * getBreakIterator (void) const
 Returns the BreakIterator that is used to restrict the points at which matches are detected.

const UnicodeString & getText (void) const
 Return the string text to be searched.

UBool operator!= (const SearchIterator &that) const
 Not-equal operator.

int32_t first (UErrorCode &status)
 Returns the first index at which the string text matches the search pattern.

int32_t following (int32_t position, UErrorCode &status)
 Returns the first index equal or greater than position at which the string text matches the search pattern.

int32_t last (UErrorCode &status)
 Returns the last index in the target text at which it matches the search pattern.

int32_t preceding (int32_t position, UErrorCode &status)
 Returns the first index less than position at which the string text matches the search pattern.

int32_t next (UErrorCode &status)
 Returns the index of the next point at which the text matches the search pattern, starting from the current position. The iterator is adjusted so that its current index (as returned by getOffset) is the match position if one was found.

int32_t previous (UErrorCode &status)
 Returns the index of the previous point at which the string text matches the search pattern, starting at the current position.

- Public Member Functions inherited from icu::UObject
virtual ~UObject ()
 Destructor.
 

Static Public Member Functions

static UClassID getStaticClassID ()
 ICU "poor man's RTTI", returns a UClassID for this class.
 

Protected Member Functions

virtual int32_t handleNext (int32_t position, UErrorCode &status)
 Search forward for matching text, starting at a given location.

virtual int32_t handlePrev (int32_t position, UErrorCode &status)
 Search backward for matching text, starting at a given location.

- Protected Member Functions inherited from icu::SearchIterator
 SearchIterator ()
 Default constructor.

 SearchIterator (const UnicodeString &text, BreakIterator *breakiter=NULL)
 Constructor for use by subclasses.

 SearchIterator (CharacterIterator &text, BreakIterator *breakiter=NULL)
 Constructor for use by subclasses.

SearchIterator & operator= (const SearchIterator &that)
 Assignment operator.

virtual void setMatchLength (int32_t length)
 Sets the length of the currently matched string in the text string to be searched.

virtual void setMatchStart (int32_t position)
 Sets the offset of the currently matched string in the text string to be searched.

void setMatchNotFound ()
 Sets the state to "match not found".
 

Additional Inherited Members

- Protected Attributes inherited from icu::SearchIterator
USearch * m_search_
 C search data struct.

BreakIterator * m_breakiterator_
 Break iterator.

UnicodeString m_text_
 Unicode string version of the search text.
 

Detailed Description

StringSearch is a SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.

StringSearch handles language-specific peculiarities. For example, with the German collator, the characters ß and SS will match each other if case is chosen to be ignored. See the "ICU Collation Design Document" for more information.

There are 2 match options for selection:
Let S' be the sub-string of a text string S between the offsets start and end [start, end].
A pattern string P matches a text string S at the offsets [start, end] if

option 1. Some canonical equivalent of P matches some canonical equivalent
          of S'
option 2. P matches S' and if P starts or ends with a combining mark,
          there exists no non-ignorable combining mark before or after S'
          in S respectively.

Option 2 is the default.

This search has APIs similar to those of other text iteration mechanisms such as the break iterators in BreakIterator. Using these APIs, it is easy to scan through text looking for all occurrences of a given pattern. The search direction can be changed by calling reset followed by next or previous. A direction change can also occur without calling reset first, but this comes with some speed penalty. Matches found in the forward direction are the same as the matches found in the backward direction, in reverse order.

SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setOffset, preceding and following. Because the starting position is set exactly as specified, without validation, note that some starting positions may cause the search to return incorrect results.

A BreakIterator can be used if only matches at logical breaks are desired. Using a BreakIterator will only give you results that exactly match the boundaries given by the break iterator. For instance, the pattern "e" will not be found in the string "\u00e9" if a character break iterator is used.

Options are provided to handle overlapping matches. For example, in English, overlapping matches produce the results 0 and 2 for the pattern "abab" in the text "ababab", whereas mutually exclusive matches produce only the result 0.

Although collator attributes are taken into consideration while performing matches, there are no APIs here for setting and getting the attributes. These attributes can be set by getting the collator from getCollator and using the APIs in coll.h. Finally, reset has to be called so that StringSearch picks up the new collator attributes.

Restriction:
Currently there are no composite characters that consist of a character with combining class > 0 before a character with combining class == 0. However, if such a character exists in the future, StringSearch does not guarantee the results for option 1.

Consult the SearchIterator documentation for information on and examples of how to use instances of this class to implement text searching.


UnicodeString target("The quick brown fox jumps over the lazy dog.");
UnicodeString pattern("fox");
UErrorCode      error = U_ZERO_ERROR;
// Search with the US English collation rules and no break iterator.
StringSearch iter(pattern, target, Locale::getUS(), NULL, error);
// Visit every match from the start of the text until USEARCH_DONE.
for (int pos = iter.first(error);
     pos != USEARCH_DONE;
     pos = iter.next(error))
{
    printf("Found match at %d pos, length is %d\n", pos, iter.getMatchedLength());
}

Note, StringSearch is not to be subclassed.

See also
SearchIterator
RuleBasedCollator
Since
ICU 2.0

Definition at line 135 of file stsearch.h.

Constructor & Destructor Documentation

◆ StringSearch() [1/5]

icu::StringSearch::StringSearch ( const UnicodeString & pattern,
const UnicodeString & text,
const Locale & locale,
BreakIterator * breakiter,
UErrorCode & status
)

Creates a StringSearch instance that uses the language rule set of the given locale.

A collator will be created in the process. It is owned by this instance and is deleted when the instance is destroyed.

Parameters
pattern    The text for which this object will search.
text    The text in which to search for the pattern.
locale    A locale which defines the language-sensitive comparison rules used to determine whether text in the pattern and target matches.
breakiter    A BreakIterator object used to constrain the matches that are found. Matches whose start and end indices in the target text are not boundaries as determined by the BreakIterator are ignored. If this behavior is not desired, NULL can be passed in instead.
status    For errors, if any. If pattern or text is NULL, or if the length of either pattern or text is 0, U_ILLEGAL_ARGUMENT_ERROR is returned.
Stable:
ICU 2.0

◆ StringSearch() [2/5]

icu::StringSearch::StringSearch ( const UnicodeString & pattern,
const UnicodeString & text,
RuleBasedCollator * coll,
BreakIterator * breakiter,
UErrorCode & status
)

Creates a StringSearch instance that uses the language rule set of the given collator.

Note that the user retains ownership of this collator. It is not destroyed when this instance is destroyed.

Parameters
pattern    The text for which this object will search.
text    The text in which to search for the pattern.
coll    A RuleBasedCollator object which defines the language-sensitive comparison rules used to determine whether text in the pattern and target matches. The user is responsible for deleting this object.
breakiter    A BreakIterator object used to constrain the matches that are found. Matches whose start and end indices in the target text are not boundaries as determined by the BreakIterator are ignored. If this behavior is not desired, NULL can be passed in instead.
status    For errors, if any. If the length of either pattern or text is 0, U_ILLEGAL_ARGUMENT_ERROR is returned.
Stable:
ICU 2.0

◆ StringSearch() [3/5]

icu::StringSearch::StringSearch ( const UnicodeString & pattern,
CharacterIterator & text,
const Locale & locale,
BreakIterator * breakiter,
UErrorCode & status
)

Creates a StringSearch instance that uses the language rule set of the given locale.

A collator will be created in the process. It is owned by this instance and is deleted when the instance is destroyed.

Note: No parsing of the text within the CharacterIterator will be done during searching for this version. The block of text in CharacterIterator will be used as it is.

Parameters
pattern    The text for which this object will search.
text    The text iterator in which to search for the pattern.
locale    A locale which defines the language-sensitive comparison rules used to determine whether text in the pattern and target matches.
breakiter    A BreakIterator object used to constrain the matches that are found. Matches whose start and end indices in the target text are not boundaries as determined by the BreakIterator are ignored. If this behavior is not desired, NULL can be passed in instead.
status    For errors, if any. If the length of either pattern or text is 0, U_ILLEGAL_ARGUMENT_ERROR is returned.
Stable:
ICU 2.0

◆ StringSearch() [4/5]

icu::StringSearch::StringSearch ( const UnicodeString & pattern,
CharacterIterator & text,
RuleBasedCollator * coll,
BreakIterator * breakiter,
UErrorCode & status
)

Creates a StringSearch instance that uses the language rule set of the given collator.

Note that the user retains ownership of this collator. It is not destroyed when this instance is destroyed.

Note: No parsing of the text within the CharacterIterator will be done during searching for this version. The block of text in CharacterIterator will be used as it is.

Parameters
pattern    The text for which this object will search.
text    The text iterator in which to search for the pattern.
coll    A RuleBasedCollator object which defines the language-sensitive comparison rules used to determine whether text in the pattern and target matches. The user is responsible for deleting this object.
breakiter    A BreakIterator object used to constrain the matches that are found. Matches whose start and end indices in the target text are not boundaries as determined by the BreakIterator are ignored. If this behavior is not desired, NULL can be passed in instead.
status    For errors, if any. If the length of either pattern or text is 0, U_ILLEGAL_ARGUMENT_ERROR is returned.
Stable:
ICU 2.0

◆ StringSearch() [5/5]

icu::StringSearch::StringSearch ( const StringSearch & that )

Copy constructor that creates a StringSearch instance with the same behavior, and iterating over the same text.

Parameters
that    StringSearch instance to be copied.
Stable:
ICU 2.0

◆ ~StringSearch()

virtual icu::StringSearch::~StringSearch ( void  )
virtual

Destructor.

Cleans up the search iterator data struct. If a collator is created in the constructor, it will be destroyed here.

Stable:
ICU 2.0

Member Function Documentation

◆ clone()

StringSearch* icu::StringSearch::clone ( ) const

Clone this object.

Clones can be used concurrently in multiple threads. If an error occurs, then NULL is returned. The caller must delete the clone.

Returns
a clone of this object
See also
getDynamicClassID
Stable:
ICU 2.8

◆ getCollator()

RuleBasedCollator* icu::StringSearch::getCollator ( ) const

Gets the collator used for the language rules.

The caller may modify but must not delete the RuleBasedCollator. Modifications to this collator will affect the original collator passed in to the StringSearch constructor or to setCollator, if any.

Returns
collator used for string search
Stable:
ICU 2.0

◆ getDynamicClassID()

virtual UClassID icu::StringSearch::getDynamicClassID ( ) const
virtual

ICU "poor man's RTTI", returns a UClassID for the actual class.

Stable:
ICU 2.2

Reimplemented from icu::UObject.

◆ getOffset()

virtual int32_t icu::StringSearch::getOffset ( void  ) const
virtual

Return the current index in the text being searched.

If the iteration has gone past the end of the text (or past the beginning for a backwards search), USEARCH_DONE is returned.

Returns
current index in the text being searched.
Stable:
ICU 2.0

Implements icu::SearchIterator.

◆ getPattern()

const UnicodeString& icu::StringSearch::getPattern ( ) const

Gets the search pattern.

Returns
pattern used for matching
Stable:
ICU 2.0

◆ getStaticClassID()

static UClassID icu::StringSearch::getStaticClassID ( )
static

ICU "poor man's RTTI", returns a UClassID for this class.

Stable:
ICU 2.2

◆ handleNext()

virtual int32_t icu::StringSearch::handleNext ( int32_t position,
UErrorCode & status
)
protected virtual

Search forward for matching text, starting at a given location.

Clients should not call this method directly; instead they should call SearchIterator::next().

If a match is found, this method returns the index at which the match starts and calls SearchIterator::setMatchLength() with the number of characters in the target text that make up the match. If no match is found, the method returns USEARCH_DONE.

The StringSearch is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the StringSearch will be adjusted to the index USEARCH_DONE.

Parameters
position    The index in the target text at which the search starts.
status    For errors, if any occur.
Returns
The index at which the matched text in the target starts, or USEARCH_DONE if no match was found.
Stable:
ICU 2.0

Implements icu::SearchIterator.

◆ handlePrev()

virtual int32_t icu::StringSearch::handlePrev ( int32_t position,
UErrorCode & status
)
protected virtual

Search backward for matching text, starting at a given location.

Clients should not call this method directly; instead they should call SearchIterator::previous().

If a match is found, this method returns the index at which the match starts and calls SearchIterator::setMatchLength() with the number of characters in the target text that make up the match. If no match is found, the method returns USEARCH_DONE.

The StringSearch is adjusted so that its current index (as returned by getOffset) is the match position if one was found. If a match is not found, USEARCH_DONE will be returned and the StringSearch will be adjusted to the index USEARCH_DONE.

Parameters
position    The index in the target text at which the search starts.
status    For errors, if any occur.
Returns
The index at which the matched text in the target starts, or USEARCH_DONE if no match was found.
Stable:
ICU 2.0

Implements icu::SearchIterator.

◆ operator=()

StringSearch& icu::StringSearch::operator= ( const StringSearch & that )

Assignment operator.

Sets this iterator to have the same behavior, and iterate over the same text, as the one passed in.

Parameters
that    Instance to be copied.
Stable:
ICU 2.0

◆ operator==()

virtual UBool icu::StringSearch::operator== ( const SearchIterator & that ) const
virtual

Equality operator.

Parameters
that    Instance to be compared.
Returns
TRUE if both instances have the same attributes, break iterators and collators, and iterate over the same text while looking for the same pattern.
Stable:
ICU 2.0

Reimplemented from icu::SearchIterator.

◆ reset()

virtual void icu::StringSearch::reset ( )
virtual

Reset the iteration.

Search will begin at the start of the text string if a forward iteration is initiated before a backwards iteration. Otherwise if a backwards iteration is initiated before a forwards iteration, the search will begin at the end of the text string.

Stable:
ICU 2.0

Reimplemented from icu::SearchIterator.

◆ safeClone()

virtual StringSearch* icu::StringSearch::safeClone ( ) const
virtual

Returns a copy of StringSearch with the same behavior, and iterating over the same text, as this one.

Note that all data will be replicated, except for the user-specified collator and the breakiterator.

Returns
cloned object
Stable:
ICU 2.0

Implements icu::SearchIterator.

◆ setCollator()

void icu::StringSearch::setCollator ( RuleBasedCollator * coll,
UErrorCode & status
)

Sets the collator used for the language rules.

The user retains ownership of this collator and is therefore responsible for deleting it. The iterator's position will not be changed by this method.

Parameters
coll    The collator to use.
status    For errors, if any.
Stable:
ICU 2.0

◆ setOffset()

virtual void icu::StringSearch::setOffset ( int32_t position,
UErrorCode & status
)
virtual

Sets the index to point to the given position, and clears any state that's affected.

This method takes the argument index and sets the position in the text string accordingly without checking if the index is pointing to a valid starting point to begin searching.

Parameters
position    The position within the text to be set. If position lies outside the text range for searching, U_INDEX_OUTOFBOUNDS_ERROR will be returned.
status    For errors, if any occur.
Stable:
ICU 2.0

Implements icu::SearchIterator.

◆ setPattern()

void icu::StringSearch::setPattern ( const UnicodeString & pattern,
UErrorCode & status
)

Sets the pattern used for matching.

The iterator's position will not be changed by this method.

Parameters
pattern    Search pattern to be found.
status    For errors, if any. If the pattern length is 0, U_ILLEGAL_ARGUMENT_ERROR is returned.
Stable:
ICU 2.0

◆ setText() [1/2]

virtual void icu::StringSearch::setText ( const UnicodeString & text,
UErrorCode & status
)
virtual

Set the target text to be searched.

Text iteration will then begin at the start of the text string. This method is useful if you want to re-use an iterator to search for the same pattern within a different body of text.

Parameters
text    Text string to be searched.
status    For errors, if any. If the text length is 0, U_ILLEGAL_ARGUMENT_ERROR is returned.
Stable:
ICU 2.0

Reimplemented from icu::SearchIterator.

◆ setText() [2/2]

virtual void icu::StringSearch::setText ( CharacterIterator & text,
UErrorCode & status
)
virtual

Set the target text to be searched.

Text iteration will then begin at the start of the text string. This method is useful if you want to re-use an iterator to search for the same pattern within a different body of text. Note: No parsing of the text within the CharacterIterator will be done during searching for this version. The block of text in CharacterIterator will be used as it is.

Parameters
text    Text iterator over the text to be searched.
status    For errors, if any. If the text length is 0, U_ILLEGAL_ARGUMENT_ERROR is returned.
Stable:
ICU 2.0

Reimplemented from icu::SearchIterator.


The documentation for this class was generated from the following file: