ICU 65.1
utf16.h File Reference

C API: 16-bit Unicode handling macros.

#include "unicode/umachine.h"
#include "unicode/utf.h"


Macros

#define U16_IS_SINGLE(c)   !U_IS_SURROGATE(c)
 Does this code unit alone encode a code point (BMP, not a surrogate)?

#define U16_IS_LEAD(c)   (((c)&0xfffffc00)==0xd800)
 Is this code unit a lead surrogate (U+d800..U+dbff)?

#define U16_IS_TRAIL(c)   (((c)&0xfffffc00)==0xdc00)
 Is this code unit a trail surrogate (U+dc00..U+dfff)?

#define U16_IS_SURROGATE(c)   U_IS_SURROGATE(c)
 Is this code unit a surrogate (U+d800..U+dfff)?

#define U16_IS_SURROGATE_LEAD(c)   (((c)&0x400)==0)
 Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a lead surrogate?

#define U16_IS_SURROGATE_TRAIL(c)   (((c)&0x400)!=0)
 Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a trail surrogate?

#define U16_SURROGATE_OFFSET   ((0xd800<<10UL)+0xdc00-0x10000)
 Helper constant for U16_GET_SUPPLEMENTARY.

#define U16_GET_SUPPLEMENTARY(lead, trail)   (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)
 Get a supplementary code point value (U+10000..U+10ffff) from its lead and trail surrogates.

#define U16_LEAD(supplementary)   (UChar)(((supplementary)>>10)+0xd7c0)
 Get the lead surrogate (0xd800..0xdbff) for a supplementary code point (0x10000..0x10ffff).

#define U16_TRAIL(supplementary)   (UChar)(((supplementary)&0x3ff)|0xdc00)
 Get the trail surrogate (0xdc00..0xdfff) for a supplementary code point (0x10000..0x10ffff).
 
#define U16_LENGTH(c)   ((uint32_t)(c)<=0xffff ? 1 : 2)
 How many 16-bit code units (1 or 2) are used to encode this Unicode code point? The result is not defined if c is not a Unicode code point (U+0000..U+10ffff).
 
#define U16_MAX_LENGTH   2
 The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff).

#define U16_GET_UNSAFE(s, i, c)
 Get a code point from a string at a random-access offset, without changing the offset.

#define U16_GET(s, start, i, length, c)
 Get a code point from a string at a random-access offset, without changing the offset.

#define U16_GET_OR_FFFD(s, start, i, length, c)
 Get a code point from a string at a random-access offset, without changing the offset.

#define U16_NEXT_UNSAFE(s, i, c)
 Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

#define U16_NEXT(s, i, length, c)
 Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

#define U16_NEXT_OR_FFFD(s, i, length, c)
 Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

#define U16_APPEND_UNSAFE(s, i, c)
 Append a code point to a string, overwriting 1 or 2 code units.

#define U16_APPEND(s, i, capacity, c, isError)
 Append a code point to a string, overwriting 1 or 2 code units.

#define U16_FWD_1_UNSAFE(s, i)
 Advance the string offset from one code point boundary to the next.

#define U16_FWD_1(s, i, length)
 Advance the string offset from one code point boundary to the next.

#define U16_FWD_N_UNSAFE(s, i, n)
 Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

#define U16_FWD_N(s, i, length, n)
 Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

#define U16_SET_CP_START_UNSAFE(s, i)
 Adjust a random-access offset to a code point boundary at the start of a code point.

#define U16_SET_CP_START(s, start, i)
 Adjust a random-access offset to a code point boundary at the start of a code point.

#define U16_PREV_UNSAFE(s, i, c)
 Move the string offset from one code point boundary to the previous one and get the code point between them.

#define U16_PREV(s, start, i, c)
 Move the string offset from one code point boundary to the previous one and get the code point between them.

#define U16_PREV_OR_FFFD(s, start, i, c)
 Move the string offset from one code point boundary to the previous one and get the code point between them.

#define U16_BACK_1_UNSAFE(s, i)
 Move the string offset from one code point boundary to the previous one.

#define U16_BACK_1(s, start, i)
 Move the string offset from one code point boundary to the previous one.

#define U16_BACK_N_UNSAFE(s, i, n)
 Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

#define U16_BACK_N(s, start, i, n)
 Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

#define U16_SET_CP_LIMIT_UNSAFE(s, i)
 Adjust a random-access offset to a code point boundary after a code point.

#define U16_SET_CP_LIMIT(s, start, i, length)
 Adjust a random-access offset to a code point boundary after a code point.
 

Detailed Description

C API: 16-bit Unicode handling macros.

This file defines macros to deal with 16-bit Unicode (UTF-16) code units and strings.

For more information see utf.h and the ICU User Guide Strings chapter (http://userguide.icu-project.org/strings).

Usage: follow the ICU coding guidelines for if() statements when using these macros. Use compound statements (curly braces {}) for all if, else, and while bodies, and terminate every macro statement with a semicolon.

Definition in file utf16.h.
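The semicolon rule follows from how the macro values are wrapped: UPRV_BLOCK_MACRO_BEGIN and UPRV_BLOCK_MACRO_END default to "do" and "while (FALSE)", so every macro call is a single statement that the caller's semicolon completes. The sketch below illustrates the pattern under that assumption; MY_COUNT_UNITS and count_units are hypothetical names, and the wrapper is spelled do { } while (0) so the block compiles without ICU headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro wrapped the same way as the utf16.h macros:
 * adds to n the number of UTF-16 code units code point c needs. */
#define MY_COUNT_UNITS(c, n) do { \
    if((uint32_t)(c) <= 0xffff) { \
        (n) += 1; \
    } else { \
        (n) += 2; \
    } \
} while (0)

/* Returns 1 or 2 for a code point, or 0 for values above U+10FFFF. */
static int32_t count_units(uint32_t c) {
    int32_t n = 0;
    assert(n == 0);
    if (c <= 0x10ffff) {
        MY_COUNT_UNITS(c, n);  /* one statement; the ';' terminates it */
    }
    return n;
}
```

Without the do/while wrapper the expansion would be a bare if/else, and in an unbraced caller such as `if (x) MY_COUNT_UNITS(c, n); else ...` the caller's own else would become a syntax error.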

Macro Definition Documentation

◆ U16_APPEND

#define U16_APPEND(s, i, capacity, c, isError)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else if((uint32_t)(c)<=0x10ffff && (i)+1<(capacity)) { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } else /* c>0x10ffff or not enough space */ { \
        (isError)=TRUE; \
    } \
} UPRV_BLOCK_MACRO_END

UPRV_BLOCK_MACRO_BEGIN (umachine.h:168) is defined as the "do" keyword by default, and UPRV_BLOCK_MACRO_END (umachine.h:177) as "while (FALSE)", so each macro in this file expands to a single statement. TRUE (umachine.h:265) is the TRUE value of a UBool.

Append a code point to a string, overwriting 1 or 2 code units.

The offset points to the current end of the string contents and is advanced (post-increment). "Safe" macro: checks that c is at most U+10FFFF (surrogate code points are not rejected; they are written as single units) and, if a surrogate pair is written, checks for sufficient space in the string. If c is above U+10FFFF or the trail surrogate does not fit, nothing is written and isError is set to TRUE.

Parameters
s: UChar * string buffer
i: string offset, must be i<capacity
capacity: size of the string buffer
c: code point to append
isError: output UBool set to TRUE if an error occurs, otherwise not modified
See also
U16_APPEND_UNSAFE
Stable:
ICU 2.4

Definition at line 392 of file utf16.h.
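The safe append logic can be sketched as a plain C function. This is a standalone illustration that mirrors the macro value above, not ICU's code: append_cp is a hypothetical name, uint16_t stands in for UChar, and bool stands in for UBool. Real code should use U16_APPEND from unicode/utf16.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of U16_APPEND's logic: writes code point c at s[*i] and advances
 * *i by 1 or 2. If c > U+10FFFF, or a surrogate pair would not fit before
 * capacity, nothing is written and *isError is set to true.
 * Precondition, as for the macro: *i < capacity. */
static void append_cp(uint16_t *s, int32_t *i, int32_t capacity,
                      uint32_t c, bool *isError) {
    assert(*i < capacity);
    if (c <= 0xffff) {
        s[(*i)++] = (uint16_t)c;                       /* one code unit */
    } else if (c <= 0x10ffff && *i + 1 < capacity) {
        s[(*i)++] = (uint16_t)((c >> 10) + 0xd7c0);    /* lead surrogate */
        s[(*i)++] = (uint16_t)((c & 0x3ff) | 0xdc00);  /* trail surrogate */
    } else {
        *isError = true;                               /* too large or no room */
    }
}
```

As with the macro, the error flag is only ever set, never cleared, so a caller initializes it to false once and can check it after a whole sequence of appends.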

◆ U16_APPEND_UNSAFE

#define U16_APPEND_UNSAFE(s, i, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if((uint32_t)(c)<=0xffff) { \
        (s)[(i)++]=(uint16_t)(c); \
    } else { \
        (s)[(i)++]=(uint16_t)(((c)>>10)+0xd7c0); \
        (s)[(i)++]=(uint16_t)(((c)&0x3ff)|0xdc00); \
    } \
} UPRV_BLOCK_MACRO_END

Append a code point to a string, overwriting 1 or 2 code units.

The offset points to the current end of the string contents and is advanced (post-increment). "Unsafe" macro, assumes a valid code point and sufficient space in the string. Otherwise, the result is undefined.

Parameters
s: UChar * string buffer
i: string offset
c: code point to append
See also
U16_APPEND
Stable:
ICU 2.4

Definition at line 366 of file utf16.h.

◆ U16_BACK_1

#define U16_BACK_1(s, start, i)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if(U16_IS_TRAIL((s)[--(i)]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
} UPRV_BLOCK_MACRO_END

Move the string offset from one code point boundary to the previous one.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters
s: const UChar * string
start: starting string offset (usually 0)
i: string offset, must be start<i
See also
U16_BACK_1_UNSAFE
Stable:
ICU 2.4

Definition at line 642 of file utf16.h.
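The boundary checks in U16_BACK_1 can be seen in a standalone sketch that mirrors the macro value above; back_1, is_lead, and is_trail are hypothetical names and uint16_t stands in for UChar. A trail surrogate is joined with the preceding unit only if that unit is a lead surrogate at or after start; otherwise the trail counts as one code point by itself.

```c
#include <assert.h>
#include <stdint.h>

static int is_lead(uint16_t u)  { return (u & 0xfc00) == 0xd800; }
static int is_trail(uint16_t u) { return (u & 0xfc00) == 0xdc00; }

/* Sketch of U16_BACK_1's logic: moves *i back over one code point.
 * Precondition, as for the macro: start < *i. */
static void back_1(const uint16_t *s, int32_t start, int32_t *i) {
    assert(start < *i);
    if (is_trail(s[--(*i)]) && *i > start && is_lead(s[*i - 1])) {
        --(*i);  /* step over the lead of a well-formed pair as well */
    }
}
```

For the units {0x61, 0xD83D, 0xDE00, 0xDE00}, stepping back from offset 4 visits offsets 3, 1, and 0: the last trail is unpaired, while the middle two units form U+1F600.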

◆ U16_BACK_1_UNSAFE

#define U16_BACK_1_UNSAFE(s, i)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if(U16_IS_TRAIL((s)[--(i)])) { \
        --(i); \
    } \
} UPRV_BLOCK_MACRO_END

Move the string offset from one code point boundary to the previous one.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters
s: const UChar * string
i: string offset
See also
U16_BACK_1
Stable:
ICU 2.4

Definition at line 624 of file utf16.h.

◆ U16_BACK_N

#define U16_BACK_N(s, start, i, n)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    int32_t __N=(n); \
    while(__N>0 && (i)>(start)) { \
        U16_BACK_1(s, start, i); \
        --__N; \
    } \
} UPRV_BLOCK_MACRO_END

Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters
s: const UChar * string
start: start of string
i: string offset, must be start<i
n: number of code points to skip
See also
U16_BACK_N_UNSAFE
Stable:
ICU 2.4

Definition at line 683 of file utf16.h.
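A common use of backward skipping is finding where the last n code points of a string begin. The sketch below mirrors the U16_BACK_N value above under hypothetical names (back_n, is_lead16, is_trail16); the count it returns is an addition for illustration, since the ICU macro produces no result.

```c
#include <assert.h>
#include <stdint.h>

static int is_lead16(uint16_t u)  { return (u & 0xfc00) == 0xd800; }
static int is_trail16(uint16_t u) { return (u & 0xfc00) == 0xdc00; }

/* Sketch of U16_BACK_N's logic: moves *i back by up to n code points,
 * never past start. Returns how many code points were actually skipped. */
static int32_t back_n(const uint16_t *s, int32_t start, int32_t *i, int32_t n) {
    int32_t skipped = 0;
    assert(start <= *i);
    while (skipped < n && *i > start) {
        /* one U16_BACK_1 step: a trail pairs only with a lead at or after start */
        if (is_trail16(s[--(*i)]) && *i > start && is_lead16(s[*i - 1])) {
            --(*i);
        }
        ++skipped;
    }
    return skipped;
}
```

Starting at the string length and asking for more code points than the string has simply stops at start, which is why the macro needs no separate length argument.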

◆ U16_BACK_N_UNSAFE

#define U16_BACK_N_UNSAFE(s, i, n)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    int32_t __N=(n); \
    while(__N>0) { \
        U16_BACK_1_UNSAFE(s, i); \
        --__N; \
    } \
} UPRV_BLOCK_MACRO_END

Move the string offset from one code point boundary to the n-th one before it, i.e., move backward by n code points.

(Pre-decrementing backward iteration.) The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters
s: const UChar * string
i: string offset
n: number of code points to skip
See also
U16_BACK_N
Stable:
ICU 2.4

Definition at line 661 of file utf16.h.

◆ U16_FWD_1

#define U16_FWD_1(s, i, length)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if(U16_IS_LEAD((s)[(i)++]) && (i)!=(length) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
} UPRV_BLOCK_MACRO_END

Advance the string offset from one code point boundary to the next.

(Post-incrementing iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The length can be negative for a NUL-terminated string.

Parameters
s: const UChar * string
i: string offset, must be i<length
length: string length
See also
U16_FWD_1_UNSAFE
Stable:
ICU 2.4

Definition at line 432 of file utf16.h.

◆ U16_FWD_1_UNSAFE

#define U16_FWD_1_UNSAFE(s, i)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if(U16_IS_LEAD((s)[(i)++])) { \
        ++(i); \
    } \
} UPRV_BLOCK_MACRO_END

Advance the string offset from one code point boundary to the next.

(Post-incrementing iteration.) "Unsafe" macro, assumes well-formed UTF-16.

Parameters
s: const UChar * string
i: string offset
See also
U16_FWD_1
Stable:
ICU 2.4

Definition at line 413 of file utf16.h.

◆ U16_FWD_N

#define U16_FWD_N(s, i, length, n)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    int32_t __N=(n); \
    while(__N>0 && ((i)<(length) || ((length)<0 && (s)[i]!=0))) { \
        U16_FWD_1(s, i, length); \
        --__N; \
    } \
} UPRV_BLOCK_MACRO_END

Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

(Post-incrementing iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The length can be negative for a NUL-terminated string.

Parameters
s: const UChar * string
i: int32_t string offset, must be i<length
length: int32_t string length
n: number of code points to skip
See also
U16_FWD_N_UNSAFE
Stable:
ICU 2.4

Definition at line 473 of file utf16.h.
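The "length can be negative" convention is easiest to see in a sketch. With a negative length, the loop condition stops at the terminating NUL instead of at an explicit length, and the inner U16_FWD_1 step may read the NUL itself, which is never a trail surrogate. fwd_n, is_lead_u, and is_trail_u are hypothetical names; the logic mirrors the macro values above.

```c
#include <assert.h>
#include <stdint.h>

static int is_lead_u(uint16_t u)  { return (u & 0xfc00) == 0xd800; }
static int is_trail_u(uint16_t u) { return (u & 0xfc00) == 0xdc00; }

/* Sketch of U16_FWD_N's logic. Pass length < 0 for a NUL-terminated
 * string: *i < length is then always false, so the s[*i] != 0 test
 * decides when to stop. */
static void fwd_n(const uint16_t *s, int32_t *i, int32_t length, int32_t n) {
    assert(*i >= 0);
    while (n > 0 && (*i < length || (length < 0 && s[*i] != 0))) {
        /* one U16_FWD_1 step: a lead pairs only with a following trail */
        if (is_lead_u(s[(*i)++]) && *i != length && is_trail_u(s[*i])) {
            ++(*i);
        }
        --n;
    }
}
```

With an explicit length, an unpaired lead surrogate in the last position is skipped as one code point and the offset stops at length without reading past the end.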

◆ U16_FWD_N_UNSAFE

#define U16_FWD_N_UNSAFE(s, i, n)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    int32_t __N=(n); \
    while(__N>0) { \
        U16_FWD_1_UNSAFE(s, i); \
        --__N; \
    } \
} UPRV_BLOCK_MACRO_END

Advance the string offset from one code point boundary to the n-th next one, i.e., move forward by n code points.

(Post-incrementing iteration.) "Unsafe" macro, assumes well-formed UTF-16.

Parameters
s: const UChar * string
i: string offset
n: number of code points to skip
See also
U16_FWD_N
Stable:
ICU 2.4

Definition at line 450 of file utf16.h.

◆ U16_GET

#define U16_GET(s, start, i, length, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            if((i)+1!=(length) && U16_IS_TRAIL(__c2=(s)[(i)+1])) { \
                (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
            } \
        } else { \
            if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
                (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
            } \
        } \
    } \
} UPRV_BLOCK_MACRO_END

Get a code point from a string at a random-access offset, without changing the offset.

"Safe" macro, handles unpaired surrogates and checks for string boundaries.

The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well.

The length can be negative for a NUL-terminated string.

If the offset points to a single, unpaired surrogate, then c is set to that unpaired surrogate. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.

Parameters
s: const UChar * string
start: starting string offset (usually 0)
i: string offset, must be start<=i<length
length: string length
c: output UChar32 variable
See also
U16_GET_UNSAFE
Stable:
ICU 2.4

Definition at line 200 of file utf16.h.
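The random-access behavior of U16_GET, where the offset may sit on either half of a pair, can be sketched as a function. This is a standalone illustration mirroring the macro value above, not ICU's code: get_cp and SURROGATE_OFFSET are hypothetical names, uint16_t stands in for UChar, and int32_t for UChar32 (which ICU defines as int32_t).

```c
#include <assert.h>
#include <stdint.h>

/* Same value as U16_SURROGATE_OFFSET. */
#define SURROGATE_OFFSET ((0xd800 << 10) + 0xdc00 - 0x10000)

/* Sketch of U16_GET's logic: returns the code point containing s[i].
 * Unpaired surrogates are returned as-is. length < 0 means NUL-terminated.
 * Precondition: start <= i < length. */
static int32_t get_cp(const uint16_t *s, int32_t start, int32_t i, int32_t length) {
    int32_t c = s[i];
    assert(start <= i && (length < 0 || i < length));
    if ((c & 0xf800) == 0xd800) {              /* any surrogate */
        uint16_t c2 = 0;
        if ((c & 0x400) == 0) {                /* lead: look ahead */
            if (i + 1 != length && ((c2 = s[i + 1]) & 0xfc00) == 0xdc00) {
                c = (c << 10) + c2 - SURROGATE_OFFSET;
            }
        } else {                               /* trail: look behind */
            if (i > start && ((c2 = s[i - 1]) & 0xfc00) == 0xd800) {
                c = ((int32_t)c2 << 10) + c - SURROGATE_OFFSET;
            }
        }
    }
    return c;
}
```

Because the lookbehind stops at start, a trail surrogate at offset start is reported as an unpaired surrogate even if a lead precedes it in memory.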

◆ U16_GET_OR_FFFD

#define U16_GET_OR_FFFD(s, start, i, length, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            if((i)+1!=(length) && U16_IS_TRAIL(__c2=(s)[(i)+1])) { \
                (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
            } else { \
                (c)=0xfffd; \
            } \
        } else { \
            if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
                (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
            } else { \
                (c)=0xfffd; \
            } \
        } \
    } \
} UPRV_BLOCK_MACRO_END

Get a code point from a string at a random-access offset, without changing the offset.

"Safe" macro, handles unpaired surrogates and checks for string boundaries.

The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well.

The length can be negative for a NUL-terminated string.

If the offset points to a single, unpaired surrogate, then c is set to U+FFFD. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT_OR_FFFD.

Parameters
s: const UChar * string
start: starting string offset (usually 0)
i: string offset, must be start<=i<length
length: string length
c: output UChar32 variable
See also
U16_GET_UNSAFE
Stable:
ICU 60

Definition at line 239 of file utf16.h.
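Since U16_GET_OR_FFFD maps every unpaired surrogate to U+FFFD, it can serve as a per-position validity probe. The sketch below mirrors the macro value above and adds a hypothetical helper, count_unpaired, that counts unpaired surrogate units; both names are illustrative, and a genuine U+FFFD in the input is excluded by comparing against the original unit.

```c
#include <assert.h>
#include <stdint.h>

#define SURR_OFFSET ((0xd800 << 10) + 0xdc00 - 0x10000)

/* Sketch of U16_GET_OR_FFFD's logic: like U16_GET, but an unpaired
 * surrogate yields U+FFFD instead of the surrogate itself. */
static int32_t get_cp_or_fffd(const uint16_t *s, int32_t start, int32_t i,
                              int32_t length) {
    int32_t c = s[i];
    assert(start <= i && (length < 0 || i < length));
    if ((c & 0xf800) == 0xd800) {
        uint16_t c2 = 0;
        if ((c & 0x400) == 0) {
            if (i + 1 != length && ((c2 = s[i + 1]) & 0xfc00) == 0xdc00) {
                c = (c << 10) + c2 - SURR_OFFSET;
            } else {
                c = 0xfffd;
            }
        } else {
            if (i > start && ((c2 = s[i - 1]) & 0xfc00) == 0xd800) {
                c = ((int32_t)c2 << 10) + c - SURR_OFFSET;
            } else {
                c = 0xfffd;
            }
        }
    }
    return c;
}

/* Hypothetical helper: counts code units that are unpaired surrogates. */
static int32_t count_unpaired(const uint16_t *s, int32_t length) {
    int32_t n = 0;
    for (int32_t i = 0; i < length; ++i) {
        if (get_cp_or_fffd(s, 0, i, length) == 0xfffd && s[i] != 0xfffd) {
            ++n;
        }
    }
    return n;
}
```

For full-string validation a forward loop with U16_NEXT_OR_FFFD does the same work in one pass per code point, as the description above notes.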

◆ U16_GET_SUPPLEMENTARY

#define U16_GET_SUPPLEMENTARY(lead, trail)   (((UChar32)(lead)<<10UL)+(UChar32)(trail)-U16_SURROGATE_OFFSET)

Get a supplementary code point value (U+10000..U+10ffff) from its lead and trail surrogates.

The result is undefined if the input values are not lead and trail surrogates.

Parameters
lead: lead surrogate (U+d800..U+dbff)
trail: trail surrogate (U+dc00..U+dfff)
Returns
supplementary code point (U+10000..U+10ffff)
Stable:
ICU 2.4

Definition at line 111 of file utf16.h.
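The single subtraction works because the textbook decoding ((lead-0xd800)<<10) + (trail-0xdc00) + 0x10000 can be rearranged into (lead<<10) + trail - ((0xd800<<10) + 0xdc00 - 0x10000), and the parenthesized constant is U16_SURROGATE_OFFSET, which equals 0x35FDC00. The sketch below checks that both forms agree over every lead/trail combination; decode_pair and decode_pair_long are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as U16_SURROGATE_OFFSET: (0xd800<<10) + 0xdc00 - 0x10000. */
#define SURROGATE_OFFSET ((0xd800 << 10) + 0xdc00 - 0x10000)

/* Long form: strip the surrogate tags, join the 10-bit halves, add 0x10000. */
static int32_t decode_pair_long(uint16_t lead, uint16_t trail) {
    return (((int32_t)lead - 0xd800) << 10) + ((int32_t)trail - 0xdc00) + 0x10000;
}

/* Folded form used by U16_GET_SUPPLEMENTARY: one shift, one add, one subtract. */
static int32_t decode_pair(uint16_t lead, uint16_t trail) {
    assert((lead & 0xfc00) == 0xd800 && (trail & 0xfc00) == 0xdc00);
    return ((int32_t)lead << 10) + (int32_t)trail - SURROGATE_OFFSET;
}
```

For example, lead 0xD83D and trail 0xDE00 give (0xD83D<<10) + 0xDE00 = 0x361D200, and 0x361D200 - 0x35FDC00 = 0x1F600.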

◆ U16_GET_UNSAFE

#define U16_GET_UNSAFE(s, i, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[i]; \
    if(U16_IS_SURROGATE(c)) { \
        if(U16_IS_SURROGATE_LEAD(c)) { \
            (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)+1]); \
        } else { \
            (c)=U16_GET_SUPPLEMENTARY((s)[(i)-1], (c)); \
        } \
    } \
} UPRV_BLOCK_MACRO_END

Get a code point from a string at a random-access offset, without changing the offset.

"Unsafe" macro, assumes well-formed UTF-16.

The offset may point to either the lead or trail surrogate unit for a supplementary code point, in which case the macro will read the adjacent matching surrogate as well. The result is undefined if the offset points to a single, unpaired surrogate. Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.

Parameters
s: const UChar * string
i: string offset
c: output UChar32 variable
See also
U16_GET
Stable:
ICU 2.4

Definition at line 166 of file utf16.h.

◆ U16_IS_LEAD

#define U16_IS_LEAD (   c)    (((c)&0xfffffc00)==0xd800)

Is this code unit a lead surrogate (U+d800..U+dbff)?

Parameters
c: 16-bit code unit
Returns
TRUE or FALSE
Stable:
ICU 2.4

Definition at line 58 of file utf16.h.
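The mask 0xfffffc00 clears only the low 10 bits, so the comparison with 0xd800 succeeds for exactly 0xd800..0xdbff. Because the mask also keeps bits 16 and up, a 32-bit value such as 0x1D800, whose low 16 bits look like a lead surrogate, is correctly rejected. The sketch uses local copies of the U16_IS_LEAD and U16_IS_TRAIL expressions so it compiles without ICU; IS_LEAD, IS_TRAIL, and classify are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the utf16.h predicates. */
#define IS_LEAD(c)  (((c) & 0xfffffc00) == 0xd800)
#define IS_TRAIL(c) (((c) & 0xfffffc00) == 0xdc00)

/* Hypothetical helper: 'L' for a lead surrogate, 'T' for a trail
 * surrogate, 'O' for anything else. */
static char classify(uint32_t c) {
    assert(IS_LEAD(c) + IS_TRAIL(c) <= 1);  /* the two ranges are disjoint */
    if (IS_LEAD(c))  { return 'L'; }
    if (IS_TRAIL(c)) { return 'T'; }
    return 'O';
}
```
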

◆ U16_IS_SINGLE

#define U16_IS_SINGLE (   c)    !U_IS_SURROGATE(c)

Does this code unit alone encode a code point (BMP, not a surrogate)?

Parameters
c: 16-bit code unit
Returns
TRUE or FALSE
Stable:
ICU 2.4

Definition at line 50 of file utf16.h.

◆ U16_IS_SURROGATE

#define U16_IS_SURROGATE (   c)    U_IS_SURROGATE(c)

Is this code unit a surrogate (U+d800..U+dfff)?

Parameters
c: 16-bit code unit
Returns
TRUE or FALSE
Stable:
ICU 2.4

Definition at line 74 of file utf16.h.

◆ U16_IS_SURROGATE_LEAD

#define U16_IS_SURROGATE_LEAD (   c)    (((c)&0x400)==0)

Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a lead surrogate?

Parameters
c: 16-bit code unit
Returns
TRUE or FALSE
Stable:
ICU 2.4

Definition at line 83 of file utf16.h.

◆ U16_IS_SURROGATE_TRAIL

#define U16_IS_SURROGATE_TRAIL (   c)    (((c)&0x400)!=0)

Assuming c is a surrogate code point (U16_IS_SURROGATE(c)), is it a trail surrogate?

Parameters
c: 16-bit code unit
Returns
TRUE or FALSE
Stable:
ICU 4.2

Definition at line 92 of file utf16.h.

◆ U16_IS_TRAIL

#define U16_IS_TRAIL (   c)    (((c)&0xfffffc00)==0xdc00)

Is this code unit a trail surrogate (U+dc00..U+dfff)?

Parameters
c: 16-bit code unit
Returns
TRUE or FALSE
Stable:
ICU 2.4

Definition at line 66 of file utf16.h.

◆ U16_LEAD

#define U16_LEAD (   supplementary)    (UChar)(((supplementary)>>10)+0xd7c0)

Get the lead surrogate (0xd800..0xdbff) for a supplementary code point (0x10000..0x10ffff).

Parameters
supplementary: 32-bit code point (U+10000..U+10ffff)
Returns
lead surrogate (U+d800..U+dbff) for supplementary
Stable:
ICU 2.4

Definition at line 122 of file utf16.h.
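The constant 0xd7c0 in U16_LEAD is 0xd800 - (0x10000 >> 10), so the usual "subtract 0x10000, then shift" step is folded into one addition. U16_TRAIL needs no adjustment because subtracting 0x10000 leaves the low 10 bits unchanged. The sketch below uses local copies of that arithmetic under hypothetical names (lead_of, trail_of) and checks the round trip through the U16_GET_SUPPLEMENTARY formula.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of U16_LEAD's arithmetic; uint16_t stands in for UChar. */
static uint16_t lead_of(uint32_t supplementary) {
    assert(supplementary >= 0x10000 && supplementary <= 0x10ffff);
    /* 0xd7c0 == 0xd800 - (0x10000 >> 10) */
    return (uint16_t)((supplementary >> 10) + 0xd7c0);
}

/* Local copy of U16_TRAIL's arithmetic. */
static uint16_t trail_of(uint32_t supplementary) {
    assert(supplementary >= 0x10000 && supplementary <= 0x10ffff);
    /* the low 10 bits are the same before and after subtracting 0x10000 */
    return (uint16_t)((supplementary & 0x3ff) | 0xdc00);
}
```

For U+1F600: 0x1F600 >> 10 = 0x7D and 0x7D + 0xD7C0 = 0xD83D; 0x1F600 & 0x3FF = 0x200 and 0x200 | 0xDC00 = 0xDE00.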

◆ U16_LENGTH

#define U16_LENGTH (   c)    ((uint32_t)(c)<=0xffff ? 1 : 2)

How many 16-bit code units (1 or 2) are used to encode this Unicode code point? The result is not defined if c is not a Unicode code point (U+0000..U+10ffff).

Parameters
c: 32-bit code point
Returns
1 or 2
Stable:
ICU 2.4

Definition at line 140 of file utf16.h.
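A typical use of U16_LENGTH is sizing a UTF-16 buffer before encoding a sequence of code points. The sketch below uses a local copy of the macro expression; UTF16_LENGTH and utf16_units_needed are hypothetical names. Since the result is undefined outside U+0000..U+10ffff, the helper asserts the range first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTF16_LENGTH(c) ((uint32_t)(c) <= 0xffff ? 1 : 2)  /* copy of U16_LENGTH */

/* Hypothetical helper: number of UTF-16 code units needed to store
 * the code points cps[0..n-1]. */
static int32_t utf16_units_needed(const int32_t *cps, size_t n) {
    int32_t units = 0;
    for (size_t k = 0; k < n; ++k) {
        assert(cps[k] >= 0 && cps[k] <= 0x10ffff);
        units += UTF16_LENGTH(cps[k]);
    }
    return units;
}
```

When an exact count is not needed, n * U16_MAX_LENGTH (that is, 2n) is a safe upper bound for the same buffer.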

◆ U16_MAX_LENGTH

#define U16_MAX_LENGTH   2

The maximum number of 16-bit code units per Unicode code point (U+0000..U+10ffff).

Returns
2
Stable:
ICU 2.4

Definition at line 147 of file utf16.h.

◆ U16_NEXT

#define U16_NEXT(s, i, length, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        uint16_t __c2; \
        if((i)!=(length) && U16_IS_TRAIL(__c2=(s)[(i)])) { \
            ++(i); \
            (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
        } \
    } \
} UPRV_BLOCK_MACRO_END

Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

(Post-incrementing forward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The length can be negative for a NUL-terminated string.

The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate or to a single, unpaired lead surrogate, then c is set to that unpaired surrogate.

Parameters
s: const UChar * string
i: string offset, must be i<length
length: string length
c: output UChar32 variable
See also
U16_NEXT_UNSAFE
Stable:
ICU 2.4

Definition at line 308 of file utf16.h.
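The usual forward loop is `while (i < length) { U16_NEXT(s, i, length, c); ... }`. The standalone sketch below mirrors the U16_NEXT value above as a function, next_cp, and uses it in a hypothetical count_code_points helper; both names are illustrative, not ICU API. Each unpaired surrogate counts as one code point, exactly as the macro returns it.

```c
#include <assert.h>
#include <stdint.h>

#define SURR_OFFSET ((0xd800 << 10) + 0xdc00 - 0x10000)

/* Sketch of U16_NEXT's logic: returns the code point starting at s[*i]
 * and advances *i past it. Unpaired surrogates are returned as-is. */
static int32_t next_cp(const uint16_t *s, int32_t *i, int32_t length) {
    assert(length < 0 || *i < length);
    int32_t c = s[(*i)++];
    if ((c & 0xfc00) == 0xd800) {              /* lead surrogate */
        uint16_t c2 = 0;
        if (*i != length && ((c2 = s[*i]) & 0xfc00) == 0xdc00) {
            ++(*i);
            c = (c << 10) + c2 - SURR_OFFSET;
        }
    }
    return c;
}

/* Hypothetical helper: counts code points with a U16_NEXT-style loop. */
static int32_t count_code_points(const uint16_t *s, int32_t length) {
    int32_t i = 0, n = 0;
    while (i < length) {
        (void)next_cp(s, &i, length);
        ++n;
    }
    return n;
}
```
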

◆ U16_NEXT_OR_FFFD

#define U16_NEXT_OR_FFFD(s, i, length, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[(i)++]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_LEAD(c) && (i)!=(length) && U16_IS_TRAIL(__c2=(s)[(i)])) { \
            ++(i); \
            (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
        } else { \
            (c)=0xfffd; \
        } \
    } \
} UPRV_BLOCK_MACRO_END

Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

(Post-incrementing forward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The length can be negative for a NUL-terminated string.

The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate or to a single, unpaired lead surrogate, then c is set to U+FFFD.

Parameters
s: const UChar * string
i: string offset, must be i<length
length: string length
c: output UChar32 variable
See also
U16_NEXT_UNSAFE
Stable:
ICU 60

Definition at line 340 of file utf16.h.
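One application of U16_NEXT_OR_FFFD is producing well-formed UTF-16 from possibly ill-formed input. The sketch below mirrors the macro value above as next_cp_or_fffd and adds a hypothetical scrub_unpaired helper that copies well-formed code points unchanged and replaces each unpaired surrogate with U+FFFD. Both names are illustrative; because one unpaired unit becomes one U+FFFD unit, the output has the same length as the input.

```c
#include <assert.h>
#include <stdint.h>

#define SURR_OFFSET ((0xd800 << 10) + 0xdc00 - 0x10000)

/* Sketch of U16_NEXT_OR_FFFD's logic: like U16_NEXT, but any unpaired
 * surrogate (lead or trail) yields U+FFFD. */
static int32_t next_cp_or_fffd(const uint16_t *s, int32_t *i, int32_t length) {
    int32_t c = s[(*i)++];
    if ((c & 0xf800) == 0xd800) {                  /* any surrogate */
        uint16_t c2 = 0;
        if ((c & 0x400) == 0 && *i != length &&
            ((c2 = s[*i]) & 0xfc00) == 0xdc00) {
            ++(*i);
            c = (c << 10) + c2 - SURR_OFFSET;
        } else {
            c = 0xfffd;
        }
    }
    return c;
}

/* Hypothetical helper: copies src[0..length-1] to dest, replacing each
 * unpaired surrogate with U+FFFD. dest must hold length units. */
static void scrub_unpaired(const uint16_t *src, uint16_t *dest, int32_t length) {
    int32_t i = 0, j = 0;
    while (i < length) {
        int32_t start = i;
        int32_t c = next_cp_or_fffd(src, &i, length);
        if (c == 0xfffd) {
            dest[j++] = 0xfffd;                    /* one unit in, one unit out */
        } else {
            while (start < i) { dest[j++] = src[start++]; }  /* copy 1 or 2 units */
        }
    }
    assert(j == length);
}
```
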

◆ U16_NEXT_UNSAFE

#define U16_NEXT_UNSAFE(s, i, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[(i)++]; \
    if(U16_IS_LEAD(c)) { \
        (c)=U16_GET_SUPPLEMENTARY((c), (s)[(i)++]); \
    } \
} UPRV_BLOCK_MACRO_END

Get a code point from a string at a code point boundary offset, and advance the offset to the next code point boundary.

(Post-incrementing forward iteration.) "Unsafe" macro, assumes well-formed UTF-16.

The offset may point to the lead surrogate unit for a supplementary code point, in which case the macro will read the following trail surrogate as well. If the offset points to a trail surrogate, then that itself will be returned as the code point. The result is undefined if the offset points to a single, unpaired lead surrogate.

Parameters
s: const UChar * string
i: string offset
c: output UChar32 variable
See also
U16_NEXT
Stable:
ICU 2.4

Definition at line 280 of file utf16.h.

◆ U16_PREV

#define U16_PREV(s, start, i, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        uint16_t __c2; \
        if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
            --(i); \
            (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
        } \
    } \
} UPRV_BLOCK_MACRO_END

Move the string offset from one code point boundary to the previous one and get the code point between them.

(Pre-decrementing backward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate or behind a single, unpaired trail surrogate, then c is set to that unpaired surrogate.

Parameters
s: const UChar * string
start: starting string offset (usually 0)
i: string offset, must be start<i
c: output UChar32 variable
See also
U16_PREV_UNSAFE
Stable:
ICU 2.4

Definition at line 569 of file utf16.h.
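Backward iteration typically starts with the offset at the string length, as the description allows, and loops while the offset is above start. The sketch below mirrors the U16_PREV value above as prev_cp and uses it in a hypothetical reverse_code_points helper that lists code points from last to first; surrogate pairs stay intact, while unpaired surrogates are returned individually.

```c
#include <assert.h>
#include <stdint.h>

#define SURR_OFFSET ((0xd800 << 10) + 0xdc00 - 0x10000)

/* Sketch of U16_PREV's logic: steps *i back over one code point and
 * returns it. Unpaired surrogates are returned as-is.
 * Precondition: start < *i. */
static int32_t prev_cp(const uint16_t *s, int32_t start, int32_t *i) {
    assert(start < *i);
    int32_t c = s[--(*i)];
    if ((c & 0xfc00) == 0xdc00) {                  /* trail surrogate */
        uint16_t c2 = 0;
        if (*i > start && ((c2 = s[*i - 1]) & 0xfc00) == 0xd800) {
            --(*i);
            c = ((int32_t)c2 << 10) + c - SURR_OFFSET;
        }
    }
    return c;
}

/* Hypothetical helper: writes the code points of s[0..length-1] into
 * out[] in reverse order and returns how many there were. */
static int32_t reverse_code_points(const uint16_t *s, int32_t length, int32_t *out) {
    int32_t i = length, n = 0;
    while (i > 0) {
        out[n++] = prev_cp(s, 0, &i);
    }
    return n;
}
```
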

◆ U16_PREV_OR_FFFD

#define U16_PREV_OR_FFFD(s, start, i, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[--(i)]; \
    if(U16_IS_SURROGATE(c)) { \
        uint16_t __c2; \
        if(U16_IS_SURROGATE_TRAIL(c) && (i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
            --(i); \
            (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
        } else { \
            (c)=0xfffd; \
        } \
    } \
} UPRV_BLOCK_MACRO_END

Move the string offset from one code point boundary to the previous one and get the code point between them.

(Pre-decrementing backward iteration.) "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate or behind a single, unpaired trail surrogate, then c is set to U+FFFD.

Parameters
s: const UChar * string
start: starting string offset (usually 0)
i: string offset, must be start<i
c: output UChar32 variable
See also
U16_PREV_UNSAFE
Stable:
ICU 60

Definition at line 600 of file utf16.h.

◆ U16_PREV_UNSAFE

#define U16_PREV_UNSAFE(s, i, c)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    (c)=(s)[--(i)]; \
    if(U16_IS_TRAIL(c)) { \
        (c)=U16_GET_SUPPLEMENTARY((s)[--(i)], (c)); \
    } \
} UPRV_BLOCK_MACRO_END

Move the string offset from one code point boundary to the previous one and get the code point between them.

(Pre-decrementing backward iteration.) "Unsafe" macro, assumes well-formed UTF-16.

The input offset may be the same as the string length. If the offset is behind a trail surrogate unit for a supplementary code point, then the macro will read the preceding lead surrogate as well. If the offset is behind a lead surrogate, then that itself will be returned as the code point. The result is undefined if the offset is behind a single, unpaired trail surrogate.

Parameters
s: const UChar * string
i: string offset
c: output UChar32 variable
See also
U16_PREV
Stable:
ICU 2.4

Definition at line 542 of file utf16.h.

◆ U16_SET_CP_LIMIT

#define U16_SET_CP_LIMIT(s, start, i, length)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if((start)<(i) && ((i)<(length) || (length)<0) && U16_IS_LEAD((s)[(i)-1]) && U16_IS_TRAIL((s)[i])) { \
        ++(i); \
    } \
} UPRV_BLOCK_MACRO_END

Adjust a random-access offset to a code point boundary after a code point.

If the offset is behind the lead surrogate of a surrogate pair, then the offset is incremented. Otherwise, it is not modified. The input offset may be the same as the string length. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

The length can be negative for a NUL-terminated string.

Parameters
s: const UChar * string
start: int32_t starting string offset (usually 0)
i: int32_t string offset, start<=i<=length
length: int32_t string length
See also
U16_SET_CP_LIMIT_UNSAFE
Stable:
ICU 2.4

Definition at line 727 of file utf16.h.
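A minimal C sketch of the safe limit adjustment, assuming `uint16_t` code units in place of `UChar`. The function name `set_cp_limit` is hypothetical, and it returns the adjusted offset instead of modifying `i` in place as the macro does. The condition is the same as in the macro value above, including the `length < 0` case for NUL-terminated strings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, ICU-free restatement of the safe limit adjustment.
 * Returns i+1 if s[i-1], s[i] form a lead/trail pair, else i.
 * length < 0 means s is NUL-terminated. */
static int32_t set_cp_limit(const uint16_t *s, int32_t start, int32_t i, int32_t length) {
    assert(start <= i && (length < 0 || i <= length));
    if (start < i && (i < length || length < 0) &&
        (s[i - 1] & 0xfc00) == 0xd800 &&   /* lead before the offset */
        (s[i] & 0xfc00) == 0xdc00) {       /* trail at the offset */
        ++i;
    }
    return i;
}
```

The `(i)<(length)` test is evaluated before `s[i]` is read, so an offset equal to the length never touches memory past the string.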

◆ U16_SET_CP_LIMIT_UNSAFE

#define U16_SET_CP_LIMIT_UNSAFE(s, i)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if(U16_IS_LEAD((s)[(i)-1])) { \
        ++(i); \
    } \
} UPRV_BLOCK_MACRO_END

Adjust a random-access offset to a code point boundary after a code point.

If the offset is behind the lead surrogate of a surrogate pair, then the offset is incremented. Otherwise, it is not modified. The input offset may be the same as the string length. "Unsafe" macro, assumes well-formed UTF-16.

Parameters
s: const UChar * string
i: string offset
See also
U16_SET_CP_LIMIT
Stable:
ICU 2.4

Definition at line 704 of file utf16.h.
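The unsafe counterpart reduces to a single check. In the hypothetical `set_cp_limit_unsafe` below (not part of ICU), the caller is assumed to guarantee `i > 0`, because `s[i-1]` is read unconditionally, and that the string is well-formed, so a lead surrogate at `s[i-1]` is always followed by its trail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, ICU-free restatement of the unsafe limit adjustment.
 * Assumes well-formed UTF-16 and i > 0 (s[i-1] is always read). */
static int32_t set_cp_limit_unsafe(const uint16_t *s, int32_t i) {
    assert(i > 0);
    if ((s[i - 1] & 0xfc00) == 0xd800) {   /* lead: its trail must be at s[i] */
        ++i;
    }
    return i;
}
```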

◆ U16_SET_CP_START

#define U16_SET_CP_START(s, start, i)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if(U16_IS_TRAIL((s)[i]) && (i)>(start) && U16_IS_LEAD((s)[(i)-1])) { \
        --(i); \
    } \
} UPRV_BLOCK_MACRO_END

Adjust a random-access offset to a code point boundary at the start of a code point.

If the offset points to the trail surrogate of a surrogate pair, then the offset is decremented. Otherwise, it is not modified. "Safe" macro, handles unpaired surrogates and checks for string boundaries.

Parameters
s: const UChar * string
start: starting string offset (usually 0)
i: string offset, must be start<=i
See also
U16_SET_CP_START_UNSAFE
Stable:
ICU 2.4

Definition at line 514 of file utf16.h.
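One common use of the safe start adjustment is truncating a buffer without splitting a surrogate pair. The sketch assumes `uint16_t` code units; `set_cp_start` mirrors the macro condition above but returns the new offset, and `truncate_length` is a hypothetical helper, not an ICU function. Because the condition reads `s[i]`, `i` must index a valid code unit (i < length), which `truncate_length` ensures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, ICU-free restatement of the safe start adjustment:
 * if s[i] is a trail preceded (after start) by a lead, step back onto it. */
static int32_t set_cp_start(const uint16_t *s, int32_t start, int32_t i) {
    assert(start <= i);
    if ((s[i] & 0xfc00) == 0xdc00 && i > start &&
        (s[i - 1] & 0xfc00) == 0xd800) {
        --i;
    }
    return i;
}

/* Hypothetical helper: the largest prefix length <= max (max >= 0)
 * that does not end between a lead and its trail. */
static int32_t truncate_length(const uint16_t *s, int32_t length, int32_t max) {
    if (max >= length) {
        return length;
    }
    return set_cp_start(s, 0, max);   /* max < length, so s[max] is valid */
}
```

For `{0x0041, 0xD83D, 0xDE00, 0x0042}`, truncating to 2 units yields 1, because keeping 2 would leave the lead surrogate without its trail.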

◆ U16_SET_CP_START_UNSAFE

#define U16_SET_CP_START_UNSAFE(s, i)
Value:
UPRV_BLOCK_MACRO_BEGIN { \
    if(U16_IS_TRAIL((s)[i])) { \
        --(i); \
    } \
} UPRV_BLOCK_MACRO_END

Adjust a random-access offset to a code point boundary at the start of a code point.

If the offset points to the trail surrogate of a surrogate pair, then the offset is decremented. Otherwise, it is not modified. "Unsafe" macro, assumes well-formed UTF-16.

Parameters
s: const UChar * string
i: string offset
See also
U16_SET_CP_START
Stable:
ICU 2.4

Definition at line 494 of file utf16.h.
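For well-formed text, the unsafe version only needs to look at `s[i]`. The hypothetical `set_cp_start_unsafe` below mirrors the macro value above using `uint16_t` units. On ill-formed input, such as a trail surrogate at offset 0, it would return -1, which is why the safe macro also checks `(i)>(start)` and that `s[i-1]` is a lead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, ICU-free restatement of the unsafe start adjustment.
 * Assumes well-formed UTF-16 and a valid index (0 <= i < length). */
static int32_t set_cp_start_unsafe(const uint16_t *s, int32_t i) {
    assert(i >= 0);
    if ((s[i] & 0xfc00) == 0xdc00) {   /* trail: step back onto its lead */
        --i;
    }
    return i;
}
```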

◆ U16_SURROGATE_OFFSET

#define U16_SURROGATE_OFFSET   ((0xd800<<10UL)+0xdc00-0x10000)

Helper constant for U16_GET_SUPPLEMENTARY.

Internal:
Do not use. This API is for internal use only.

Definition at line 98 of file utf16.h.
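The constant folds the three corrections of the usual decoding formula into one: (0xd800<<10) = 0x3600000, plus 0xdc00 gives 0x360dc00, minus 0x10000 gives 0x35fdc00. The sketch below copies the expression under the hypothetical name `SURROGATE_OFFSET` (ICU marks the real one internal) and compares the folded form with the textbook form `((lead-0xd800)<<10) + (trail-0xdc00) + 0x10000`; both function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the U16_SURROGATE_OFFSET expression (hypothetical name,
 * so no ICU header is needed). */
#define SURROGATE_OFFSET ((0xd800 << 10UL) + 0xdc00 - 0x10000)

static_assert(SURROGATE_OFFSET == 0x35fdc00, "folded constant");

/* Same shape as U16_GET_SUPPLEMENTARY: one shift, one add, one subtract. */
static int32_t get_supplementary(uint16_t lead, uint16_t trail) {
    return ((int32_t)lead << 10) + (int32_t)trail - SURROGATE_OFFSET;
}

/* Textbook decoding, for comparison. */
static int32_t get_supplementary_textbook(uint16_t lead, uint16_t trail) {
    return (((int32_t)lead - 0xd800) << 10) + ((int32_t)trail - 0xdc00) + 0x10000;
}
```

Both functions map the pair 0xd83d, 0xde00 to U+1F600; the folded form simply precomputes everything that does not depend on `lead` or `trail`.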

◆ U16_TRAIL

#define U16_TRAIL(supplementary)   (UChar)(((supplementary)&0x3ff)|0xdc00)

Get the trail surrogate (0xdc00..0xdfff) for a supplementary code point (0x10000..0x10ffff).

Parameters
supplementary: 32-bit code point (U+10000..U+10ffff)
Returns
trail surrogate (U+dc00..U+dfff) for supplementary
Stable:
ICU 2.4

Definition at line 131 of file utf16.h.
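The trail surrogate keeps the low 10 bits of the code point. The matching U16_LEAD expression adds 0xd7c0, which is 0xd800 minus (0x10000>>10) = 0x40, so the 0x10000 bias is removed in the same step. The sketch below copies both expressions into hypothetical functions `lead_of` and `trail_of` so it compiles without ICU, and asserts the documented input range.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the U16_LEAD expression; valid for U+10000..U+10ffff. */
static uint16_t lead_of(int32_t supplementary) {
    assert(supplementary >= 0x10000 && supplementary <= 0x10ffff);
    return (uint16_t)((supplementary >> 10) + 0xd7c0);
}

/* Local copy of the U16_TRAIL expression; valid for U+10000..U+10ffff. */
static uint16_t trail_of(int32_t supplementary) {
    assert(supplementary >= 0x10000 && supplementary <= 0x10ffff);
    return (uint16_t)((supplementary & 0x3ff) | 0xdc00);
}
```

For U+1F600, `lead_of` gives 0xd83d and `trail_of` gives 0xde00.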