
Full Stack Language Enablement

This has been a working document for a while. I am publishing it here so that it can serve for more public discussion. Thanks to my co-authors, Anshuman Pandey and Isabelle Zaugg, and to others who have discussed these items over the years, such as Martin Raymond.

a “full stack” of resources

Introduction

There are a lot of steps to be taken in order to ensure that a language is fully supported. The objective of this document is to collect the steps needed and begin to plan how to accomplish them for particular languages. The intent is for this to serve as a guide to language community members and other interested parties in how to improve the support for a particular language.

The diagram below shows languages on one axis, and the “stack” of support tasks on the other.

Language vs the “Support Stack”

Coordination is key. Finding and communicating with the right people is often at least as difficult as the technical aspects. ScriptSource can be a good “central hub” to collect/publish information and needs for a user community.

Encoding

A critical step is of course Unicode encoding, but that is only the first step. Also, there can be (through no fault of anyone’s) a long gap between the first contact with a user community and the publication of a Unicode version supporting that language, not to mention other steps. The Script Encoding Initiative at UC Berkeley works closely with language communities to encode their scripts in Unicode.

In the course of the encoding process, a lot of information is gathered which is relevant to other steps such as grammatical considerations and best practices around font and layout support.

  • Standardizing of the script ideally/typically happens before Unicode inclusion, but sometimes this can hold up Unicode inclusion, or be an ongoing challenge if it is incomplete after Unicode inclusion. Standardization of the script, as well as the orthography, are very helpful for digital vitality in general, as a standardized orthography helps “search” to work well, for example.

Font

Martin Raymond recommends: “…split drawing [of the] glyphs from making the fonts work. Anyone can draw the glyphs, realize that font designers will work later to connect attachment points etc. later.”

In other words, the critical initial step is to get the correct glyphs from the user community.

Note that there is a need for fonts for different purposes: aesthetic, low resolution, small devices.

Layout

Determine if layout requirements are “complex” or not. (See the “shaping required” field of CLDR Script Metadata).
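To make that concrete, here is a toy lookup sketched in Python. The values are hand-copied samples in the spirit of CLDR’s script metadata, not the authoritative data; check the real “shaping required” field before relying on them.

```python
# Toy illustration: whether a script needs complex shaping, modeled on
# CLDR Script Metadata's "shaping required" field.
# NOTE: the values below are hand-copied samples, not the full data set.
SHAPING_REQUIRED = {
    "Latn": False,  # Latin
    "Arab": True,   # Arabic: joining behavior requires a shaping engine
    "Deva": True,   # Devanagari: conjuncts and reordering
    "Hani": False,  # Han
}

def is_complex(script_code):
    """Return True if the script is known to require complex shaping."""
    return SHAPING_REQUIRED.get(script_code, False)

print(is_complex("Arab"))
```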

Support through W3C’s Layout & Typography project: https://www.w3.org/International/layout

  • From website: “The W3C needs to make sure that the text layout and typographic needs of scripts and languages around the world are built in to technologies such as HTML, CSS, SVG, etc. so that Web pages and eBooks can look and behave as people expect around the world.”

OS-level support

  • Desktop support
  • Mobile support (possibly even more important than desktop for global minority scripts)

Input

  • Keyboard
    • Virtual keyboards for mobile devices
    • Managing repertoire (Unihan, etc)
    • Transliteration standard into Latin script (This is helpful for input when a keyboard supporting the target script is unavailable.)
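As a toy illustration of how such a transliteration scheme works, here is a sketch in Python. The mapping is invented for this example, not any published standard.

```python
# Toy transliteration into Latin script: map a few Cyrillic letters.
# This mapping is hypothetical, invented for illustration only.
TRANSLIT = {
    "ш": "sh", "ч": "ch", "а": "a", "т": "t", "к": "k",
}

def to_latin(text):
    # Fall back to the original character when no mapping exists.
    return "".join(TRANSLIT.get(ch, ch) for ch in text)

print(to_latin("шашка"))
```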

Locale Data

CLDR seed/initial data (see CLDR Minimal Data)

  • Needed: an app to collect initial data (a true “Survey Tool”)
  • Within CLDR: Promote from “seed” to “common” as data matures
  • Verify deployment (inclusion in JSON data, ICU, Globalize, etc.)
    • Code changes may be needed, such as calendar and new date/time support, line breaking, etc.

Software Translation

Advanced NLP (Natural Language Processing)

The development of many NLP applications requires large digital corpora, the collection of which is a project in itself. Even when corpora are collected, say through web crawling, when they are not available publicly, other developers cannot benefit from them as a resource. Therefore, a freely available repository of digital resources in a target language, to which contributors can add, is an ideal first step for the following efforts.

  • OCR
  • Spell checking
  • Auto-correction, Auto-suggestion, Auto-fill
  • Parsing & Stemming (helps search to happen with related terms)
  • Language glossaries/dictionaries/thesauri
  • Search capability within Word documents & PDFs
  • Translation: Ideally not just dominant language to minority language, but also minority to minority language (for maximum use within countries that enjoy a high level of language diversity)
  • Natural language queries and conversation

App-Level Support

This means going beyond:

  • Multilingual readiness (Unicode support: “Don’t garble my text”)
  • Leveraging locale data and implementations (ICU, etc.)
  • Translation (above)

…to truly supporting language specific features. Some examples:

  • Arabic and East Asian advanced typography
  • NLP support as above

ICANN / IDN support

Support for a script within top-level domains allows an important level of localization online that breaks from the historically Latin-only top level domains and reflects the truly international nature of the Internet. ICANN has made significant progress in this area, and is currently in the process of working with language communities to define rules for using many new scripts in TLDs (top level domains).

Computer programming language in mother tongue

While this may seem a far-fetched dream today, the fact that programming languages are in English is a barrier to the full use of digital tools by much of the world’s population. This might be the final frontier for the internationalization/localization of digital technologies. “قلب” is an example of a programming language entirely in Arabic.


Announcing 🌲 pino-couch


Today I’m pleased to release pino-couch. You can find it on GitHub at https://github.com/IBM/pino-couch.

This little module is a transport which lets you capture your pino logs into any CouchDB database.

Why pino?

  • Speed: I haven’t independently tested the benchmarks, but I really like logging that doesn’t slow down the application. I want to be able to sprinkle logging generously in the application without slowing it down.

  • Simplicity: Take a look at the example below. We go from logging to the console, to logging in a database. The configuration and execution of log processing is entirely outside of the application.

  • Sticker: Because it has a logo that looks nice on a hex sticker? OK, not really. But @matteocollina presented this logger so effectively at NodeSummit, I asked for a sticker. Today, I’m glad to give something back to the community.

Taking it for a spin

First steps with pino

Let’s do a quick demo here, with a simple app that emits some logs:

$ cd somewhere
$ npm init -q -y
$ npm install --save pino

And for index.js:

const pino = require('pino')();
pino.error('Something bad happened!');
pino.warn({ iToldYou: [ 'once', 'twice', 'thrice' ]});
pino.info({ msg: "Hey, check out these versions", versions: require('process').versions });
pino.trace('ALL THE DETAILS');

With the nice pino API you have lots of options for emitting logs.

$ node index.js
{"pid":54534,"hostname":"filfla.local","level":50,"time":1496436803976,"msg":"Something bad happened!","v":1}
{"pid":54534,"hostname":"filfla.local","level":40,"time":1496436803979,"iToldYou":["once","twice","thrice"],"v":1}
{"pid":54534,"hostname":"filfla.local","level":30,"time":1496436803979,"msg":"Hey, check out these versions","versions":{"http_parser":"2.7.0","node":"8.0.0","v8":"5.8.283.41","uv":"1.11.0","zlib":"1.2.11","ares":"1.10.1-DEV","modules":"57","openssl":"1.0.2k","icu":"59.1","unicode":"9.0","cldr":"31.0.1","tz":"2017b"},"v":1}

Notice the trace() details were below the current level, so they were omitted. This output is detailed, but not super readable. If you are running something from the command line, the pino global utility tidies up the output nicely—in color, even, if your console supports it.

$ npm install -g pino
$ node index.js | pino
[2017-06-02T20:56:12.125Z] ERROR (56035 on filfla.local): Something bad happened!
[2017-06-02T20:56:12.128Z] WARN (56035 on filfla.local):
iToldYou: [
"once",
"twice",
"thrice"
]
[2017-06-02T20:56:12.128Z] INFO (56035 on filfla.local): Hey, check out these versions
versions: {
"http_parser": "2.7.0",
"node": "8.0.0",
"v8": "5.8.283.41",
"uv": "1.11.0",
"zlib": "1.2.11",
"ares": "1.10.1-DEV",
"modules": "57",
"openssl": "1.0.2k",
"icu": "59.1",
"unicode": "9.0",
"cldr": "31.0.1",
"tz": "2017b"
}

Persistence without Perspiration: Relax!

Here’s where pino-couch comes in. I’m going to set up a Cloudant (https://cloudant.com) database to store these logs (as I do in production), but you can also use a local or any other CouchDB instance (as I do when developing locally).

  • First, create a database
createdb.png
  • Next, give appropriate permissions.

pino-couch only needs to write to the database; it doesn’t need to read. Click the Permissions tab, then Generate API Key, and choose only the _writer column for the new API key.

permissdb.png

That’s actually it for configuration.

  • Start up our app, but using pino-couch. Use the APIKEY and PASSWORD that were generated above. And of course, your own ACCOUNT.
$ npm install -g pino-couch
$ node index.js | pino-couch -U https://APIKEY:PASSWORD@ACCOUNT.cloudant.com -d slogging | pino
[2017-06-02T21:16:22.511Z] ERROR (68283 on filfla.local): Something bad happened!
( etc… )
  • The output is about the same. We chained on a | pino at the end to keep the output human-readable; that's optional.

Let’s take a look at the Cloudant dashboard again:

readdb.png

There’s our data!

So now what?

Here are a couple of things you might do with your new logging pipeline:

Write a clever design document to mine your app logs for important stuff.

Even something as simple as the following will get you timestamp-ordered documents.

function (doc) {
  emit([new Date(doc.time).toISOString(), doc._id], doc.msg);
}

Note that besides the time field (epoch time in milliseconds), hostname contains the originating hostname. This is really useful for distinguishing logs from among a cluster of servers.
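The reason ISO-8601 strings make good view keys: CouchDB sorts keys lexicographically, and ISO-8601 timestamps sort in the same order as their underlying epoch values. A quick sanity check in Python (illustrative only, not part of pino-couch):

```python
from datetime import datetime, timezone

# Epoch-millisecond timestamps, deliberately out of order.
times = [1496436803979, 1496436803976, 1400000000000]

# Convert to ISO-8601 strings, as the view's map function does.
keys = [datetime.fromtimestamp(t / 1000, tz=timezone.utc).isoformat() for t in times]

# Lexicographic sort of the strings matches numeric sort of the epochs.
assert sorted(keys) == [k for _, k in sorted(zip(times, keys))]
print(sorted(keys)[0])
```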

Ingest the data into Elasticsearch/Kibana

We’ve done this with great success. We were already pulling from another Cloudant DB, so it was easy to add the application logs.

And of course:

Relax

🌲: +1!

Translating Kibana with the Globalization Pipeline

Introduction

kibana0.png

This post (and video) will explain how to translate Kibana using the Globalization Pipeline service on Bluemix. Note that some of the steps shown here depend on kibana#8766, which was not merged as this article went to press. (Portions are based on the development-internationalization.asciidoc document from that PR.)

Prerequisites

Setting up Globalization Pipeline

  • Follow the GP Quick Start Guide to create a service instance. Copy down the "credentials" into a new file, gp-credentials.json which should look something like the following:
{
  "url": "https://gp-rest.bluemix.example.com/translate/rest",
  "userId": "c2lnbiB1cCBAIGJsdWVtaXgubmV0ISEK",
  "password": "aHVudGVyNDIK",
  "instanceId": "aHR0cHM6Ly9ibHVlbWl4Lm5ldCA8PDwK"
}
  • Create the bundle on the GP instance. The example below uses English (en) as the source language and requests Spanish, Japanese, and French targets (es,ja,fr).
$ java -jar {wherever}/gp-cli.jar create-bundle -j {wherever}/gp-credentials.json -b 'kibana_core' -l en,es,ja,fr
A new bundle 'kibana_core' was successfully created.
  • The bundle will show up in the Bluemix dashboard under the service’s console, but it will be empty.

  • We are going to translate the src/core_plugins/kibana/translations/en.json file in Kibana. Upload that file to the Globalization Pipeline service using the command line:

$ cd ~/src/kibana
$ java -jar {wherever}/gp-cli.jar import -j {wherever}/gp-credentials.json -b 'kibana_core' -l en -f src/core_plugins/kibana/translations/en.json -t json
Resource data extracted from src/core_plugins/kibana/translations/en.json was successfully imported to bundle:kibana_core, language:en
  • If you head back over to the Bluemix dashboard, you can now see the populated bundle with translated content:
gp-dash.png

What you see was done with machine translation, hence the red “U” (Unreviewed). The content here can be corrected manually by clicking the Pencil icon, or marked as manually reviewed by clicking the Checkmark. It’s also possible to download the translated content for offline review or use, or to upload a corrected version of one of the translations.

Head back over to the command line, though, because it is time to create our plugin.

Creating the plugin

Something like this:

$ npm install -g yo generator-kibana-plugin
Everything looks all right!
$ yo kibana-plugin
? Your Plugin Name gp srl kibana plugin
? Short Description An awesome Kibana translation plugin
? Target Kibana Version master
I'm all done. Running npm install for you to install the required dependencies. If this fails, try running the command yourself.
  • You will notice that the generator has created a translations/es.json file. We will replace this with our translated content.
$ rm translations/es.json
  • Now, download the translated content into the correct files:
$ java -jar {wherever}/gp-cli.jar export -j {wherever}/gp-credentials.json -b 'kibana_core' -t json -l es -f translations/es.json
Resource data exported from bundle:kibana_core, language: es was successfully saved to file translations/es.json
$ java -jar {wherever}/gp-cli.jar export -j {wherever}/gp-credentials.json -b 'kibana_core' -t json -l fr -f translations/fr.json
Resource data exported from bundle:kibana_core, language: fr was successfully saved to file translations/fr.json
$ java -jar {wherever}/gp-cli.jar export -j {wherever}/gp-credentials.json -b 'kibana_core' -t json -l ja -f translations/ja.json
Resource data exported from bundle:kibana_core, language: ja was successfully saved to file translations/ja.json
  • Update the index.js file in the plugin to mention the updated translations files.

You will see a section like this:

translations: [
  resolve(__dirname, './translations/es.json')
],

Change it to mention all of the language files we have just downloaded:

translations: [
  resolve(__dirname, './translations/es.json'),
  resolve(__dirname, './translations/ja.json'),
  resolve(__dirname, './translations/fr.json')
],
  • That's all the coding we'll need for today…

  • Copy your entire translations plugin directory to the Kibana plugins (<kibana_root>/plugins/) directory

Trying it out

Fire up Kibana and you should see the translated content!

kibana1.png

More steps

  • By the way, French isn’t included in the video or images because I ran into kibana#10580 during the production of this video. When it is fixed I will come back and edit the video, but until then, beware single quotes (') in your translated strings.

  • Note that if you repeat the import and export steps of the gp-cli tool, the Globalization Pipeline will automatically manage translation changes if, for example, translated keys are added or removed, or translated content changes.

  • Follow the progress of Kibana Globalization on Github: (kibana#6515).

  • Read more about Globalization Pipeline

  • Connect with the Globalization Pipeline Open Source Community
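To make the earlier note about re-importing concrete: managing translation changes starts with diffing the old and new source bundles. A minimal sketch in Python (an illustration only, not the Globalization Pipeline’s actual algorithm):

```python
# Sketch of what "managing translation changes" involves at minimum:
# diff the key sets and values of an old and a new source bundle.
# This is an illustration, not the Globalization Pipeline's algorithm.
def diff_bundles(old, new):
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, changed

old = {"hello": "Hello", "bye": "Goodbye"}
new = {"hello": "Hello!", "welcome": "Welcome"}
print(diff_bundles(old, new))
```

Added keys would be sent for fresh translation, removed keys dropped, and changed keys re-translated.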

Acknowledgements

  • Thanks to fellow IBMers Martin Hickey, Shikha Srivastava, and Jonathan Lo for the Kibana G11n work (kibana#6515), also the elastic/kibana team for being a great OSS community, and last but not least the entire Globalization Pipeline team.

Literate Programmers

Besides the Globalization Pipeline mug, one of my favorite coffee mugs says:

03: MAKE IT POSSIBLE FOR PROGRAMMERS TO WRITE IN ENGLISH AND YOU WILL FIND OUT THAT PROGRAMMERS CANNOT WRITE IN ENGLISH.

On the serious side, we need to emphasize communication skills in the technology industry. Even if I have a great idea, if I can’t communicate it, it will go nowhere. And neither will I.

Just to be clear, by “communication” I mean “talking with other humans”. Which brings me to today’s topic on the lighter side, and that is the overloading of English. Words such as function, overload, network, build all have specific meanings that weren’t originally found in Webster’s. The 1828 definition of computer, for example, is:

One who computes; a reckoner; a calculator.

In i18n, there are other words that have very specific meanings: global, globalization, collation, contraction, and of course locale, just to name a few.

To that end, I have started to add some tongue-in-cheek “redefinitions” to the bottom of the blog just to remind us all that these words have non-software meanings.

If you want to see them all without hitting reload an infinite number of times, you can see the original source here.

Speaking of i18n, this overloading doesn’t apply only to English. Most of my devices are set to es-US as their locale, so I see a lot of translated error messages. gcc, for example, has a thriving translation project where dedicated persons cause “English” to be translated into, for example, “Spanish” such as:

#~ msgid "function ‘%D’ declared overloaded, but no definitions appear with which to resolve it?!?"

#~ msgstr "¿!¿se declaró la función ‘%D’ sobrecargada, pero no aparece ninguna definición con la cual resolverlo?!?"

Not sure why that’s ¿!¿ where I might expect ¿¡¿ — perhaps the initial ! just shows the compiler’s incredulity. In any event, sobrecargada seems to be a great cognate for overloaded. And with that, I will let you goto whatever you were doing before you started reading.

PR’s are welcome on my little list, or leave comments below. What are your favorite examples of overloaded terms, in any language?

Fallbacks in ICU4C Converters

Unicode’s ICU version 59 is well underway at this point. While ideally everything would use Unicode, there still remain many systems — and much content — in non-Unicode encodings. For this reason ICU, in both its C/C++ and Java flavors, has rich support for codepage conversion.

One of many great features in ICU is the callback support. A lot can go wrong during codepage conversion, but in ICU, you can control what happens during exceptional situations.

Let’s try a simple sample. By the way, see the end of this post for hints on compiling the samples.

Substitute, Always

Our task is to convert black-bird (but with a U+00AD, “Soft Hyphen” in between the two words) to ASCII.

substituteTest-0.cpp

#include <unicode/utypes.h>
#include <unicode/ustdio.h>
#include <unicode/ucnv.h>

int main(int /*argc*/, const char * /*argv*/ []) {
  UErrorCode status = U_ZERO_ERROR;
  LocalUConverterPointer cnv(ucnv_open("us-ascii", &status));
  if (U_FAILURE(status)) {
    u_printf("Error opening: %s\n", u_errorName(status));
    return 1;
  }
  UnicodeString str("black-bird");
  str.setCharAt(5, 0x00AD); // soft hyphen
  const UChar *uch = str.getTerminatedBuffer();
  u_printf("Input String: %S length %d\n", uch, str.length());
  char bytes[1024];
  int32_t bytesWritten =
    ucnv_fromUChars(cnv.getAlias(), bytes, 1024, uch, -1, &status);
  if (U_FAILURE(status)) {
    u_printf("Error converting: %s\n", u_errorName(status));
    return 1;
  }
  u_printf("Converted %d bytes\n", bytesWritten);
  for (int32_t i = 0; i < bytesWritten; i++) {
    u_printf("\\x%02X ", bytes[i] & 0xFF);
  }
  u_printf("\n");
  // try to print it out on the console
  bytes[bytesWritten] = 0; // terminate it first
  puts(bytes);
  return 0; // LocalUConverterPointer will cleanup cnv
}

Output:

Input String: black­bird length 10
Converted 9 bytes
\x62 \x6C \x61 \x63 \x6B \x62 \x69 \x72 \x64
blackbird

Hm. Ten characters in, nine out. What happened? Well, U+00AD is not part of ASCII: ASCII is a seven-bit encoding, and thus only maps code points \x00 through \x7F, inclusive. Furthermore, U+00AD is Default Ignorable, and as of ICU 54.1 (2014), per #10551, the soft hyphen can simply be dropped.

But what if, for some reason, you don’t want the soft hyphen dropped? The pre-ICU 54.1 behavior can be brought back easily with a custom callback. So, roll up your collective sleeves, and:

alwaysSubstitute.h

// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
#include <unicode/ucnv.h>
#include <unicode/ucnv_err.h>
#include <unicode/ucnv_cb.h>

/**
 * This is a modified version of ICU’s UCNV_FROM_U_CALLBACK_SUBSTITUTE;
 * it unconditionally substitutes on irregular codepoints.
 *
 * Usage:
 *   ucnv_setFromUCallBack(c, UCNV_FROM_U_CALLBACK_SUBSTITUTE_ALWAYS, NULL, NULL, NULL, &status);
 */
U_CAPI void U_EXPORT2
UCNV_FROM_U_CALLBACK_SUBSTITUTE_ALWAYS (
    const void *context,
    UConverterFromUnicodeArgs *fromArgs,
    const UChar* codeUnits,
    int32_t length,
    UChar32 codePoint,
    UConverterCallbackReason reason,
    UErrorCode * err)
{
  (void)codeUnits;
  (void)length;
  if (reason <= UCNV_IRREGULAR) {
    *err = U_ZERO_ERROR;
    ucnv_cbFromUWriteSub(fromArgs, 0, err);
    /* else the caller must have set the error code accordingly. */
  }
  /* else ignore the reset, close and clone calls. */
}

If we #include this little header, and set it on the converter before we convert…

LocalUConverterPointer cnv(ucnv_open("us-ascii", &status));
ucnv_setFromUCallBack(cnv.getAlias(), UCNV_FROM_U_CALLBACK_SUBSTITUTE_ALWAYS, NULL, NULL, NULL, &status);

… we get the following result:

Input String: black­bird length 10
Converted 10 bytes
\x62 \x6C \x61 \x63 \x6B \x1A \x62 \x69 \x72 \x64
black?bird

Great! Now, we are getting \x1A (ASCII SUB). It works.
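For comparison outside ICU: Python’s codec error handlers expose a similar substitution knob, although its 'replace' handler emits a question mark rather than ASCII SUB:

```python
# Python analogue of the substitution callback: the 'replace'
# error handler substitutes '?' when encoding to ASCII fails.
s = "black\u00adbird"  # soft hyphen in the middle
print(s.encode("us-ascii", errors="replace"))
```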

When missing goes missing

A related question to the above has to do with converting from codepage to Unicode. That’s a better direction anyway. Convert to Unicode and stay there! One can hope. In any event…

For this task, we will convert 0x61, 0x80, 0x94, 0x4c, 0xea, 0xe5 from Shift-JIS to Unicode.

substituteTest-2.cpp

#include <unicode/utypes.h>
#include <unicode/ustdio.h>
#include <unicode/ucnv.h>

int main(int /*argc*/, const char * /*argv*/ []) {
  UErrorCode status = U_ZERO_ERROR;
  LocalUConverterPointer cnv(ucnv_open("shift-jis", &status));
  if (U_FAILURE(status)) {
    u_printf("Error opening: %s\n", u_errorName(status));
    return 1;
  }
#define NRBYTES 6
  const uint8_t bytes[NRBYTES] = { 0x61, 0x80, 0x94, 0x4c, 0xea, 0xe5 };
  u_printf("Input Bytes: length %d\n", NRBYTES);
#define NRUCHARS 50
  UChar uchars[NRUCHARS];
  int32_t ucharsRead =
    ucnv_toUChars(cnv.getAlias(), uchars, NRUCHARS, (const char*)bytes, NRBYTES, &status);
  if (U_FAILURE(status)) {
    u_printf("Error converting: %s\n", u_errorName(status));
    return 1;
  }
  u_printf("Converted %d uchars\n", ucharsRead);
  for (int32_t i = 0; i < ucharsRead; i++) {
    u_printf("U+%04X ", uchars[i]);
  }
  u_printf("\n");
  // try to print it out on the console
  u_printf("Or string: '%S'\n", uchars);
  return 0; // LocalUConverterPointer will cleanup cnv
}

Output:

Input Bytes: length 6
Converted 4 uchars
U+0061 U+001A U+732B U+FFFD
Or string: 'a猫�'

So, the letter "a" byte \x61 turned into U+0061, and then we have an illegal byte \x80 which turned into U+001A. Next, the valid sequence \x94 \x4c turns into U+732B which is 猫 (“cat”). Finally, the unmapped sequence \xea \xe5 turns into U+FFFD. Notice that the single byte illegal sequence turned into (SUB, U+001A), but the two byte sequence turned into U+FFFD. This is discussed somewhat here.
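Python’s built-in shift_jis codec shows similar behavior (this is Python, not ICU, and the exact number of replacement characters can differ between codec implementations):

```python
# Python's shift_jis codec with errors='replace': valid sequences
# decode, while invalid ones become U+FFFD. (Details can differ
# from ICU's behavior, e.g. U+001A vs U+FFFD for single bytes.)
data = bytes([0x61, 0x80, 0x94, 0x4c, 0xea, 0xe5])
text = data.decode("shift_jis", errors="replace")
assert text[0] == "a"    # \x61 -> U+0061
assert "\u732b" in text  # \x94\x4c -> 猫
assert "\ufffd" in text  # illegal \x80 -> U+FFFD
print("ok")
```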

So far so good?

But what if you actually want U+FFFD as the substitution character for both sequences? This would be unexpected, but perhaps you have code that is particularly looking for U+FFFDs. We can write a similar callback:

alwaysFFFD.h

// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
#include <unicode/ucnv.h>
#include <unicode/ucnv_err.h>
#include <unicode/ucnv_cb.h>

static const UChar kFFFD[] = { 0xFFFD };

/**
 * This is a modified version of ICU’s UCNV_TO_U_CALLBACK_SUBSTITUTE;
 * it unconditionally substitutes U+FFFD.
 *
 * Usage:
 *   ucnv_setToUCallBack(c, UCNV_TO_U_CALLBACK_SUBSTITUTE_FFFD, NULL, NULL, NULL, &status);
 */
U_CAPI void U_EXPORT2
UCNV_TO_U_CALLBACK_SUBSTITUTE_FFFD (
    const void *context,
    UConverterToUnicodeArgs *toArgs,
    const char* codeUnits,
    int32_t length,
    UConverterCallbackReason reason,
    UErrorCode * err)
{
  (void)codeUnits;
  (void)length;
  if (reason <= UCNV_IRREGULAR)
  {
    *err = U_ZERO_ERROR;
    ucnv_cbToUWriteUChars(toArgs, kFFFD, 1, NULL, err);
    // see ucnv_cbToUWriteSub()
  }
  /* else ignore the reset, close and clone calls. */
}

Let’s hook it up, as before:

LocalUConverterPointer cnv(ucnv_open("shift-jis", &status));
ucnv_setToUCallBack(cnv.getAlias(), UCNV_TO_U_CALLBACK_SUBSTITUTE_FFFD, NULL, NULL, NULL, &status);

And drumroll please…

Input Bytes: length 6
Converted 4 uchars
U+0061 U+FFFD U+732B U+FFFD
Or string: 'a�猫�'

Garbage out never looked so good…


Building (or, nothing-up-my-sleeve)

To build these little snippets, I recommend the icurun shell script.

If ICU is already installed in your appropriate paths, (visible to pkg-config or at least icu-config), you can simply run:

icurun some-great-app.cpp

… and icurun will compile and run a one-off.

If, however, you’ve built ICU yourself in some directory, you can instead use:

icurun -i path/to/your/icu some-great-app.cpp

… where path/to/your/icu is the full path to an ICU build or install directory.

If you are on Windows… well, there isn’t a PowerShell version yet. Contributions welcome!

Globalization Pipeline for iOS

Yesterday we tagged v1.0 of the Globalization Pipeline SDK for iOS. What can an iOS client do? Well, let’s build a simple app and find out.

Starting Out

First, I’ll launch Xcode 8 and create a new workspace.

While that is launching, I’ll warn you that your author is only a recent graduate of the Swift playground, who once deployed some toy apps to a then-new iPhone 3GS. So, it’s been a while. Any suggestions for improvement are welcome. The actual SDK, however, was a team effort.

Today’s app will be a color mixer, to help artists mix their colors. You know, red and blue makes purple, and so on.

00_title.png

I will name the workspace gp-ios-color-mixer, and create a new single view app called GP Color Mixer. To simplify things, for now, I disable the checkbox “automatically manage signing.”

01_singleview.png

I want to include the new SDK. I’ll use Carthage to install it. Since I already have Homebrew installed, I only need to do

$ brew install carthage

Now I need a Cartfile that mentions the SDK. So I create one at the same level as my Xcode project, containing:

github "IBM-Bluemix/gp-ios-client"

Following the Carthage instructions, I next run

$ carthage update

which results in

*** Fetching gp-ios-client
*** Checking out gp-ios-client at "v1.0"
*** xcodebuild output can be found in /var/folders/j9/yn_32djn36x4d4c2mvcr1kgm0000gn/T/carthage-xcodebuild.p2nKN2.log
*** Building scheme "GPSDK" in TestFramework.xcworkspace

So far so good. Looking in the Finder, I now have GPSDK.framework right where I expect.

02_framework.png

I’ll add it under “Linked frameworks and Libraries”.

03_linked.png

We also need to make sure the framework is available at runtime. To do that, we add a build phase with a one-line script: /usr/local/bin/carthage copy-frameworks with a single input file - $(SRCROOT)/Carthage/Build/iOS/GPSDK.framework

04_buildphase.png

Will it build? I add this to the top of my generated ViewController.swift:

import GPSDK

I mentioned turning off code signing, but I still ran into some odd warnings:

A shell task (/usr/bin/xcrun codesign --force --sign - --preserve-metadata=identifier,entitlements "/Users/srl/Library/Developer/Xcode/DerivedData/gp-ios-color-mixer-evyxcmilwuakdmdvxqqpmmnzisnn/Build/Products/Debug-iphonesimulator/GP Color Mixer.app/Frameworks/GPSDK.framework") failed with exit code 1:
/Users/srl/Library/Developer/Xcode/DerivedData/gp-ios-color-mixer-evyxcmilwuakdmdvxqqpmmnzisnn/Build/Products/Debug-iphonesimulator/GP Color Mixer.app/Frameworks/GPSDK.framework: replacing existing signature
/Users/srl/Library/Developer/Xcode/DerivedData/gp-ios-color-mixer-evyxcmilwuakdmdvxqqpmmnzisnn/Build/Products/Debug-iphonesimulator/GP Color Mixer.app/Frameworks/GPSDK.framework: resource fork, Finder information, or similar detritus not allowed
Command /bin/sh failed with exit code 1

Following QA1940 I was able to make some progress by running xattr -cr './Carthage/Build/iOS/GPSDK.framework'. Now, ⌘R Run rewards me with a blank app window and no errors. Let’s write some code!

Applying myself to the App

By code, of course, I mean a trip to the storyboard. Let's add a launch icon, because we can.

Now, I add some static fields, two picker views (for the input colors), and a button for action.

Starting to look like an app…

I wrote Color.swift to handle the color mixing. It only supports mixing the three primary colors: red, yellow, and blue; any other combination turns into muddy brown. Playground tested, ready to go.

enum Color : Int {
    case red = 0, orange, yellow, green, blue, purple, muddy;
    // r+y = o, y+b = g, b+r = p
    func simpleDescription() -> String {
        switch self {
        case .red: return "red"
        case .orange: return "orange"
        case .yellow: return "yellow"
        case .green: return "green"
        case .blue: return "blue"
        case .purple: return "purple"
        case .muddy: return "muddy brown" // use this if we don't know how to mix a color
        // should be exhaustive
        }
    }
    /**
     * Mix the colors, return the result
     */
    func mix( with: Color ) -> Color {
        if( self == .muddy || with == .muddy ) {
            return .muddy // anything + mud = mud
        }
        if( with == self ) {
            return self // identity!
        }
        switch self {
        case .red:
            switch with {
            case .yellow: return .orange
            case .blue: return .purple
            default: return .muddy
            }
        case .yellow:
            switch with {
            case .red: return .orange
            case .blue: return .green
            default: return .muddy
            }
        case .blue:
            switch with {
            case .red: return .purple
            case .yellow: return .green
            default: return .muddy
            }
        default: return .muddy
        }
    }
}

Time to wire it up. We create IBOutlets for each of the items. And, I’ll clear the result label just to verify that things are wired up. It runs OK, good.

Wired for sound

Now, let’s set up the delegate stuff so that we can get the list of colors showing.

class ViewController: UIViewController, UIPickerViewDelegate, UIPickerViewDataSource {
    // …
    // pickerview stuff
    func numberOfComponents(in pickerView: UIPickerView) -> Int {
        return 1;
    }
    func pickerView(_ pickerView: UIPickerView, numberOfRowsInComponent component: Int) -> Int {
        return 3;
    }
    let primaryColors = [ Color.red, Color.blue, Color.yellow ]
    func pickerView(_ pickerView: UIPickerView, titleForRow row: Int, forComponent component: Int) -> String? {
        return primaryColors[row].simpleDescription()
    }

Hey, just a little more code and we’re feature complete!

@IBAction func doMix(_ sender: Any) {
    let color1 = primaryColors[mixOne.selectedRow(inComponent: 0)]
    let color2 = primaryColors[mixTwo.selectedRow(inComponent: 0)]
    let newColor = color1.mix(with: color2)
    resultLabel.text = newColor.simpleDescription()
}
Works in English… Ship it (just kidding)

At least, feature complete in English.

I’ll next take stock of the resource strings we need to have translated, so that we can run them through the Globalization Pipeline. I’ll call this gp-color-mixer.json

{
"red": "red",
"orange": "orange",
"yellow": "yellow",
"green": "green",
"blue": "blue",
"purple": "purple",
"muddy brown": "muddy brown",
"title": "Color Mixer",
"mix": "Mix"
}

Mixing the Blue

Time to fire up Bluemix. We are going to basically follow the Globalization Pipeline Quick Start Guide for this part, so I will refer to it as we go.

10_service.png

First, I create an instance of the Globalization Pipeline. The name you give the instance doesn’t matter here.

11_instance.png

Now I create a bundle named gp-color-mixer. This name does matter, as our iOS app will use it to access the content.

12_bundle.png

I’ll upload the gp-color-mixer.json file above as the source English content, choosing JSON as the upload format, and pick a few target languages.

If I view the bundle, I can see our strings there, as well as translated versions.

The Globalization Pipeline offers this web UI to manage content, as well as powerful REST APIs for managing the translation workflow. I need to grant access to the iOS app so that it can read but not modify the translations. So, switching over to the API Users tab…

14_api.png

Creating the API user produces some access information, something like the following:

API User ID: 5726d656c6f6e7761746572
Password: aHVudGVyNDIK
Instance ID: 77617465726d656c6f6e77617465726d
URL: https://something.something.bluemix.net/something/something

I take these and plug them into a new swift file named ReaderCredentials.swift like so: (this is a variant of ReaderCredentials-SAMPLE.swift in the SDK’s repo)

struct ReaderCredentials {
static let userId = "5726d656c6f6e7761746572"
static let password = "aHVudGVyNDIK"
static let instanceId = "77617465726d656c6f6e77617465726d"
static let url = "https://something.something.bluemix.net/something/something"
static let bundleId = "gp-color-mixer"
}

(Now, after putting my actual credentials in, and a brief offscreen struggle with .gitignore, I move on…)

Putting it all together

I’m almost done.

First, in the ViewController.swift, we initialize the GP service and start setting up a few UI items:

let gp = GPService()
func get(key: String) -> String {
return gp.localizedString(key, nil)
}
func get(color: Color) -> String {
return get(key: color.simpleDescription())
}
override func viewDidLoad() {
super.viewDidLoad()
resultLabel.text = "Loading…"
do {
try gp.initService(url: ReaderCredentials.url,
instanceId: ReaderCredentials.instanceId,
bundleId: ReaderCredentials.bundleId,
userId: ReaderCredentials.userId,
password: ReaderCredentials.password,
languageId: nil,
alwaysLoadFromServer: false,
expireAfter: 0)
// set up strings
titleLabel.text = get(key: "title")
mixButton.setTitle(get(key: "mix"), for: UIControlState.normal)
resultLabel.text = "" // clear this
} catch GPService.GPError.languageNotSupported {
resultLabel.text = ("This language is not supported...")
} catch GPService.GPError.requestServerError(let errorDescription) {
resultLabel.text = ("Request server error: " + errorDescription)
} catch GPService.GPError.HTTPError(let statusCode) {
resultLabel.text = ("Request server error: HTTP \(statusCode)")
} catch {
resultLabel.text = ("Other error")
}
}

Here we set up the service with our credentials. Then we use our new get(key:) function to set the title and the mix button’s label.

There is also a get(color:) variant that translates one of our Color values, so we use that in the actual mixing function:

@IBAction func doMix(_ sender: Any) {
// … compute newColor as before …
resultLabel.text = get(color: newColor)
}

Similarly, we can get the UIPickerView to use localized color names by using this same function:

func pickerView(_ pickerView: UIPickerView, titleForRow row: Int, forComponent component: Int) -> String? {
return get(color: primaryColors[row])
}

Looks good!

Now we can ship it… to the world!

Conclusion

The iOS app will pick up changes if the translated content changes on the server. We could experiment with adding or removing languages, or updating translated keys.
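For that kind of experimenting, the initService call shown earlier already exposes the relevant knobs. A hypothetical variation (parameter names are taken from the call above; the exact caching semantics are in the GP Swift SDK docs) that skips the local cache might look like:

```swift
// Sketch: the same initService call as above, configured to bypass the
// local cache while experimenting, so server-side changes show up right away.
try gp.initService(url: ReaderCredentials.url,
                   instanceId: ReaderCredentials.instanceId,
                   bundleId: ReaderCredentials.bundleId,
                   userId: ReaderCredentials.userId,
                   password: ReaderCredentials.password,
                   languageId: nil,
                   alwaysLoadFromServer: true, // fetch fresh translations each launch
                   expireAfter: 0)
```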

You can find the source code at https://github.com/srl295/gp-ios-color-mixer.

Let me know if this works for you. This is my first post and, as I mentioned, my first app in Swift, so that’s a milestone. And do let me know if^H^H what can be done to improve the sample app.

Thanks! Now go and make it global.

GP Client for JavaScript v1.3.0 released

It’s time for a refresh on the Globalization Pipeline Node.js client. I’ve just released v1.3.0 of this SDK. You can update your package.json the usual way, with npm install --save g11n-pipeline

npm version

I managed to close about 13 issues since v1.2.x.

Quality

I was able to increase function coverage to 100% and line coverage to 91%, thanks to the VSCode coverage plugin. Of course, when you test, you find bugs, such as updateResourceStrings() being unusable because there was no way to pass the languageId parameter.

Features

First of all, I synchronized the client with the latest REST API. So take a peek at the docs again to see if there are any new features or fields.

I also tried to add some convenience functions. For example, getting the full list of language IDs supported used to require concatenating the source and target lists. Now, with #40 you can call .languages() on the Bundle object and it will build this list for you. There is also a bundle.entries() accessor as of #14 which returns ResourceEntry objects.

Speaking of convenience, in most places where you used to call .someFunction({}, function callback(…){…}), the empty options object {} is now optional.

The PR where I updated the sample code shows some of these improvements.

There are more features to add here, but I hope you like the changes in v1.3.0!

40th Internationalization and Unicode Conference

I'll start, and could almost end, my post with this tweet:

Yes, that exactly. November 1-3, 2016 was the 40th Unicode conference. They used to be held twice a year, and in multiple locations, as in, outside of Santa Clara, California, USA.

Now that the conference is over, I’ll have to take some time to view slides from all of the great presentations I missed while giving a personal record number of talks (long story). Unfortunately, the lightning talks were apparently not recorded.

The conference, and Unicode in general, is about people. It is always great to see so many folks I've kept up with over the years… including of course my fellow IBMers from many time zones away.

Off the top of my head, the important technical (besides personal) conversations I've had include:

Next week: IBM is hosting UTC 149!

gp-angular-client v1.2.0

I just pushed out version 1.2.0 of our Angular Client for Globalization Pipeline to the usual places. gp-angular-client on bower, angular-g11n-pipeline on npm.

Bower version
npm version

Thanks to IBMer @ckoberlein (GitHub), this SDK now supports variable substitution. So you can define a translatable string containing a variable, with the value substituted in at runtime: the output would be Hello Steven or Bienvenidos Steven, depending on the language.

More details are in our README, and be sure to connect with us over on developerWorks Open!