This is a work in progress. If you read to the end, you’ll see we almost reached our goal here.
This blog will cover how to use the Globalization Pipeline to translate
uconv, one of ICU’s sample command line apps. We'll be translating the resource
files you can see in
First, download the ICU4C source code (as a tarball or from the SVN repository) and compile it. See its readme for more details.
Now, set up Globalization Pipeline. See our Quick Start Guide for getting your Globalization Pipeline instance created and set up.
In the GP dashboard, create a bundle named
uconv. Select which languages you want to translate into, but don’t upload any strings. Click Save.
Also from the Bluemix dashboard, get the service credentials for your service. Save these in a file called
mycreds.json, which should look like the example in this document.
We’ll also need the
gp-cli Java tool, so download the latest jar from gp-java-tools.
Now, let's get some translated content
Hm. These files are in icu4c resource format, which isn't (yet?) supported by Globalization Pipeline… directly. Let's try an interim step.
genrb -x root root.txt
genrb -x fr fr.txt
Now we have
root.xlf and fr.xlf (for good measure).
Here's a snippet of
The gp-cli tool says it handles XLIFF as a file format. Let's get that set up.
java -jar gp-cli-1.1.0.jar import -b uconv -f root.xlf -l en -t xliff -j mycreds.json
Note that we use the language tag
en for English here, while the file was originally entitled
root. This is because Globalization Pipeline works with the explicit source language, whereas for ICU,
root is what will be consulted as a fallback if no other languages are available.
It says it uploaded… but let’s check in the Globalization Pipeline dashboard:
OK! That’s great. Browsing over to the other language translations, we can see that the MT engines are hard at work. However, we happen to already have some French translations in the ICU source base. We'll upload these to overwrite some of the machine-translated entries for French:
java -jar gp-cli-1.1.0.jar import -b uconv -f fr.xlf -l fr -t xliff -j mycreds.json
Great. Now we have some human-translated content as well. We can now correct, upload, and download content in the dashboard until we are happy with the translations there.
OK, now for the next step: getting those translations back into ICU4C.
We can list the bundle status from the command line:
java -jar gp-cli-1.1.0.jar show-bundle -b uconv -j mycreds.json
Now, we’ll download the files in XLIFF format again:
java -jar gp-cli-1.1.0.jar export -b uconv -f fr.xlf -l fr -t xliff -j mycreds.json
java -jar gp-cli-1.1.0.jar export -b uconv -f es.xlf -l es -t xliff -j mycreds.json
java -jar gp-cli-1.1.0.jar export -b uconv -f de.xlf -l de -t xliff -j mycreds.json
java -jar gp-cli-1.1.0.jar export -b uconv -f zh.xlf -l zh-Hans -t xliff -j mycreds.json
… and so on. Repeat for each language you wish to download. Note that we’ve used
zh for the Chinese file name instead of zh-Hans.
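Typing one export line per language gets old quickly. Here is a rough sketch of a Node.js helper that prints the export command for each language. The jar name, bundle name, flags, and credentials file are the ones used above; the helper itself, the language list, and the file-name override table are illustrative assumptions, so adjust them to match your bundle.

```javascript
// Hypothetical helper: prints one gp-cli export command per target language.
// The language list and file-name overrides are illustrative assumptions.
const GP_LANGUAGES = ['fr', 'es', 'de', 'zh-Hans'];

// Languages whose output file name differs from the language tag.
const FILE_NAME_OVERRIDES = { 'zh-Hans': 'zh' };

// Build the command line for one language, mirroring the commands above.
function exportCommand(lang, type = 'xliff', ext = 'xlf') {
  const base = FILE_NAME_OVERRIDES[lang] || lang;
  return [
    'java', '-jar', 'gp-cli-1.1.0.jar', 'export',
    '-b', 'uconv',
    '-f', `${base}.${ext}`,
    '-l', lang,
    '-t', type,
    '-j', 'mycreds.json',
  ].join(' ');
}

for (const lang of GP_LANGUAGES) {
  console.log(exportCommand(lang));
}
```

The script only prints the commands, so you can review them before running anything. Calling exportCommand(lang, 'json', 'json') produces the same command with a different format and file extension.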
OK, we have XLIFF format. How do we convert it to ICU format?
genrb only writes XLIFF; it can’t read it.
We need the XLIFF2ICU Converter as is noted here.
To build it, at present, this worked for me:
- download ICU4J source (yes, J)
- ant xliff from the top level
- you will end up with an
Still with me? Head back to the
uconv/resources directory, and now run:
java -jar xliff.jar -s . -d . fr.xlf
And that brings us to…
Hrm. Seems like the XLIFF output isn't quite ready to be consumed. I filed a bug on this, of course.
We're so close… let's see what we can do. What if we fetch the data in JSON format, and then hack up something to convert it to ICU format? It might suffice for this blog post.
Let's run the fetches again, but get JSON this time:
java -jar gp-cli-1.1.0.jar export -b uconv -f fr.json -l fr -t json -j mycreds.json…
Now, run the following Node.js script over the JSON files:
node js2icu.js fr.json es.json …
You should be the proud owner of
.txt files matching all of the languages you are using.
We're almost there.
Let's go up and build
resfiles.mk and change the
RESSRC line to reference the new translations:
Let’s test it. I know
uwmsg.o isn't really utf-8, that's why this is a test.
env LC_ALL=es ./../../bin/uconv -f utf-8 < uwmsg.o
Looks like we have a (more) translated
Some of the messages don’t quite work correctly due to ICU4C message conventions.
Perhaps we will investigate this in the future.