Here’s how it works. Names have been changed to protect the author of this PR.
Let’s say you want an array class. Not just any array, a special array.
1 | import "node:console"; |
Not a lot to see here. It's a special array class, with a special constructor to get the data just right.
1 | const q = new SplitArray("this-and-that", "-"); |
So far so good. It seems to be working as intended, and we can add all sorts of other functions and such.
Let’s add some debugging. Why not.
1 | class SplitArray { |
A little excessive, but mostly harmless. (right?)
1 | … |
Huh? Line 6 is the constructor. Why are we constructing this object, yet again? I was just trying to add a little toString()…
Maybe the map function is too complicated. Let’s just change it to this.map(q => q) to see if it works at all. Nope.
Debugging shows that strs has the value of 3. As in, the number.
Array.prototype.map(), and I quote, “creates a new array populated with the results of calling a provided function on every element in the calling array.”
That’s what we’re trying to do, alright.
The problem here is that my little class doesn’t just have an array, it is an array. And so it is expected to act like one.
The Array() constructor has a few overloads; one of them takes an arrayLength… such as, perhaps, 3?
The call to super() should have been a clue here. What’s happening internally is that map(q => q) is doing something like:
n = new SplitArray(3); n[0] = q[0]; n[1] = q[1]; n[2] = q[2];
That's all fine, except for the constructor.
To fix this, we can use a static factory instead of our “helpful” constructor, which turned out to be less than helpful to map().
1 | class SplitArray extends Array<String> { |
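For illustration, here is a minimal sketch of the factory approach. The names and output shown are assumptions, not the post's exact code:

```typescript
class SplitArray extends Array<string> {
  // No custom constructor: Array internals (map, slice, …) are free to call
  // `new SplitArray(length)` without surprises. The splitting logic lives
  // in a static factory instead.
  static split(data: string, separator: string): SplitArray {
    const arr = new SplitArray();
    arr.push(...data.split(separator));
    return arr;
  }
}

const q = SplitArray.split("this-and-that", "-");
console.log(q.map((s) => s)); // roughly: SplitArray(3) [ 'this', 'and', 'that' ]
```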
Now, finally, toString() will work properly. For some definition of properly.
1 | q.toString() |
We also could have contained the Array as a property, instead of subclassing.
The tl;dr (now that you’ve perhaps already read it) is: Use caution when overriding Array constructors. Consider having an array instead of being one.
The purpose of this post is to give some updated progress on the status of implementing CLDR (LDML format) keyboards in SIL Keyman.
In Keyman, there’s a sample XML file named basic.xml. It’s not a “real” keyboard, but instead a unit test file. In fact, as a keyboard it has only two keys. Here is the file in part (skipping aspects not relevant to this test):
1 | <keyboard locale="mt" conformsTo="techpreview"> |
Let’s dive in here.
The keyboard conforms to a certain CLDR version. We’re still in unreleased territory, so for now the version is techpreview.
The “key bag” has two named keys.
The first is named hmaqtugha and is the Maltese ħ, known as “H Maqtugħa” 1, that is, “cut H”, as opposed to the ordinary H (akka). By the way, the character ħ is Unicode U+0127, which is decimal 295. And now you know the origin of that number in my username.
The second is named that, because it is the word meaning “that” in Khmer (Cambodian), ថា.
There is a single hardware layer, with a single row. That row is the one which on a US keyboard begins with backquote, 1, 2, 3, etc.
The expectation, then, is to see ħ if the backquote key is pressed, and to see ថា if the number 1 is pressed.
I used the in-development Keyman compiler tool, kmc, which turned the above XML into a small basic.kmx file. The tool is written in TypeScript and so is easy to run from any command line via Node.js.
Next, I hand-built a .kps file, or rather, copied-and-pasted an existing one to suit my needs. This is a Keyman “package source”, basically a manifest of which files will end up on the user’s desk. The most exciting part of this XML file is reproduced below:
1 | … |
The <ID> must match the .kmx file. In any event, these files were packaged into a basic_ldml.kmp file, which, like a .jar and many other such packages, is really a zip file in disguise.
Now I have a Keyman packaged keyboard, just like any of the thousands of other Keyman-format keyboards in the world.
Well, sort of. We actually need a Keyman engine and core which knows how to deal with this new format keyboard. The LDML format isn't compiled into the existing Keyman binary format, but it is in fact a new variant of the format.
At the moment, compiling the engine and core for Linux, specifically for a separate VM, seemed to be the easiest path to using the new keyboard. Of course, I expect that someday all copies of Keyman will include this support.
I chose an Ubuntu 22.04 VM and was able to compile Keyman without much trouble. Keyman for Linux has a Python UI for its configuration, and hooks into the ibus input framework.
Installing was easy: I just clicked Install in the km-config UI and chose the .kmp file.
Once installed, I could select the new keyboard from the system menu.
Now we’re ready to actually type in gedit!
It’s hard to say a lot with just these two characters. But it is a start.
Let’s now try to work with a real keyboard, specifically MSA 100:2002 available from MCCAA. The hardware here is a Sirap K366P.
In the keyboard-preview branch of CLDR, the mt.xml file is available as an example file. It reads in part:
1 | <keys> |
I used an in-progress pull request to flatten the 'import' statement out, as I have not implemented that in kmc yet, and also pulled in the 'implied' keys such as:
1 | <key id="A" to="A" /> |
The exact file I compiled for this is here if you wish to see it. It had to be slightly edited due to a couple of unimplemented features.
And it works also! 2 Roughly, the above says “Health… Good Morning” which is, all things considered, not a bad way to end this year’s blog posts.
We’re at the end of the year, when in the Gregorian calendar we remember and celebrate Christ’s incarnation—Christmas! So this post will serve as a bit of an update.
I organized my consulting work as Code Hive Tx, LLC, with its own site https://codehivetx.us. I’m not pivoting into apiary work, although that is a potential future hobby. Instead, this is a hive for code: software development. It's a Texas Code Hive, headquartered in Dripping Springs, TX, USA, where I have shared office space. Code Hive began its first consulting services in January 2022.
As mentioned in the srl.next post, I continue to work on Unicode’s Common Locale Data Repository (CLDR). A lot has happened, both in terms of project and data growth, but also in terms of process and software modernization. CLDR has had an emphasis on paying down technical debt, and much progress has been made. The crowdsourced linguistic voting platform has gone from a Java servlet-based structure to a modernized J2EE application, including OpenAPI Spec 3.0 (Swagger) REST documentation. The front end now has Vue3 as its core, largely replacing many home grown frameworks. Ansible was used to automate VM provisioning, with the result that an additional staging server was recently added, with almost no time spent in server configuration before it was ready to be productive.
I’ve also picked up a local client in the fintech space, focused on distributed computing and data science needs in Python. I’ve brought in improvements to the automated build system as well as deploying custom Prometheus collectors and dashboards to make sure everything is working the way it ought to be. Actually, Prometheus figures somewhere into most of my projects, including telling me whether the home printer is accidentally powered off or not!
Additionally, I’ve been involved in the CLDR Keyboard subcommittee for some time. As of this writing, there isn’t a great landing page with current status on this activity, although it’s on the TODO list. You can read the current landing page here.
In summary, the work of this subcommittee is to bring UTS #35 Part 7 from its current state, which merely describes keyboards, to being the industry-wide standard for implementing all keyboards. That is both ambitious, and also needs to be justified as a goal. I’m going to attempt to do both in a future post. Very briefly, though, keyboard development is currently completely platform-specific, and so keyboard authors must convince the respective organizations and communities to independently develop for their language on Android, iOS, macOS, Linux, Windows, just to name a few.
What I did want to mention here is that many people have been volunteering their time, or been able to spend time on behalf of their respective organizations, in order to attend these subcommittee meetings over the past couple of years. However, that has not itself resulted in rapid development of the specification.
To that end, Code Hive Tx, LLC has received a Unicode Adopt-a-Character grant in order to progress the spec and fund participation in the meeting. As a result, I’ve been able to produce a draft specification with many improvements, as well as spend time on sample data, actual DTDs, test code, and so on. This enables the other team members to make comments on this spec work and has resulted in much progress.
But a spec without an implementation is, as they say, just a spec. That’s where Keyman comes in.
In addition to the spec work above, Code Hive Tx, LLC also has a contract with SIL International to add a production implementation of the CLDR Keyboard spec and associated tools to Keyman, SIL’s widely used open-source keyboard platform.
I recently returned from a week-long planning meeting in Siem Reap, Cambodia with the Keyman team. Besides an interesting trip and location itself, it was great to discuss in-person the future of keyboarding.
As mentioned above, the spec work and the implementation work go hand-in-hand, giving leverage to the prospect of a major sea change in ease of keyboard implementation. As of last week, we have the first actual keystrokes processed using a prototype CLDR keyboard. But that’s for another post: “Keyboard Progress”
Merry Christmas and Happy New Year from Code Hive Tx, LLC!
A couple of useful git commands. (Meaning: writing this down so that I don’t forget it!)
As before, these two are going to be given in the form of custom aliases. You can add the following to your ~/.gitconfig file, or update the [alias] section if it already exists.
1 | [alias] |
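For reference, these aliases can be thin wrappers around git update-index. A sketch under assumptions; the exact definitions in the original [alias] block may differ:

```
[alias]
    # Mark files as skip-worktree so git stops noticing local changes to them
    skip = update-index --skip-worktree
    # Undo that, so git tracks changes to the files again
    unskip = update-index --no-skip-worktree
```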
skip / unskip
I will cut right to the chase here: I put these in because of Eclipse .settings/ files.
I work on several Java+Maven projects that have historically used the Eclipse IDE. For most of them, Maven is now the source of truth. However, there are a number of settings such as tabs vs. spaces and indent style, which ought to be shared. How do we know which files should be checked in and which shouldn’t? There does not seem to be a good way to tell.
As I noted in this comment on CLDR-15048, many of these files, as well as .classpath, change quite often, seemingly due to:
1 | git status |
Note that some of these files don't seem to be automatically re-created by Eclipse when importing. So simply .gitignore-ing some of the files doesn't work. We could ignore the entire .settings directory, but then each user could have potentially differing editor and other preferences.
Here is how I use git skip in practice. There are changes, but I want git to forget about them.
1 | git status |
Let’s rebase this particular branch:
1 | git rebase upstream/main |
Oh no! Someone actually changed these files. OK, no problem, this happens all the time (I say to myself). I’ll just revert these changed files.
1 | git checkout -- tools/cldr-apps/.settings/org.eclipse.jdt.core.prefs tools/cldr-code/.settings/org.eclipse.jdt.core.prefs tools/cldr-rdf/.settings/org.eclipse.jdt.core.prefs |
Working as designed. Git doesn’t know anything about these files… 🙀
unskip
This is why there is an unskip command. We need to tell git to pay attention to these files again.
1 | git unskip tools/cldr-apps/.settings/org.eclipse.jdt.core.prefs tools/cldr-code/.settings/org.eclipse.jdt.core.prefs tools/cldr-rdf/.settings/org.eclipse.jdt.core.prefs |
OK, now the rebase can succeed!
1 | git rebase upstream/main |
Oops (this is a live demo!) Well, it succeeded enough for me to deal with the substantive merge conflicts.
Of course, I can git skip files again when I want to.
git update-index --skip-worktree and --no-skip-worktree work on a single worktree only. So if you have several worktrees checked out, you can skip and unskip files in each one individually. This skippage is also never propagated by a push or fetch.
We had a problem to solve. Some CLDR processes, such as comparing with past versions, needed to see ALL past releases of the project, checked out side-by-side. This was previously done by literally unzipping each release into a special folder. One idea to automate this was to have an automatic unzipper, or to check in all of the versions into yet another repository.
Instead, I wrote a Java tool which automates the use of git worktree.
1 | org.unicode.cldr.tool.CheckoutArchive |
Since all version history is already in cldr/.git, it makes sense to leverage that into these 38 worktrees. Each worktree is checked out as a detached HEAD according to the release tag that produced it. This makes it very convenient to run tests and comparisons against the old versions.
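Under the hood, the effect is roughly what you would get by running git worktree add for each release tag by hand. A sketch, with illustrative tag and path names:

```
cd cldr
# one detached-HEAD worktree per release tag
git worktree add --detach ../cldr-archive/cldr-41.0 release-41
git worktree add --detach ../cldr-archive/cldr-40.0 release-40
# …and so on, for each of the ~38 releases
```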
Furthermore, unlike with an unzip, the full version history is available, so git log, git blame, diffs, and advanced searching are all available to use.
Enjoy this git twofer-plus, and as always
git reflog
A couple of useful git commands. (Meaning: writing this down so that I don’t forget it!)
Of course, I did end up forgetting even that I had posted it to the blog. Oh well.
These two are going to be given in the form of custom aliases. You can add the following to your ~/.gitconfig file, or update the [alias] section if it already exists.
1 | [alias] |
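A sketch of what these two aliases can look like, based on the commands described below; the original definitions may differ in detail:

```
[alias]
    # List the files touched by a given commit
    ingredients = diff-tree --no-commit-id --name-only -r
    # Fold the staged changes into HEAD, bumping the commit date to now
    columbo = commit --amend --date=now
```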
What’s inside? That’s always something I ask myself when contemplating a grocery store purchase. I find myself asking the same question when I’m about to rattle off git commit ; git push -f HEAD:prod and deploy some hopefully-tested code—or, at least, send it off for PR review.
git diff-tree --no-commit-id --name-only -r fits the bill… try it out:
1 | git ingredients HEAD # What's on the latest commit that I'm about to push? |
git ingredients shows you a simple list of all of the files changed in the specific commit.
“…Just one more thing.” Before Jobs’s famous keynote-ender, there was Peter Falk.
Anyway, sometimes you think you’re done with a commit… but there’s just one more thing. A test you forgot to run. Whitespace. Rewrite the whole thing—it’s up to you.
git commit --amend takes whatever is staged for commit and merges it into the HEAD commit. Type it often enough, and an alias is in order. And just one more thing—if you just use --amend, you will end up retaining the original date and time. Me, I like to bump the date to the time I last touched it.
git columbo will commit the currently staged files (if any), and give you the opportunity to edit the commit message.
git columbo -m 'fix all the things' lets you specify that message from the command line.
git columbo somefile.p will only commit somefile.p among all the available files.
git columbo -a is the “one more thing”: All the things! Commit everything you’ve changed.
I also run git columbo -S sometimes, if I decide I want to sign the commit.
Credit here goes to the tweet/thread below. I added the --date=now part.
Fast forward to 1993. I knew that Taligent (also where my father was at that time) was working on a new OS of some kind, but that’s all I knew. (Pink). I just wanted to write some games for it*, and that needed an NDA. Instead of an NDA, I got a job: an internship, using my brand-new C++ skills.
Trying to do pre-build integration on an amazingly complex subsystem which somehow meshed with a dozen other subsystems was an interesting challenge, to say the least! (There were multi core builds all right—one keyboard and ADB mouse per core.)
I learned a lot about how to, and sometimes how not to, design OO systems, rubbing shoulders (and trading bug reports) with the best in the industry. Learned a lot, launched a lot of Nerf™ product, accidentally burned an OID (sorry)! … and, shipped some good features and even products. I also received plenty of constructive criticism.
I think my first job title was "Technical Specialist", probably because some form needed a title. Since the Apple tradition of design-your-own-business-card was followed, I styled this as “TechnoSpecialist.” Later I graduated to Software Engineer or something, so I wrote “Code Sculptor.” I had in mind the idea of those who brought a certain craft to the field—Wozniak, Hertzfeld, Atkinson, and so on.
I also ran into something called Unicode. I discovered it quite by accident. I still was trying to make some kind of game, but I decided (since I was in the NetComm group) to write a networked chat program. Sounds simple enough. The base text class was called TStandardText. To my surprise and annoyance, when I streamed a TStandardText (using operator>>= of course!), the other side received a bunch of NULLs (\x00). A null, then a letter, a null, then another letter. And finally, TWO nulls to end it. For example, streaming ABC showed up on the wire not as ABC but as
1 | \x00 \x41 \x00 \x42 \x00 \x43 \x00 \x00 |
When I read this into a char* on the server side, it had a strlen() of 0. Huh? I plugged in a network analyzer (Network General Sniffer 10Base2), this being the NetComm lab, and it was confirmed. What happened to my string? Answer: Unicode. UCS-2 BE, to be precise.
I was not impressed. What was this Unicode® and why was it scrambling my text?? So, I just changed my chat program to ++ over every other null. Problem solved.
However, my attention was focused on networking and communications, protocol abstractions for streams and RPC, object storage, and network discovery (TCP port 6149, tal-pod).
I went on to work with email protocols, and then web services. I had run a very early departmental web server, and had done some prototyping. Around the time (1996-1997-1998) that Taligent was being folded into IBM as a wholly owned subsidiary, this turned into WebRunner ServerWorks and US 6,233,622 B1.
Also around that time, I took my first trip to Malta. Just before I went, I heard there was need to help on a “temporary assignment” to do some language related stuff. But it would be done by the time I got back from Malta, or so it was thought.
The work wasn't done, so I jumped in when I came back: a small team working on Bidi. Arabic and Hebrew text enablement. Somehow I understood a little more about Unicode by that time, and we actually got it working.
If you haven't ever tried to implement Bidi text selection and rendering, try it. You'll either run away screaming, or… find a new favorite.
I had found my field. This “temporary” assignment (which wasn’t in itself as temporary as it was supposed to be) turned into a change of department at my request—a career change.
1997 also brought Java 1.1, including Taligent-contributed Unicode technology. I wasn't any part of that. (In 1996 I remember handing out flyers for some “Taligent Analytics for Unicode” at Networld+Interop NYC, but I'm sure I couldn't explain what it was for.)
However, soon after, I was helping out with what was released in 1999 as the open-source International Components for Unicode, partly based on the Java and C++ work, and available in C++ and Java.
Now, all of this global stuff needs some local data. The data directory of ICU kept getting bigger, and the bugs kept coming in. The data for ICU (and the JDK, and 100 Linux locales) had all come from the ICIR data, from IBM Toronto. In 2003 a project was started, first under Li18nux/LSB/FSG (what you now know and love as the Linux Foundation), called CLDR.
It's an oversimplification, but from a pragmatic perspective, we took the ICU data subdirectory as a seed, split it off, put it in XML, and put a lot of process and tooling around it so that it could be easily compared with what all the other non-CLDR platforms were doing. And the comparisons showed that localization was all over the map… so to speak.
Anyway, a lot happened with ICU and CLDR over the years. I wrote the original Survey Tool to collect data for it, which today looks and acts nothing like my original version (and that's a very good thing).
I also eventually got involved with translation, first by joining the XLIFF-TC (when I didn't actually do any translation nor use XLIFF), and only years later (in 2015) by being part of the team bringing you the Globalization Pipeline** on Bluemix (now IBM Cloud).
Learning about translation, actually using XLIFF, has been great. But, I found less and less time to work on core globalization. I had a short stint trying out the Project Management role, and also worked to bring OpenWhisk adoption up to speed in our group.
I mentioned being a 2nd generation IBMer. Not only that, I’m a second generation occupant of the same tower at the San José site that my father was in (not concurrently).
~23 years is a long time. One of our kids had said years back, “I want to help you work on Bluemix someday!” Well, Bluemix is now the IBM Cloud (go check it out!). That would be a third generation. I’m still proud to be an IBMer. However, come this Tuesday—right on the schedule that was presented to me—I won't be.
So what's next? Starting that same Tuesday, I'm going the independent contractor route. And my first contract is a short term one with Unicode itself, to work on CLDR!
So yes, I suppose the tweet above may count as a subtweet. CLDR is extremely important to how humans use computers in their very own language and culture today, and I am excited to jump back in. What has become a “spare time project” will become literally my front-burner work.
What's next?
Endnotes:
Work in progress of course.
A couple of useful git commands. (Meaning: writing this down so that I don’t forget it!)
I seem to have added a lot of remotes to some of my repos. Remotes are helpful for checking on the progress of different branches people are working on. But what if I want to git fetch --all but not really fetch all remotes? There has to be… a better way.
1 | $ git config remotes.dev "srl295 upstream" |
Aha. Now I can do:
1 | $ git fetch -p dev |
And only the two “dev” remotes (srl295, my work, and upstream, the upstream fork) get updated.
I sometimes have several icu branches going at once. A git worktree allows me to work on several branches without having to have unrelated (several gigs of metadata) directories.
1 | $ cd ~/src/icu |
Now ~/src/icu-64.2 contains a separate branch (or tag or whatever), but the objects (and LFS!) are all shared. When I’m done with it,
1 | $ cd ~/src/icu |
If I want to see what worktrees I have open, I can do this:
1 | $ git worktree list |
You can run the worktree commands from any of the worktrees. It’s like a happy git family of worktrees!
Credit to Myles here:
What I want to cover in this post is the actual migration process; see the ICU site for specifics about how to use the ICU repository and bug system. Note 2: Here is a link to Unicode’s official blog post.
Note 1: In the first edition of this post, I didn't make a couple of things clear enough:
Teamwork — I did not accomplish all of the steps below alone. Thanks to all of the ICU-TC colleagues for helping with review and engineering tasks (that are still ongoing as I write this).
I’m not done with ICU — I remain a member of ICU-TC and I hope to actually contribute something again, now that my time isn’t spent “keeping the lights on.”
The two major aspects that needed migration were:
Notice a repeated key word above: hosted.
Hosted means that this role goes away. This is a continuation of a trend started a few years ago when I recycled 1,500+ pounds (680+ kg) of server equipment that used to be the ICU continuous build farm.
Subversion to git may not sound like it should be particularly difficult, using subgit (thanks for the OSS license!) and others. However, there are a number of complicating factors.
If you have ever set up your own Subversion repo, you will be familiar with the top level trunk/branches/tags structure. You may also be aware that in svn (as is the UNIX way) “everything is a directory.” ICU had started with separate projects for icu4c and icu4j like so:
1 | icu/ |
At some point in 2016 we decided that it was a good idea (and it was) to merge the trees. ICU for C and J are really developed together, and there is important interlock between the two regarding generated data files.
1 | trunk/ |
So far so good, but this point of discontinuity confuses the standard migration tools.
Mistakes happen. But, this one means it looks like all source was deleted and replaced.
Each .jar file isn't very big by itself. But ICU4J has a binary copy of its data file checked in, and there are thousands of copies of the icudata and other jars in the svn history. When all the dust settled, we ended up with about 2.3 GB of Git LFS content in 600 objects.
The trac to JIRA importer was not available to us (not available in JIRA cloud anymore). CSV import seemed very unwieldy, as we needed to be able to incrementally update the issues as we were developing the mapping. Plus, our trac instance has many customizations, with source patches (yes, contributed back where they made sense) and a custom plugin that powered our workflow.
By the end of 2Q2018, let's call it 2018-06-30T23:59:59.999Z, my infrastructure involvement in ICU needs to go to zero. This means no root logins…
Note, I'm only talking about infrastructure, not other project involvement.
Subgit works quite well. It takes some time, but it is worth it for a configurable conversion. However, it would not handle the discontinuities mentioned above.
I knew that Subversion has a dump format. Perhaps it would be possible to manipulate the dump, to make it look like ICU had always had a 'merged tree', and then import from there? ICU’s dumpfile is about 20 GB.
I found some stack overflow questions that didn't quite match what I needed. I ran across SVN::DumpReloc in CPAN, and noted it for future reference. It didn’t work out of the box.
The challenge is that the svn dump is just a simple dump of the internal binary deltas. It does not take well to mkdir or copies with no intermediate dirs. So, simply renaming /icu/trunk/source/common/uloc.cpp to /trunk/icu4c/source/common/uloc.cpp in old revisions won’t work, because /trunk didn't exist until 2016.
As usual, I reached for npm init -q -y and started off to write a processor for the svn dump. I learned how to implement a Duplex stream, and got a little ways, but definitely not far enough.
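As a rough illustration of that approach (not the actual tool), a Transform stream, which is a kind of Duplex, for remapping paths in the dump might start out like this:

```js
const { Transform } = require('stream');

// Rewrites Node-path headers in an svn dumpfile as it streams through.
// Purely illustrative: a real remapper must also respect chunk boundaries,
// content lengths, and binary delta sections.
class PathRemapper extends Transform {
  _transform(chunk, encoding, callback) {
    const remapped = chunk
      .toString('latin1')
      .replace(/^Node-path: icu\/trunk\//gm, 'Node-path: trunk/icu4c/');
    callback(null, Buffer.from(remapped, 'latin1'));
  }
}

process.stdin.pipe(new PathRemapper()).pipe(process.stdout);
```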
I dusted off my perl pocket reference and even-dustier perl skills and set out to update SVN::DumpReloc. Unlike my js code, the perl actually worked. And working is good here.
I ended up adding a simple [JSON configuration structure](https://github.com/unicode-org/icu-remunge-svndump/blob/master/icureloc.json) that would do three things, among them:
Create /trunk, /branches, and /tags in revision r1, where we should have created them.
Change add to change (a no-op property change).
1 | { |
In the end, it worked. A few bugs remained: branches and tags pre/post merge aren't quite where we want them. But the bulk of the svn history is kept.
Given the above restrictions, I created a new Node.js tool, https://github.com/unicode-org/icu-trac2jira, to migrate a trac .sqlite3 dump to a JIRA database, by using the REST API. With a minimum of configuration it is able to map all of the fields, wiki syntax, and attachments needed to preserve our issue history. It's not perfect, and there's work to be done to fix some of the values, but I think it got the job done as far as initial migration.
The interesting thing, process-wise, is that I ended up with something that could run incrementally to update JIRA to match trac. So as there was feedback on errors in the wiki syntax conversion, I could re-run the tool over a subset of the tickets and it would either update a ticket, comment, etc. or cause no change depending on whether JIRA matched the expected results.
A separate script created 20,000 empty tickets in a block, before running this converter. This allowed us to keep the same ticket IDs between trac and jira.
http://icu-project.org was already the third external web host for ICU, after jtcsv.com (2001 mirror) and oss.software.ibm.com/icu (2004 mirror).
In 2006 I migrated ICU from cvs and JitterBug to svn and trac. So yes, we've done this before!
JitterBug (which I also customized extensively and added new report CGIs to) had a very simple hierarchical file structure which was very hackable. Since trac used a sqlite database, I wrote source to read this file structure and emit SQL to recreate the bugs in the new form.
1 | // jb2svn.c |
An oddity of that conversion is that I sort of punted on converting the date fields at all. Maybe there either wasn't a ticket-creation time, or the files had all been re-touched at some point. Or maybe it was just… laziness. Or whatever the other two are (I'd have to look it up).
Of course, our conversion process faithfully preserves this history. I think 1970-01-01T00:00:28.000Z is due to wanting a unique timestamp for some reason, thus (epoch time + 1 second per bug)-ish?
require('mbta')
Chorus
Did he ever resolve,
No he never resolved
and his fate is still unsolv’d
He’s been <pending>
forever
on that un-called closure
He’s the Promise that never resolved
Charlie yielded his time
somewhere down in libuv
and he queued in
th’event loop main.
When he got called the constructor told him
“one more bracket”
Charlie couldn’t reach then()
again
“Only one more build,”
said the rock star coder
“I’ll replace four lines with three”
But, alas, git merge
of the branch to master
And the Promise sailed out to sea!
Oh the DevOps ninja
Reboots the build in production
Every day at quarter past two
And through the open window
Charlie’s newlines are sandwiched
As the log goes rumbling through!
Lyrics: Steven R. Loomis 2017, Parody of: “M.T.A.” words by Jacqueline Steiner, Bess Lomax-Hawes (1949) which is itself based on “The Ship That Never Returned” by author and composer Henry Clay Work (1865).
Edit 2018-07-10 I have added some further references at the end.
Introduction
There are a lot of steps to be taken in order to ensure that a language is fully supported. The objective of this document is to collect the steps needed and begin to plan how to accomplish them for particular languages. The intent is for this to serve as a guide to language community members and other interested parties in how to improve the support for a particular language.
Metrics
The diagram below shows languages in one axis, and the “stack” of support tasks on the other.
Coordination is key. Finding and communicating with the right people is often at least as difficult as the technical aspects. ScriptSource can be a good “central hub” to collect/publish information and needs for a user community.
Encoding
A critical step is of course Unicode encoding, but that is only the first step. Also, there can be (through no fault of anyone’s) a long gap between the first contact with a user community and the publication of a Unicode version supporting that language, not to mention other steps. The Script Encoding Initiative at UC Berkeley works closely with language communities working to encode their scripts in Unicode.
In the course of the encoding process, a lot of information is gathered which is relevant to other steps such as grammatical considerations and best practices around font and layout support.
Font
From Martin Raymond:
One recommendation is to split the drawing of the glyphs from the more technical aspects of font design. Someone familiar with the writing can draw the letter shapes and pass them on to a font designer to develop the font.
In other words, the critical initial step is to get the correct glyphs from the user community.
Note that there is a need for fonts for different purposes: aesthetic, low resolution, small devices.
Layout
Determine if layout requirements are “complex” or not. (See the “shaping required” field of CLDR Script Metadata).
Support through W3C’s Layout & Typography project: https://www.w3.org/International/layout
The text-rendering tests can be useful to determine if OpenType font rendering is correct.
OS-level support
Input
Locale Data
CLDR seed/initial data (see CLDR Minimal Data)
Software Translation
Advanced NLP (Natural Language Processing)
The development of many NLP applications requires large digital corpora, the collection of which is a project in itself. Even when corpora are collected, say through web crawling, when they are not available publicly, other developers cannot benefit from them as a resource. Therefore, a freely available repository of digital resources in a target language, to which contributors can add, is an ideal first step for the following efforts.
App-Level Support
This means going beyond:
…to truly supporting language-specific features. Some examples:
ICANN / IDN support
Support for a script within top-level domains allows an important level of localization online that breaks from the historically Latin-only top level domains and reflects the truly international nature of the Internet. ICANN has made significant progress in this area, and is currently in the process of working with language communities to define rules for using many new scripts in TLDs (top level domains).
Computer programming language in mother tongue
While this may seem a far-fetched dream today, the fact that programming languages are in English is a barrier to the full use of digital tools by much of the world’s population. This might be the final frontier for the internationalization/localization of digital technologies. “قلب” is an example of a programming language entirely in Arabic.
References
Today I’m pleased to release pino-couch. You can find it on GitHub under https://github.com/IBM/pino-couch.
This little module is a transport which lets you capture your pino logs into any CouchDB database.
Speed: I haven’t independently tested the benchmarks, but I really like logging that doesn’t slow down the application. I want to be able to sprinkle logging generously in the application without slowing it down.
Simplicity: Take a look at the example below. We go from logging to the console, to logging in a database. The configuration and execution of log processing is entirely outside of the application.
Sticker: Because it has a logo that looks nice on a hex sticker? OK, not really. But @matteocollina presented this logger so effectively at NodeSummit, I asked for a sticker. Today, I’m glad to give something back to the community.
Let’s do a quick demo here, with a simple app that emits some logs:
1 | $ cd somewhere |
And for index.js:
1 | const pino = require('pino')(); |
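A sketch of what such an index.js might contain (illustrative, not the exact demo code):

```js
// index.js: a tiny app that emits logs at a few levels
const pino = require('pino')();

pino.info('hello from the demo app');
pino.warn({ disk: '87%' }, 'getting full');
pino.trace('below the default level, so this one will not appear');
```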
With the nice pino API you have lots of options for emitting logs.
1 | $ node index.js |
Notice the trace() details were below the current level, so were omitted. This is detailed, but not super readable. If you are running something from the command line, the pino global utility tidies up the output nicely—in color, even, if your console supports it.
1 | $ npm install -g pino |
Here’s where pino-couch comes in. I’m going to set up a https://cloudant.com database to store these logs (as I do in production), but you can also use a local or any other CouchDB instance (as I do when developing locally).
pino-couch only needs to write to the database; it doesn’t need to read. Click the Permissions tab, then Generate API Key. Choose only the _writer column for our new API key.
That’s actually it for configuration.
Now we can install and run pino-couch. Use the APIKEY and PASSWORD that were generated above. And of course, your own ACCOUNT.
. Use the APIKEY and PASSWORD that were generated above. And of course, your own ACCOUNT.1 | $ npm install -g pino-couch |
Adding | pino at the end keeps the output human-readable—that's optional.
at the end to keep the output human-readable— that's optional.Let’s take a look at the Cloudant dashboard again:
There’s our data!
Here are a couple of things you might do with your new logging pipeline:
Even something as simple as the following will get you timestamp-ordered documents.
1 | function (doc) { |
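A sketch of such a map function, keyed on pino's epoch-millisecond time field so the view returns documents in timestamp order (assumed details):

```js
function (doc) {
  // key on the pino `time` field so querying the view
  // returns log documents in chronological order
  if (doc.time) {
    emit(doc.time, null);
  }
}
```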
Note that besides the time field with epoch time, hostname contains the current hostname. This is really useful for distinguishing logs from among a cluster of servers.
We’ve done this with great success. We were already pulling from another Cloudant DB, so it was easy to add the application logs.
And of course:
This post (and video) will explain how to translate Kibana using the Globalization Pipeline service on Bluemix. Note that some of the steps shown here depend on kibana:8766, which was not merged as this article went to press. (Portions are based on the development-internationalization.asciidoc document from that PR.)
Kibana — for now, I am using the i18n_phase2 branch from kibana:8766 (commit 91f27f69a03eb74f4a84d2f628b8f5584b9d2a70, to be precise). See Kibana’s READMEs for detailed setup instructions.
A Bluemix account to access Globalization Pipeline. It's free to sign up!
Java and the latest gp-cli.jar (Globalization Pipeline tools).
A gp-credentials.json file, which should look something like the following:
which should look something like the following:1 | { |
This creates a bundle with English (en) as the source language and requests Spanish, Japanese, and French targets (es, ja, fr).
1 | $ java -jar {wherever}/gp-cli.jar create-bundle -j {wherever}/gp-credentials.json -b 'kibana_core' -l en,es,ja,fr |
The bundle will show up in the Bluemix dashboard under the service’s console, but as empty.
We are going to translate the src/core_plugins/kibana/translations/en.json file in Kibana. Upload that file to the Globalization Pipeline service using the command line:
1 | $ cd ~/src/kibana |
What you see was done with machine translation, hence the red “U” (Unreviewed). The content here can be corrected manually by clicking the Pencil icon, or marked as manually reviewed by clicking the Checkmark. It’s also possible to download the translated content for offline review or use, or to upload a corrected version of one of the translations.
Head back over to the command line, though, because it is time to create our plugin.
Create a new plugin named kibana-YOURNAME-translation-plugin next to your kibana directory. Something like this:
1 | $ npm install -g yo generator-kibana-plugin |
The generated plugin contains a translations/es.json file. We will replace this with our translated content.
1 | $ rm translations/es.json |
1 | $ java -jar {wherever}/gp-cli.jar export -j {wherever}/gp-credentials.json -b 'kibana_core' -t json -l es -f translations/es.json |
Edit the index.js file in the plugin to mention the updated translations files.
You will see a section like this:
1 | translations: [ |
Change it to mention all of the language files we have just downloaded:
1 | translations: [ |
That's all the coding we'll need for today…
Copy your entire translations plugin directory to the Kibana plugins directory (<kibana_root>/plugins/).
Fire up Kibana and you should see the translated content!
By the way, French isn’t included in the video or images because I ran into kibana#10580 during the production of this video. When this is fixed I will come back and edit this video, but until then, beware single quotes (') in your translated strings.
Note that if you repeat the import and export steps of the gp-cli tool, the Globalization Pipeline will automatically manage translation changes if, for example, translated keys are added or removed, or translated content changes.
Follow the progress of Kibana Globalization on Github: (kibana#6515).
Read more about Globalization Pipeline
Connect with the Globalization Pipeline Open Source Community
On the serious side, we need to emphasize communication skills in the technology industry. Even if I have a great idea, if I can’t communicate it, it will go nowhere. And neither will I.
Just to be clear, by “communication” I mean “talking with other humans”. Which brings me to today’s topic on the lighter side, and that is the overloading of English. Words such as function, overload, network, and build all have specific meanings that weren’t originally found in Webster’s. The 1828 definition of computer, for example, is:
One who computes; a reckoner; a calculator.
In i18n, there are other words that have very specific meanings: global, globalization, collation, contraction, and of course locale, just to name a few.
To that end, I have started to add some tongue-in-cheek “redefinitions” to the bottom of the blog just to remind us all that these words have non-software meanings.
If you want to see them all without hitting reload an infinite number of times, you can see the original source here.
Speaking of i18n, this overloading doesn’t apply to English only. Most of my devices are set to es-US as their locale, so I see a lot of translated error messages. gcc, for example, has a thriving translation project where dedicated persons cause “English” to be translated into, for example, “Spanish”, such as:
#~ msgid "function ‘%D’ declared overloaded, but no definitions appear with which to resolve it?!?"
#~ msgstr "¿!¿se declaró la función ‘%D’ sobrecargada, pero no aparece ninguna definición con la cual resolverlo?!?"
Not sure why that’s ¿!¿ where I might expect ¿¡¿ — perhaps the initial ! just shows the compiler’s incredulity. In any event, sobrecargada seems to be a great cognate for overloaded. And with that, I will let you goto whatever you were doing before you started reading.
PR’s are welcome on my little list, or leave comments below. What are your favorite examples of overloaded terms, in any language?
One of many great features in ICU is the callback support. A lot can go wrong during codepage conversion, but in ICU, you can control what happens during exceptional situations.
Let’s try a simple sample. By the way, see the end of this post for hints on compiling the samples.
Our task is to convert black-bird (but with a U+00AD, “Soft Hyphen”, in between the two words) to ASCII.
1 |
|
Output:
1 | Input String: blackbird length 10 |
Hm. Ten characters in, nine out. What happened? Well, U+00AD is not a part of ASCII. ASCII is a seven-bit encoding, and thus only maps code points \x00 through \x7F inclusive. Furthermore, U+00AD is Default Ignorable, and as of ICU 54.1 (2014), per #10551, the soft hyphen can just be dropped.
But what if, for some reason, you don’t want the soft hyphen dropped? The pre-ICU 54.1 behavior can be brought back easily with a custom callback. So, roll up your collective sleeves, and:
1 | // © 2016 and later: Unicode, Inc. and others. |
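The header amounts to a from-Unicode callback that substitutes rather than skips. A sketch under assumptions; the names and details are illustrative, not the post's exact code:

```cpp
#include <unicode/ucnv.h>
#include <unicode/ucnv_err.h>
#include <unicode/ucnv_cb.h>

// Always write the converter's substitution character for unmappable
// code points, even default-ignorable ones such as U+00AD.
static void U_CALLCONV
fromUSubstituteAlways(const void *context,
                      UConverterFromUnicodeArgs *fromUArgs,
                      const UChar *codeUnits, int32_t length,
                      UChar32 codePoint,
                      UConverterCallbackReason reason,
                      UErrorCode *err) {
    (void)context; (void)codeUnits; (void)length; (void)codePoint;
    if (reason == UCNV_UNASSIGNED || reason == UCNV_ILLEGAL) {
        *err = U_ZERO_ERROR;                      // clear the error...
        ucnv_cbFromUWriteSub(fromUArgs, 0, err);  // ...and emit the sub char
    }
}
```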
If we #include this little header, and set it on the converter before we convert…
1 | LocalUConverterPointer cnv(ucnv_open("us-ascii", &status)); |
… we get the following result:
1 | Input String: blackbird length 10 |
Great! Now we are getting \x1A (ASCII SUB). It works.
A related question to the above has to do with converting from codepage to Unicode. That’s a better direction anyway. Convert to Unicode and stay there! One can hope. In any event…
For this task, we will convert 0x61, 0x80, 0x94, 0x4c, 0xea, 0xe5 from Shift-JIS to Unicode.
1 |
|
Output:
1 | Input Bytes: length 6 |
So, the letter "a" byte \x61 turned into U+0061, and then we have an illegal byte \x80 which turned into U+001A. Next, the valid sequence \x94 \x4c turns into U+732B, which is 猫 (“cat”). Finally, the unmapped sequence \xea \xe5 turns into U+FFFD. Notice that the single-byte illegal sequence turned into (SUB, U+001A), but the two-byte sequence turned into U+FFFD. This is discussed somewhat here.
So far so good?
But what if you actually want U+FFFD as the substitution character for both sequences? This would be unexpected, but perhaps you have code that is particularly looking for U+FFFDs. We can write a similar callback:
1 | // © 2016 and later: Unicode, Inc. and others. |
Let’s hook it up, as before:
1 | LocalUConverterPointer cnv(ucnv_open("shift-jis", &status)); |
And drumroll please…
1 | Input Bytes: length 6 |
Garbage out never looked so good…
To build these little snippets, I recommend the shell script icurun.
If ICU is already installed in your appropriate paths (visible to pkg-config or at least icu-config), you can simply run:
1 | icurun some-great-app.cpp |
… and icurun will compile and run a one-off.
If, however, you’ve built ICU yourself in some directory, you can instead use:
1 | icurun -i path/to/your/icu some-great-app.cpp |
… where path/to/your/icu is the full path to an ICU build or install directory.
If you are on windows… well, there isn’t a powershell version yet. Contributions welcome!
First, I’ll launch Xcode 8 and create a new workspace.
While that is launching, I’ll warn you that your author is only a recent graduate of the Swift playground, who once deployed some toy apps to a then-new iPhone 3GS. So, it’s been a while. Any suggestions for improvement are welcome. The actual SDK, however, was a team effort.
Today’s app will be a color mixer, to help artists mix their colors. You know, red and blue makes purple, and so on.
I will name the workspace gp-ios-color-mixer, and create a new single-view app called GP Color Mixer. To simplify things, for now, I disable the checkbox “Automatically manage signing.”
I want to include the new SDK. I’ll use Carthage to install it. Since I already have Homebrew installed, I only need to do
$ brew install carthage
Now I need a Cartfile that mentions the SDK. So I create one at the same level as my Xcode project, containing:
github "IBM-Bluemix/gp-ios-client"
Following the Carthage instructions, I next run
$ carthage update
which results in
*** Fetching gp-ios-client
*** Checking out gp-ios-client at "v1.0"
*** xcodebuild output can be found in /var/folders/j9/yn_32djn36x4d4c2mvcr1kgm0000gn/T/carthage-xcodebuild.p2nKN2.log
*** Building scheme "GPSDK" in TestFramework.xcworkspace
So far so good. Looking in the Finder, I now have GPSDK.framework right where I expect.
I’ll add it under “Linked frameworks and Libraries”.
We also need to make sure the framework is available at runtime. To do that, we add a build phase with a one-line script, /usr/local/bin/carthage copy-frameworks, with a single input file: $(SRCROOT)/Carthage/Build/iOS/GPSDK.framework
Will it build? I add this to the top of my generated ViewController.swift:
1 | import GPSDK |
I mentioned turning off code signing, but I still ran into some odd warnings:
1 | A shell task (/usr/bin/xcrun codesign --force --sign - --preserve-metadata=identifier,entitlements "/Users/srl/Library/Developer/Xcode/DerivedData/gp-ios-color-mixer-evyxcmilwuakdmdvxqqpmmnzisnn/Build/Products/Debug-iphonesimulator/GP Color Mixer.app/Frameworks/GPSDK.framework") failed with exit code 1: |
Following QA1940, I was able to make some progress by running xattr -cr './Carthage/Build/iOS/GPSDK.framework'. Now, ⌘R Run rewards me with a blank app window and no errors. Let’s write some code!
By code, of course, I mean a trip to the storyboard. Let's add a launch icon, because we can.
Now, I add some static fields, two picker views (for the input colors), and a button for action.
I wrote Color.swift to handle the color mixing. It will only support mixing from three of the primary colors: Red, Yellow, Blue. Any other mixing turns into muddy brown. Playground tested, ready to go.
1 | enum Color : Int { |
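A sketch of what that enum might look like; the mixing table here is an assumption, not the post's exact code:

```swift
enum Color: Int {
    case red, yellow, blue, orange, green, purple, brown

    // Mix two primaries; anything else turns muddy brown.
    static func mix(_ a: Color, _ b: Color) -> Color {
        switch (a, b) {
        case (.red, .yellow), (.yellow, .red): return .orange
        case (.red, .blue), (.blue, .red): return .purple
        case (.yellow, .blue), (.blue, .yellow): return .green
        case let (x, y) where x == y: return x
        default: return .brown
        }
    }
}
```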
Time to wire it up. We create IBOutlets for each of the items. And, I’ll clear the result label just to verify that things are wired up. It runs OK, good.
Now, let’s set up the delegate stuff so that we can get the list of colors showing.
1 |
|
Hey, just a little more code and we’re feature complete!
1 | func doMix(_ sender: Any) { |
At least, feature complete in English.
I’ll next take stock of the resource strings we need to have translated, so that we can run them through the Globalization Pipeline. I’ll call this gp-color-mixer.json
1 | { |
Time to fire up Bluemix. We are going to basically follow the Globalization Pipeline Quick Start Guide for this part, which I will refer to.
First, I create an instance of the Globalization Pipeline. The name you give the instance doesn’t matter here.
Now I create a bundle named gp-color-mixer. This name does matter, as our iOS app will use it to access the content.
I’ll upload the gp-color-mixer.json file above as the source English content, choosing JSON format for the upload. I pick a few languages for the target.
If I view the bundle, I can see our strings there, as well as translated versions.
The Globalization Pipeline offers this web UI to manage content, as well as powerful REST APIs for managing the translation workflow. I need to grant access to the iOS app so that it can read but not modify the translations. So, switching over to the API Users tab…
The result of creating the API user is that some access information is shown, something like the following:
API User ID: 5726d656c6f6e7761746572
Password: aHVudGVyNDIK
Instance ID: 77617465726d656c6f6e77617465726d
URL: https://something.something.bluemix.net/something/something
I take these and plug them into a new Swift file named ReaderCredentials.swift, like so (this is a variant of ReaderCredentials-SAMPLE.swift in the SDK’s repo):
1 | struct ReaderCredentials { |
(Now, after putting my actual credentials in, and a brief offscreen struggle with .gitignore, I move on…)
I’m almost done.
First, in ViewController.swift, we initialize the GP service and start setting up a few UI items:
1 | let gp = GPService() |
Here we set up the service with our credentials. Then, we use our new get(key:) function to set the title and mix button’s label.
There is also a get(color:) variant that will translate one of our Color objects. So we use that for the actual mixing function:
1 | func doMix(_ sender: Any) { |
Similarly, we can get the UIPickerView to use localized color names by using this same function:
1 | func pickerView(_ pickerView: UIPickerView, titleForRow row: Int, forComponent component: Int) -> String? { |
Looks good!
The iOS app will pick up changes if the translated content changes on the server. We could experiment with adding or removing languages, or updating translated keys.
You can find the source code at https://github.com/srl295/gp-ios-color-mixer.
Let me know if this works for you. This is my first post and, as I mentioned, my first app in Swift, so that’s a milestone. And, do let me know if^H^H what can be done to improve the sample app.
Thanks! Now go and make it global.
npm install --save g11n-pipeline
I managed to close about 13 issues since v1.2.x
I was able to increase function coverage to 100% thanks to the VSCode coverage plugin, and increase line coverage to 91%. Of course, when you test, you find bugs. Bugs such as realizing that updateResourceStrings() was unusable because there was no way to pass the languageId parameter.
First of all, I synchronized the client with the latest current REST API. So take a peek at the docs again and see if there are any new features or fields.
I also tried to add some convenience functions. For example, getting the full list of language IDs supported used to require concatenating the source and target lists. Now, with #40, you can call .languages() on the Bundle object and it will build this list for you. There is also a bundle.entries() accessor as of #14, which returns ResourceEntry objects.
Speaking of convenience, in most places where you used to call .someFunction({}, function callback(…){});, the {} are optional. If it worked with {} before, it's now optional.
The sample PR where I updated the sample code shows some of the code improvements.
There are more features to add here, but I hope you like the changes in v1.3.0!
#IUC40 ended today. It was incredibly pleasing to see many old faces, and some new ones. Next up, #UTC149, next week. 🍷
— Ken Lunde (小林剣) (@ken_lunde) November 4, 2016
Yes, that exactly. November 1-3, 2016 was the 40th Unicode conference. They used to be twice a year, in multiple locations, as in, outside of Santa Clara, California, USA.
Now that the conference is over, I’ll have to take some time to view slides from all of the other great presentations I missed while giving a personal record number of talks (long story), apart from the lightning talks, which were apparently not recorded.
The conference, and Unicode in general, is about people. It is always great to see so many folks I've kept up with over the years… including of course my fellow IBMers from many time zones away.
Off the top of my head, the important technical (besides personal) conversations I've had include:
Next week: IBM is hosting UTC 149!
gp-angular-client on Bower, angular-g11n-pipeline on npm. Thanks to IBMer @ckoberlein (GitHub), this SDK now supports variable substitution. So you can have a string such as Hello, and translate and substitute into this same string, so that, for example, in Spanish it will be Bienvenidos. So, output would be Hello Steven or Bienvenidos Steven, depending on language.
More details on our README and be sure to connect with us over on developerWorks Open!
This is a work in progress. If you read to the end, you’ll see we almost reached our goal here.
I work on ICU4C (the premier C/C++ library for Unicode support). And I work on Globalization Pipeline. These two haven’t really crossed paths… until now.
This blog will cover how to use the Globalization Pipeline to translate uconv, one of ICU’s sample command line apps. We'll be translating the resource files you can see in source/extra/uconv/resources.
First, Download ICU4C source code (as a tarball or from the SVN repository) and compile it. See its readme for more details.
Now, set up Globalization Pipeline. See our Quick Start Guide for getting your Globalization Pipeline instance created and set up.
In the GP dashboard, create a bundle named uconv. Select which languages you want to translate into, but don’t upload any strings. Click Save.
Also from the Bluemix dashboard, get the service credentials for your service. Save these in a file called mycreds.json that should look like the example in this document.
We’ll also need the gp-cli Java tool, so download the latest jar from gp-java-tools.
Now, let's get some translated content
Hm. These files are in icu4c resource format, which isn't (yet?) supported by Globalization Pipeline… directly. Let's try an interim step.
genrb -x root root.txt
genrb -x fr fr.txt
Now we have root.xlf and fr.xlf (for good measure).
Here's a snippet of root.xlf:
1 | <group id = "root" restype = "x-icu-table"> |
OK. The gp-cli tool says it handles XLIFF as a file format. Let's get that set up.
java -jar gp-cli-1.1.0.jar import -b uconv -f root.xlf -l en -t xliff -j mycreds.json
Note that we use the language tag en for English here, while the file was originally entitled root. This is because Globalization Pipeline works with the explicit source language, whereas for ICU, root is what will be consulted as a fallback if no other languages are available.
It says it uploaded… but let’s check in the Globalization Pipeline dashboard:
OK! That’s great. Browsing over to the other language translations, we can see that the MT engines are hard at work. However, we happen to already have some French translations in the ICU source base. We'll upload this, to overwrite some of the machine-translated entries for French:
java -jar gp-cli-1.1.0.jar import -b uconv -f fr.xlf -l fr -t xliff -j mycreds.json
Great. Now we have some human-translated content as well. We can now correct, upload, and download content in the dashboard until we are happy with the translations there.
OK, now for the next step: getting those translations back into ICU4C.
We can list the bundle status from the command line:
java -jar gp-cli-1.1.0.jar show-bundle -b uconv -j mycreds.json
1 | { |
Now, we’ll download the files in XLIFF format again:
java -jar gp-cli-1.1.0.jar export -b uconv -f fr.xlf -l fr -t xliff -j mycreds.json
java -jar gp-cli-1.1.0.jar export -b uconv -f es.xlf -l es -t xliff -j mycreds.json
java -jar gp-cli-1.1.0.jar export -b uconv -f de.xlf -l de -t xliff -j mycreds.json
java -jar gp-cli-1.1.0.jar export -b uconv -f zh.xlf -l zh-Hans -t xliff -j mycreds.json
… and so on. Repeat for each language you wish to download. Note that we’ve used zh for Chinese instead of zh-Hans.
OK, we have XLIFF format. How do we convert it to ICU format? genrb only writes XLIFF; it can’t read it.
We need the XLIFF2ICU Converter as is noted here.
To build it, at present, this worked for me:
Run ant xliff from the top level.
The result is out/xliff/lib/xliff.jar.
Still with me? Head back to the uconv/resources directory, and now run:
java -jar xliff.jar -s . -d . fr.xlf
And that brings us to…
1 | Processing file: ./fr.xlf |
Hrm. Seems like the XLIFF output isn't quite ready to be consumed. I filed a bug on this, of course.
We're so close… let's see what we can do. What if we fetch the data in JSON format, and then hack up something to convert it to ICU format? It might suffice for this blog post.
Let's run the fetches again, but get JSON this time:
java -jar gp-cli-1.1.0.jar export -b uconv -f fr.json -l fr -t json -j mycreds.json
…Now, run the following Node.js script over the JSON files:
node js2icu.js fr.json es.json …
1 | // js2icu.js |
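The script amounts to something like the following sketch; the real js2icu.js may handle escaping and nested tables differently:

```js
// js2icu.js: rough sketch converting GP JSON exports into ICU resource bundle text
const fs = require('fs');

for (const file of process.argv.slice(2)) {
  const locale = file.replace(/\.json$/, '');
  const data = JSON.parse(fs.readFileSync(file, 'utf-8'));
  let out = `${locale}{\n`;
  for (const [key, value] of Object.entries(data)) {
    out += `    ${key}{"${String(value).replace(/"/g, '\\"')}"}\n`;
  }
  out += '}\n';
  fs.writeFileSync(`${locale}.txt`, out);
}
```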
You should be the proud owner of .txt files matching all of the languages you are using.
We're almost there. Let's go up and build uconv:
cd ..
Now edit resfiles.mk and change the RESSRC line to reference the new translations:
1 | RESSRC = $(RESOURCESDIR)$(FILESEPCHAR)root.txt $(RESOURCESDIR)$(FILESEPCHAR)fr.txt $(RESOURCESDIR)$(FILESEPCHAR)es.txt $(RESOURCESDIR)$(FILESEPCHAR)zh.txt |
Build uconv…
make
Let’s test it. I know uwmsg.o isn’t really UTF-8; that’s why this is a test.
env LC_ALL=es ./../../bin/uconv -f utf-8 < uwmsg.o
1 | La conversión a Unicode de página de códigos falló en posición de byte de entrada de 0. Bytes: Error de cf: El carácter ilegal encontró La conversión a Unicode de página de códigos falló en posición de byte de entrada de 1. …… |
Looks like we have a (more) translated uconv now. Some of the messages don’t quite work correctly due to ICU4C message conventions. Perhaps we will investigate this in the future.
Marcus DelGreco at #FluentConf said something about Perl support on platforms. I mentioned Bluemix allowed bring-your-own-buildpack.
Looking through the buildpack lists didn't turn up Perl per se but…
… enter sourcey-buildpack. It's a generic buildpack! From its README I knew I was in the right spot:
Isn't it simply amazing to see these demos, where they throw a bunch of php, ruby, Java or python code at a Cloud Foundry site and it gets magically turned into a running web applications. Alas for me, life is often a wee bit more complicated than that. My projects always seem to required a few extra libraries or they are even written in an dead scripting language like Perl.
And now for that tl;dr-inspiring moment:
Let's see if the canned sample works. Hint: yes.
First, cf login into Bluemix, and then:
$ git clone https://github.com/oetiker/sourcey-buildpack
$ cd sourcey-buildpack/example
$ cf push MYAPPLICATION$$ -m 128M -b https://github.com/oetiker/sourcey-buildpack
The above builds perl (takes a while the first time) and deploys a little app that just dumps the deserialized JSON out.
But wait! It could be even simpler. So, I opened PR oetiker/sourcey-buildpack#2, which adds a manifest file to the example. Then, only cf push is needed; the -b … option is now unnecessary.