Toward a Multilingual Web Site: Easy First Steps

by OStatic Staff - Mar. 11, 2009

"For many publishers and web app developers, from independent bloggers to high volume sites, designing a site to be multilingual is an afterthought, often thought to be extremely difficult," writes Brian McConnell of Worldwide Lexicon and Der Mundo, in this guest post for OStatic. That's unfortunate, because the world is a big place, and there's a lot of interesting content out there waiting to be read. Making content accessible in multiple languages is also very much in the spirit of open source. Below are some easy first steps toward making your site accessible in other languages; a follow-up post will cover several specific tools Brian has identified that you can use.

Designing a Site to Be Multilingual

By Brian McConnell

For many publishers and web app developers, from independent bloggers to high volume sites, designing a site to be multilingual is an afterthought, often thought to be extremely difficult. That's unfortunate because the world is a big place, and there's a lot of interesting content out there waiting to be read, if people can find it and understand it. This article explains some simple techniques you can use to make your site accessible in many languages. Some of these techniques are technical; others are procedural, covering how to make it easy for people to help you make your site accessible in other languages.

Language is one of the few remaining barriers on the web, and because of the complexity and nuance of human language, it's not a barrier that technology alone can overcome. However, by combining technology, people and a few shortcuts, it is possible to build sites, blogs and web apps that are accessible in many languages. If you're clever about it, you can do quite a bit at little cost, but it's important to understand what is and isn't feasible.

Here's the trick with language: it's really a form of shorthand for painting a physical or mental scene in another person's mind. It can describe objects, smells, vague states of mind, anything really. A computer, on the other hand, has no comprehension of these things, and so can only analyze language at a statistical level. This is what machine translation systems do: they build up associations indicating that this pattern usually maps to that one. The problem is that this isn't really how language works, and if there is no database of A = B associations for a given phrase, the computer is lost. Translation is often a matter of 'retelling' something, sometimes in a completely different way. That isn't to say that machine translation doesn't work; it's just that it can only go so far, and while its output may be intelligible, it does not produce prose that is pleasant to read. The fact that it works at all is impressive.

I've worked for several years on the Worldwide Lexicon, an open source project that focuses on translation and localization tools. My design philosophy has been to combine people and machines in ways that exploit the talents of each. Computers are calculators. They excel at storing and recalling information, and at doing simple comparisons. People, on the other hand, excel at communication and at understanding the subtleties of human language, but they have bad memories and they're not especially fast. The trick, then, is to build tools that make it easy for bilingual people to contribute to a website, using computers to help them work quickly and efficiently. Their incentives vary, ranging from volunteerism to direct payment for work, or some combination of the two. Here are a few simple guidelines I've developed in my work on WWL and Der Mundo:

1. If you're building a web app, build localization into it from day one. Use gettext, Pootle and similar tools that can render your interface in any language and locale (many of them are provided as links in the follow-up post to this one). You'll probably start off in one language only, but your app will be coded so that adding languages later is a simple configuration change. Avoid complicated expressions like "You have X credits in account Y as of date Z", because these can take completely different forms in different languages. Instead, go with something simple like "Account Balance: X". You may never decide to go multilingual, but if you do, it'll be an easy transition. If you don't plan for it, retrofitting localization later will be a slow and expensive process.
2. Identify multilingual users and adapt your user interface when they visit. You can easily do this by checking the Accept-Language header their browser sends, which lists the languages they prefer. It's not 100% accurate, but if someone's browser says their first language is French, it's highly likely that they speak French. If they have multiple languages set, they probably speak two or more languages. Invite these people to contribute to your site in their languages, and they just might. This doesn't need to be fancy: a simple link and an invitation to volunteer is often enough to find volunteer translators for a blog, whom you can then set up as guest editors.
3. Invite user feedback. You might not want to let users translate your site themselves, but you can at least provide a simple feedback loop where they submit suggestions that are queued for review behind the scenes. This can be a valuable tool for cross-checking work done by translators, and also for reducing their workload. It doesn't need to be highly engineered; a simple submission form is often enough (professional translators are good typists and can easily cut and paste into whatever publishing system you're using). Also be on the lookout for fans who are interested in helping you translate or port your website.
4. Label machine translations as machine translations, with a disclaimer that they are machine-generated and probably need to be corrected. They almost certainly contain errors. If you tell users up front that they are placeholders, especially if users can send feedback or corrections, it minimizes the impact of bad translations. Also give users the option to turn them off so that only human-edited texts are visible. Many people find machine translations obnoxious and unpleasant to read, and those with a working knowledge of the source language will prefer to read the original text. On the other hand, machine translations are cheap to produce, useful as placeholders, and often better than nothing.
5. Different tasks require different skill levels. Sometimes you need professional translators, sometimes not. It depends on what the material is, and whether it's more important to publish something quickly and reasonably accurately, or slowly and perfectly. It's also important to understand that human translators make mistakes differently than computers do. A person is more likely to know when they don't know something, whereas a computer will bravely drive right off a cliff.
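The first guideline above, building localization in from day one, can be sketched with Python's standard-library `gettext` module. This is a minimal sketch, assuming a hypothetical message domain `myapp` and a `locale/` directory of compiled `.mo` catalogs (normally built from `.po` files with GNU gettext's `msgfmt`). With `fallback=True`, a missing catalog simply returns the original English string, so the app runs in one language until translations exist. It also follows the advice to keep messages simple, using one named placeholder rather than a sentence with several embedded values.

```python
import gettext


def get_translator(lang: str, domain: str = "myapp", localedir: str = "locale"):
    """Return a gettext function for `lang`.

    `domain` and `localedir` are hypothetical names for this sketch. With
    fallback=True, a missing .mo catalog yields NullTranslations, which
    returns every message unchanged (i.e. the source-language text).
    """
    translations = gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )
    return translations.gettext


def account_balance_label(lang: str, balance: str) -> str:
    _ = get_translator(lang)
    # Keep the translatable unit short: a label plus one named placeholder
    # is far easier to translate than "You have X credits in account Y as
    # of date Z". Translators must keep "{balance}" in their version.
    return _("Account Balance: {balance}").format(balance=balance)


if __name__ == "__main__":
    # With no catalogs installed, every language falls back to English.
    print(account_balance_label("fr", "42.00"))
```

Adding a language later means dropping a compiled catalog into `locale/<lang>/LC_MESSAGES/myapp.mo`; the calling code does not change.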
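The second guideline relies on the Accept-Language request header, which lists language tags with optional quality (`q`) weights, as in `fr-CH, fr;q=0.9, en;q=0.8`. The sketch below is a simplified parser that assumes reasonably well-formed input; it is not a complete standards-compliant implementation, and the function names are hypothetical. It ranks languages by weight and treats a visitor as probably multilingual when more than one distinct primary language appears.

```python
from __future__ import annotations


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, best first.

    Simplified sketch: tags are lowercased, malformed q-values count as 0,
    entries with q=0 are dropped, and the wildcard "*" is skipped because
    it names no particular language.
    """
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        tag = pieces[0].lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            entries.append((position, tag, q))
    # Sort by descending q; ties keep the browser's original order.
    entries.sort(key=lambda e: (-e[2], e[0]))
    return [(tag, q) for _, tag, q in entries]


def primary_languages(header: str) -> list[str]:
    """Distinct primary subtags ("fr" from "fr-ch"), in preference order."""
    seen: list[str] = []
    for tag, _ in parse_accept_language(header):
        primary = tag.split("-")[0]
        if primary not in seen:
            seen.append(primary)
    return seen


def is_probably_multilingual(header: str) -> bool:
    # Heuristic only: a browser configured with several languages
    # suggests, but does not prove, that the reader speaks them.
    return len(primary_languages(header)) > 1
```

For `fr-CH, fr;q=0.9, en;q=0.8` the visitor would be flagged as a likely French and English speaker, a candidate for the "help translate this site" invitation.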
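The third guideline needs little more than a place to hold reader suggestions until someone reviews them. Here is a minimal in-memory sketch, assuming hypothetical `Suggestion` and `ReviewQueue` names; a real site would persist submissions in its own database rather than in memory.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Suggestion:
    page_url: str
    language: str   # language tag of the suggested text, e.g. "fr"
    original: str   # source text the reader is commenting on
    suggested: str  # reader's proposed translation or correction
    status: str = "pending"


class ReviewQueue:
    """Hold reader suggestions until an editor approves or rejects them."""

    def __init__(self) -> None:
        self._pending: deque[Suggestion] = deque()
        self.approved: list[Suggestion] = []

    def submit(self, suggestion: Suggestion) -> None:
        # Nothing is published on submission; it only joins the queue.
        self._pending.append(suggestion)

    def pending_count(self) -> int:
        return len(self._pending)

    def review_next(self, approve: bool) -> Suggestion | None:
        """Pop the oldest suggestion and record the editor's decision."""
        if not self._pending:
            return None
        suggestion = self._pending.popleft()
        suggestion.status = "approved" if approve else "rejected"
        if approve:
            self.approved.append(suggestion)
        return suggestion
```

Keeping review first-in, first-out means no suggestion waits indefinitely, and approved items can be pasted into the publishing system by whoever does the final edit.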
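The fourth guideline amounts to a display rule: prefer a human-edited translation, fall back to a clearly labeled machine translation only if the reader has not turned them off, and otherwise show the original text. A sketch of that rule, assuming a hypothetical `Translation` record that tracks whether a human has edited the text; the disclaimer wording is illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative wording; any clear label that invites corrections will do.
MT_DISCLAIMER = "[Machine translation: may contain errors. Corrections are welcome.]"


@dataclass
class Translation:
    text: str
    human_edited: bool


def text_to_display(
    original: str,
    translation: Translation | None,
    show_machine: bool = True,
) -> str:
    """Choose what a reader sees for one block of content."""
    if translation is not None and translation.human_edited:
        return translation.text
    if translation is not None and show_machine:
        # Machine output is a placeholder, so label it as one.
        return f"{MT_DISCLAIMER}\n{translation.text}"
    # No usable translation, or the reader opted out of machine
    # translations: fall back to the source text.
    return original
```

The `show_machine` flag would typically come from a per-user preference, so readers who find machine output unpleasant see only human-edited text or the original.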

For an example of this, see Der Mundo, a social translation tool I built recently. It enables people to read, create and share translations of their favorite websites, blogs and news feeds. The system combines machine translation with human edits. The editing process is open, much like Wikipedia's, so there are no barriers to participation. It can be used as a standalone system, or it can import and export RSS and Atom feeds, for example to feed user-generated translations back into a content management system for further review and curation. You wouldn't want to use an open process like this for some types of websites, such as a corporate site, but it illustrates how you can combine automation and user-generated content for sites with active user communities.
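Exporting translations as a feed, as described above, can be done with ordinary XML tooling. The sketch below builds a minimal RSS 2.0 document with Python's `xml.etree.ElementTree`. It is not Der Mundo's actual export format; the function name and the item dictionary keys are assumptions for illustration. The channel-level `<language>` element is part of RSS 2.0 and tells a downstream content management system which language the items are in.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def translations_to_rss(
    channel_title: str,
    channel_link: str,
    language: str,
    items: list[dict[str, str]],
) -> str:
    """Serialize translated items as a minimal RSS 2.0 feed.

    Each item dict is assumed to carry "title", "link", and "description"
    (the translated text); these keys are hypothetical, chosen for this sketch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link, and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = (
        f"Translations ({language}) of {channel_title}"
    )
    # The optional <language> element names the feed's language.
    ET.SubElement(channel, "language").text = language
    for item in items:
        node = ET.SubElement(channel, "item")
        for key in ("title", "link", "description"):
            ET.SubElement(node, key).text = item[key]
    # encoding="unicode" returns a str (not bytes) and keeps accented text.
    return ET.tostring(rss, encoding="unicode")
```

A receiving system can parse the feed with any RSS reader or XML parser and queue each item for review before publishing.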