As more people around the globe join the Internet community, Internet production tools and content offerings must evolve to work well in a wide variety of languages and cultures. This tutorial is a short introduction to some of the main localization issues and challenges that face web developers today. Special attention is paid to localization issues for Macromedia Flash. Fortunately, a large number of people from around the world are working on these issues and publishing their thoughts about how to solve the many problems.
- No special software required, just a web browser to view online tutorials
- What is localization?
- Why is localization important for the Internet?
- What is Unicode?
- Is Unicode supported on the World Wide Web?
- Are some languages harder to localize for?
- Does Flash support localization?
- What can I do to make my Flash movies easier to localize?
- What are some different localization techniques for Flash?
- Follow-up activities
- Web Developers Virtual Library — Beyond Borders: Web Globalization Strategies
- W3C Web Internationalization Tutorials
In communications, localization means adapting something for use in a culture other than the one for which it was originally developed. For example, many of the original Flash activities on MyGLife.org were created in English and then localized for other languages.
Translation of words is the most visible aspect of localization, but the issue is bigger than just words. Different cultures may also use different writing systems, display dates and numbers differently, and have different meanings for the same symbols. Localization is closely related to internationalization, which is the process of preparing work for localization to different languages and cultures.
Localization is important because the Internet and World Wide Web are accessible to people all over the world. Many people who might be interested in a web page or Internet application may not understand the language and cultural style that the developer used to create it. Some people who do understand the language might still prefer to use a more familiar, localized version. In the end, it is up to the developer how easy it is for people from different cultures to use what has been created.
The most difficult technical issue of localization on the Internet is displaying the huge number of different character shapes used by the many languages found there. The technology underlying most computer operating systems was originally created to support English letters, but it has been extended to support many other languages as well.
Unicode is the most widely used standard for representing characters from the world's writing systems, including non-English ones. Unicode is a system of encoding characters as numbers, since numbers are at the heart of how computers store information of all kinds. Eventually, Unicode is intended to support every written language that might be used on the Internet, in a way that is accessible to anyone with a computer.
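The idea of "characters as numbers" can be made concrete with a few lines of code. The sketch below is not part of the original tutorial, and it uses Python rather than ActionScript purely for illustration. The function name `describe_characters` is a hypothetical helper. It lists the Unicode code point of each character and the bytes used to store it in UTF-8, one common Unicode encoding.

```python
def describe_characters(text):
    """Return (code point, UTF-8 bytes in hex) for each character in text."""
    return [(f"U+{ord(ch):04X}", ch.encode("utf-8").hex()) for ch in text]


# Four sample characters, written as escapes so the file stays plain ASCII:
# Latin "A", Latin "e" with acute accent, a Chinese ideograph, Hebrew shin.
sample = "A\u00e9\u4e2d\u05e9"
for code_point, utf8_hex in describe_characters(sample):
    print(code_point, utf8_hex)
```

Notice that the English letter needs one byte, the accented letter and the Hebrew letter need two, and the Chinese character needs three. This is one reason text in different languages can take up different amounts of storage.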
Most web browsers and technologies for the World Wide Web provide support for Unicode. However, some languages present special problems, and not every browser offers equal support for every language and character set. To ensure that a localized page is displaying all the characters correctly, it is often necessary to check the pages on different browsers and operating systems.
In some cases, the users will require special operating system software to be able to see a localized page. Most users will already have the right operating system software for their own language on their computer. The developer may need to install the software to support many languages in order to check that the localization is working correctly.
The easiest language to support is English, and most other European languages based on Latin or Cyrillic alphabets are easy to support as well. English speakers in different countries vary in their spelling of certain words, and sometimes format numbers differently, but generally, they can read pages prepared for speakers of different styles of English without much difficulty.
Languages based on other alphabets and character sets may present special problems. It is not possible to list all the different issues in a short article, but a few examples may help show how varied some of the issues are.
- The traditional Chinese writing system included more than 40,000 different character forms. Modern readers of Chinese understand several thousand. Compared to English, which has only 26 letters, this is still a huge number. In addition, the characters are more complex than English letters, making them harder to display clearly at small sizes.
- Arabic is written from right to left, whereas English and most European languages are written from left to right. In addition, the shapes of the letters may change, or be joined into ligatures, depending on where the letters are in a word and what other letters are nearby. Finally, numbers and some words in Arabic texts may be written in non-Arabic letters (such as English) and from left to right, appearing in the midst of traditional Arabic script that flows from right to left. This combination of behaviors makes supporting Arabic very complicated for programmers.
- Hebrew and Persian writing share some of the same characteristics that make Arabic different, including being written in more than one direction. For this reason, all three are considered bidirectional languages.
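To see why mixed-direction text is tricky, it can help to look at how Unicode labels each character. The following Python sketch is an illustration only, not the full Bidirectional Algorithm. It uses the standard `unicodedata.bidirectional` function to group a string into runs of right-to-left letters, left-to-right letters, and everything else (digits, spaces, punctuation). The names `strong_direction` and `direction_runs` are hypothetical helpers introduced here.

```python
import unicodedata
from itertools import groupby


def strong_direction(ch):
    """Classify one character by its Unicode bidirectional class."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):   # Hebrew letters are R, Arabic letters are AL
        return "rtl"
    if bidi_class == "L":           # Latin, Cyrillic and many other letters
        return "ltr"
    return "other"                  # digits, spaces, punctuation and so on


def direction_runs(text):
    """Group consecutive characters that share the same classification."""
    return [(d, "".join(chars)) for d, chars in groupby(text, key=strong_direction)]


# A Hebrew word, a number, and an English word in a single string.
sample = "\u05e9\u05dc\u05d5\u05dd 2024 abc"
print(ascii(direction_runs(sample)))
```

The sample string splits into three runs: a right-to-left word, a number surrounded by spaces, and a left-to-right word. Software that displays such a string has to decide where each run goes on the line, which is the hard part of supporting bidirectional languages.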
Translating the words and showing the characters correctly is only one part of localization. Many other issues may affect whether a page has been successfully localized, including layout, display of dates and other numbers, and use of images and symbols. Some languages, for example, require more, or less, space to say the same thing.
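As a small illustration of how dates and numbers differ, the Python sketch below formats the same date and the same number for a few regions. The pattern tables are simplified assumptions written for this example, and the function names are hypothetical. A real project would normally rely on a full locale library rather than a hand-made table.

```python
from datetime import date

# Simplified, illustrative conventions (assumptions for this sketch).
DATE_PATTERNS = {
    "en-US": "{m:02d}/{d:02d}/{y}",   # month first
    "en-GB": "{d:02d}/{m:02d}/{y}",   # day first
    "de-DE": "{d:02d}.{m:02d}.{y}",   # day first, dots as separators
    "ja-JP": "{y}/{m:02d}/{d:02d}",   # year first
}

# (thousands separator, decimal separator)
NUMBER_SEPARATORS = {
    "en-US": (",", "."),
    "en-GB": (",", "."),
    "de-DE": (".", ","),
    "ja-JP": (",", "."),
}


def format_date(value, locale_code):
    """Format a date using the pattern chosen for locale_code."""
    pattern = DATE_PATTERNS[locale_code]
    return pattern.format(y=value.year, m=value.month, d=value.day)


def format_number(value, locale_code):
    """Format a number with two decimals and locale-style separators."""
    thousands, decimal = NUMBER_SEPARATORS[locale_code]
    text = f"{value:,.2f}"  # English-style starting point, e.g. 1,234.56
    # Swap separators through a placeholder so they do not overwrite each other.
    return text.replace(",", "\0").replace(".", decimal).replace("\0", thousands)


for code in DATE_PATTERNS:
    print(code, format_date(date(2024, 3, 5), code), format_number(1234567.891, code))
```

The same date reads as March 5 in one region and could be mistaken for May 3 in another, which is why dates deserve as much attention as translated words.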
Flash supports localization to many languages. Some languages are fully supported but others cannot be localized as easily. As with other internet technologies, European-derived languages based on Latin or Cyrillic alphabets can be displayed without much difficulty. Other languages may be easier or more difficult depending on several factors.
Languages with very large character sets, like Chinese and Japanese, can present some special difficulties. Some Flash applications need to have fonts physically included in the Flash movie, in order to work properly. But the character sets used by Chinese and Japanese are so large that including all the font information necessary to display every character will greatly increase the file size of the Flash project, possibly making it impractical for many users to download at all.
Bidirectional languages, like Arabic, Hebrew and Persian, are particularly difficult to localize in Flash. Flash is reasonably good at displaying short sections of right-to-left text, as long as they contain no left-to-right characters such as numbers or English phrases.
Flash does not support the Bidirectional Algorithm, a method by which a computer determines how to display text in languages that flow from right to left, even when the text includes sections that flow in the opposite direction, from left to right. The Bidirectional Algorithm is specified by the Unicode Consortium, which develops and promotes Unicode, and it is implemented by most modern web browsers. Because Flash lacks it, Flash cannot be relied upon to display bidirectional text properly, although under some limited circumstances it can display right-to-left text correctly.
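Because the safe case is right-to-left text with no left-to-right content mixed in, a developer could screen translated strings before putting them into a movie. The Python sketch below shows one possible check, offered as a rough heuristic rather than an official rule. The function name `needs_bidi_review` is hypothetical, and treating digits as left-to-right content is an assumption based on the description above.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}         # Hebrew letters; Arabic and Persian letters
LTR_CLASSES = {"L", "EN", "AN"}   # left-to-right letters; European and Arabic-Indic digits


def needs_bidi_review(text):
    """Return True when text mixes right-to-left letters with left-to-right content."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_rtl = bool(classes & RTL_CLASSES)
    has_ltr = bool(classes & LTR_CLASSES)
    return has_rtl and has_ltr


print(needs_bidi_review("\u05e9\u05dc\u05d5\u05dd"))        # Hebrew word only
print(needs_bidi_review("\u05e9\u05dc\u05d5\u05dd 2024"))   # Hebrew word plus a number
```

Strings that the check flags are the ones most worth testing by eye in the finished movie, or handling with a different technique.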
There are several different techniques for localizing a Flash movie. Exactly which methods are most appropriate will vary depending on what languages the developer wants to localize for, what the developer's level of knowledge is, and how the original project uses text.
The first thing that the Flash developer needs to think about is what languages are the most important to be able to localize for. There is no single solution to accommodate every language. Limiting the scope of the localization will help identify what problems are most likely to occur, and which issues may be safely ignored.
One way to simplify localization is to avoid the use of text altogether, or at least limit the use of text to the minimum required to make the project work. For example, you can use a graphic symbol like an arrow, instead of the word "play" on a button that plays an animation.
Another important issue to consider is the room available for text in the project, in terms of both size and arrangement. A label that takes up a certain amount of space in the developer's native language may require much more, or much less, in another. For example, the short English word "runoff" could be translated as the much longer phrase "Escoamento superficial" in Portuguese. Leaving extra space for text in your design is advisable.
Another layout issue to consider is the orientation of labels with respect to the objects that they identify. For example, if the original language is English, which flows from left to right, the label identifying the name of something in a picture might commonly appear on the left. In a right-to-left language, the label would more commonly appear on the right. Placing the labels above or below the objects that they describe is one possibility. The most important thing is for the arrangement to make it very clear which labels identify which objects on the screen.
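A quick way to spot labels that may not fit is to compare the lengths of the original and translated text. The Python sketch below uses character counts as a rough stand-in for on-screen width, which is an assumption, since real width depends on the font. The function names, label ids and the 12-character limit are all hypothetical.

```python
def expansion_ratio(original, translated):
    """Return how many times longer the translation is than the original."""
    return len(translated) / len(original)


def labels_over_budget(translations, max_chars):
    """Return the ids of translated labels longer than max_chars characters."""
    return [label_id for label_id, text in translations.items() if len(text) > max_chars]


# The "runoff" example from the text, plus one short label for comparison.
portuguese = {"runoff": "Escoamento superficial", "ok": "OK"}

print(round(expansion_ratio("runoff", portuguese["runoff"]), 2))
print(labels_over_budget(portuguese, max_chars=12))
```

In this example the Portuguese phrase is more than three times as long as the English word, so a text box sized for "runoff" would need to be enlarged or the layout rearranged.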
One of the most convenient ways to localize Flash projects is to load the text from an external file, usually XML, although other file types could be used as well. XML is a very popular and flexible way of storing small amounts of text. To localize the Flash project, the developer can simply cut and paste the translated text into the XML file, replacing the original text. Setting up a Flash project for localization using this method requires a relatively high level of understanding of both XML and Flash, and may require some moderately complex ActionScript programming. Additionally, it may not work at all for long texts in bidirectional languages. If the texts are all short, consist entirely of right-to-left characters, and fit on one line, it may be possible to use this method for a bidirectional language. If a developer needs to use a great deal of complex text, then a different method may be needed.
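The general shape of this technique can be sketched outside Flash. The Python example below is an illustration of the idea, not ActionScript and not the method used by any particular project. The XML layout (a `strings` element containing `string` elements with an `id` attribute) and the function name `load_strings` are assumptions made for this example. The sketch reads translated text from XML and falls back to the original text wherever a translation is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical external file contents; in practice this would be loaded from disk.
SAMPLE_XML = """
<strings lang="pt">
  <string id="runoff">Escoamento superficial</string>
  <string id="play"></string>
</strings>
"""

# Original (English) text, used whenever no translation is supplied.
DEFAULT_STRINGS = {"runoff": "runoff", "play": "play", "stop": "stop"}


def load_strings(xml_text, defaults):
    """Merge translated strings from xml_text over a copy of defaults."""
    root = ET.fromstring(xml_text.strip())
    strings = dict(defaults)
    for node in root.findall("string"):
        key = node.get("id")
        value = (node.text or "").strip()
        if key and value:          # skip entries with no id or no translation
            strings[key] = value
    return strings


labels = load_strings(SAMPLE_XML, DEFAULT_STRINGS)
for key in sorted(labels):
    print(key, "=", labels[key])
```

Keeping the text in one place like this means a translator only has to edit the XML file, and the project itself does not need to change for each new language.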
The simplest and most reliable method for displaying text is to use images of the text instead of the text itself. In some extremely complex cases, this may be the only method that will work. To do this, the developer needs to use a word processor or a graphics program that supports the language, export an image in a format that Flash understands (such as .jpg, .gif or .png), and then import it into Flash. Since it is really a picture, it will not be affected by the limitations of Flash when dealing with poorly supported languages.
This simple and reliable method still has several significant drawbacks. First, it can be very time-consuming to make all the pictures. Images are also not editable inside Flash, so if the developer needs to make changes later (because of typos or translation errors, which are both common problems), the fixes will take longer. Furthermore, pictures of text increase the file size, forcing users to wait longer to download the Flash project. Finally, many projects are designed to use text "dynamically," meaning that the text on the screen changes according to what actions the user takes. Images are not well suited for use in this kind of situation.
- Think about techniques for making web content easily adaptable in other languages and cultures.
- Search for content pages that were created in a different language and identify differences to consider.
- Convert a content page from MyGLife into another language, using an automated translation tool, such as Google Language. What does it look like? What are the differences from the original? Analyze how the text fit is affected by the language change.
- Brainstorm your own ideas for handling localization issues such as bi-directionality.