UNIVERSAL USABILITY IN PRACTICE

Developing Websites for Users of Languages Other Than English

Nagia M. Ghanem (ghanem@cfar.umd.edu)

Department of Computer Science
University of Maryland
College Park, MD 20742 USA
April 2001

 

Introduction

Today, the Internet is positioned to be an international mechanism for communication and information exchange, the precursor of a global information superhighway. For this vision to be realized, one important requirement is that all languages be technically usable on the Internet, so that when a society is ready to absorb Internet technology, the language capability comes prepackaged. This is a nontrivial multilingual information-processing problem. To appreciate the extent of the issue, consider that a few years ago English was the native language of 80% of web users, whereas today it is the mother tongue of less than half of them. Nevertheless, statistics show that about 80% of websites are in English and only about 8% can be classified as multilingual [1].

These numbers show that making a website universally usable is an important issue; ignoring it may leave groups of users isolated rather than enjoying the true interoperability alluded to by the very name of the World Wide Web. However, designing websites in languages other than English, or multilingual websites, confronts designers with many requirements. These requirements generally fall into three categories: data representation, data display, and data input [2]. This paper studies these requirements, gives general recommendations for meeting them, and provides a list of guidelines for web page designers. It also gives examples of successful websites implemented in different languages.


Recommendations

Designing websites in languages other than English, or multilingual websites, poses three challenges for designers: data representation requirements, data display requirements, and data input requirements. This section addresses these requirements in detail and gives general recommendations for meeting them.

Data representation requirements:
The English alphabet consists of only 26 letters, so the ASCII code is sufficient to encode English text, and single-byte (eight-bit) character sets cover most languages written in the Latin alphabet. However, some languages are ideographic and use a unique character to represent each different idea. The number of characters in an ideographic language, several thousand, is far greater than the 256 unique eight-bit combinations available to represent them [2]. So, to encode web pages in these languages, or web pages containing multiple languages, an encoding method other than ASCII should be used. This encoding method should be able to represent multilingual plain text, to overcome the difficulty of exchanging text files internationally.
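The limitation can be illustrated with a short Python sketch. Python is used here only for illustration; the function name `encodable` and the sample strings are assumptions, not part of the cited work. The sketch checks whether a string fits in ASCII and reports how many bytes the same text occupies in UTF-8, one of the Unicode encodings.

```python
def encodable(text, encoding):
    """Return True if every character of text can be stored in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Illustrative sample strings (assumptions chosen for this sketch).
samples = {"English": "web", "Arabic": "ويب", "Japanese": "日本語"}

for name, text in samples.items():
    utf8_bytes = len(text.encode("utf-8"))
    print(name, "fits in ASCII:", encodable(text, "ascii"),
          "| UTF-8 bytes:", utf8_bytes)
```

Only the English sample fits in ASCII, while all three samples can be stored in UTF-8, at the cost of more than one byte per character for the non-Latin scripts.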

Data display requirements:
·Text direction: English is among the languages that are written from left to right. However, Arabic and Hebrew are written from right to left, and other languages such as Chinese and Japanese may be written top to bottom. To display these different text directions, a method is needed to specify the direction. This may take the form of an HTML attribute or a style.

·Text expansion: English is a very compact language, and text almost inevitably expands in translation. Running text may expand by around 30% on average in European languages. Short labels or single words, however, can easily expand by 200-300%. Designers need to consider, early in the design phase, how they will deal with text expansion. Possible approaches include the use of automatically expanding text objects, the use of tabs to reduce the amount of information in a single frame and alternative layout styles which optimize the use of the screen [3,4].

·Glyph rendering: It is a very common misconception that a simple one-to-one mapping from character codes to glyph images is all that is required. This is not true: a single character may have multiple glyph images, and this happens even in languages written in the Roman alphabet.

·Localization issues: Besides text, things like dates, money, telephone numbers, weights and measures should be displayed in the users’ native format.
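As a rough illustration of the text direction requirement, the following Python sketch guesses the base direction of a string from its first strongly directional character, using the bidirectional categories in Python's standard `unicodedata` module. The function name `base_direction` and the left-to-right default are assumptions made for this sketch; it is a simplification rather than a full implementation of the Unicode bidirectional algorithm, and it does not cover top-to-bottom writing.

```python
import unicodedata


def base_direction(text):
    """Guess "rtl" or "ltr" from the first strongly directional character."""
    for ch in text:
        category = unicodedata.bidirectional(ch)
        if category in ("R", "AL"):   # Hebrew-type and Arabic-type letters
            return "rtl"
        if category == "L":           # left-to-right letters
            return "ltr"
    return "ltr"  # assumption: default when no strong character is found


for sample in ("Hello", "مرحبا", "שלום"):
    print(sample, "->", base_direction(sample))
```

The result could be used, for example, to choose the value of a direction attribute when a page is generated.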

Data input requirements:
As with display, users should be able to input data in the manner they are accustomed to. Text rendering should follow the rules of the language it is written in. For languages with large character sets, there is no one-to-one mapping from keystrokes to characters, so helper applications or front-end processor software should be used. Such software interprets keystrokes on either a specialized or a standard keyboard and uses a conversion dictionary to display a list of candidate characters, from which users select the correct ideographic characters corresponding to the keyboard input [5].
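The conversion dictionary idea can be sketched in a few lines of Python. Everything here is hypothetical: the two-entry dictionary, the romanized keys, and the function names `candidates_for` and `choose` are invented for illustration, and a real front-end processor would use a far larger dictionary and rank candidates by context.

```python
# Hypothetical, tiny conversion dictionary: romanized keystrokes -> candidates.
CANDIDATES = {
    "ma": ["妈", "马", "吗"],
    "shu": ["书", "树", "数"],
}


def candidates_for(keystrokes):
    """Return the candidate characters for a romanized keystroke sequence."""
    return CANDIDATES.get(keystrokes.lower(), [])


def choose(keystrokes, index):
    """Return the candidate the user picked from the displayed list."""
    options = candidates_for(keystrokes)
    if not 0 <= index < len(options):
        raise IndexError("no candidate at position %d" % index)
    return options[index]


print(candidates_for("ma"))   # the list shown to the user
print(choose("ma", 1))        # the user selects the second candidate
```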


Guidelines

· Plan ahead.

It is very important to clearly identify the purpose of the site and the language(s) into which the site will be converted before beginning the design, so that a set of resources and standards can be prepared for implementation.

· Standardize your site.

It is best to adhere to standards when designing a site. The site should be universal and work on any browser that supports the languages and encodings you are working with. A decision should be made about which version of HTML will be used as the base. HTML 4 is better suited than earlier versions to non-English web pages, since it introduces internationalization features such as the lang and dir attributes. Some useful resources are:

- The HTML 4.0 Specification can be accessed at: http://www.w3.org/TR/REC-html40/

- Tango Creator (http://www.alis.com/). Tango Creator is an HTML editor. It isn't as powerful as most current HTML editors, but it is fully multilingual.

· Identify the language of each document.

This can be done using HTML tags. It is recommended that the primary language(s) of the document be specified in a META element. For example: 

<META HTTP-EQUIV="Content-Language" Content="en">

· Decide which character set will be used.

It is important that the character set you choose is widely used by your target audience or is freely available for download. Unicode is a character set that can represent most of the world’s languages, thus overcoming ASCII’s limitation to the basic Latin alphabet. Some useful resources are:

- The World Wide Web Consortium has a list of some languages and the character sets commonly used: http://www.w3.org/International/O-charset-lang.html.

- The Unicode Standard is accessible at http://www.unicode.org

· Identify the character set of each document.

The character set of the document should be specified in a META element so that the browser can use the correct encoding to display the document. For example, for a document encoded in UTF-8, a Unicode encoding:

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

Some useful resources are:

- SC UniPad (http://www.sharmahd.com/unipad). This is a Unicode-compliant text editor. It can recognize multiple Unicode encodings, but it only supports European languages.

- Unitype Global Writer (http://www.unitype.com/). Global Writer is a Unicode-compliant word processor that supports a wide variety of languages.
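To see why the declared character set matters, consider the following Python sketch. It encodes an Arabic word in windows-1256 and then decodes the same bytes twice: once with the correct character set and once with a wrong guess (Latin-1). The sample word and variable names are assumptions chosen for illustration.

```python
# The same bytes decoded with the right and with a wrong character set.
text = "الأهرام"                      # sample Arabic word (illustrative)
raw = text.encode("windows-1256")     # bytes as a server might send them

right = raw.decode("windows-1256")    # charset declared correctly
wrong = raw.decode("latin-1")         # charset guessed incorrectly

print(right == text)   # the original text is recovered
print(wrong == text)   # unreadable accented Latin characters instead
```

Without a charset declaration, a browser has to guess, and a wrong guess produces the kind of garbled output held in `wrong`.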

· Identify the direction of the text.

In addition to specifying the language of a document, the base directionality (left-to-right or right-to-left) of the document's text should be identified. This is done with the dir attribute. For example:

       <Q lang="ar" dir="rtl">...an Arabic quotation...</Q>
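A designer may want to check which elements of a page actually declare a language or direction. The following Python sketch does this with the standard `html.parser` module; the class name `LanguageAudit` and the helper `audit` are hypothetical names introduced for this example, not part of any existing tool.

```python
from html.parser import HTMLParser


class LanguageAudit(HTMLParser):
    """Collect every element that declares a lang or dir attribute."""

    def __init__(self):
        super().__init__()
        self.declarations = []  # list of (tag, lang, dir) tuples

    def handle_starttag(self, tag, attrs):
        found = dict(attrs)  # tag and attribute names arrive lowercased
        if "lang" in found or "dir" in found:
            self.declarations.append(
                (tag, found.get("lang"), found.get("dir")))


def audit(markup):
    """Return the (tag, lang, dir) declarations found in markup."""
    parser = LanguageAudit()
    parser.feed(markup)
    parser.close()
    return parser.declarations


print(audit('<Q lang="ar" dir="rtl">...an Arabic quotation...</Q>'))
```

An empty result for a page that contains right-to-left text suggests that the direction has not been identified.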

· Restrict font solutions to those available for free from the Internet.

If you use a solution that requires your target audience to obtain commercial software, your website is unlikely to be used. To help users access the site, provide a link to software and fonts that can be downloaded for free. It is also useful to provide instructions in the users’ own language. In this case the instructions are usually a set of images of text rather than actual text, since users may not yet be able to display the script. A good resource is:

- Port Phillip Library Service http://home.vicnet.net.au/~ppls/sling/lotemain.htm

A language is selected from the menu to begin. Instructions in the chosen language then outline how to view web pages written in that language.

· Plan for text expansion.

When working with embedded text, text expansion and contraction should be considered. English does not translate into other languages at a 1:1 ratio. With European languages, the target text often expands by as much as 30%. Conversely, Asian target text often contracts. Designers need to consider, early in the design phase, how they will deal with text expansion. Possible approaches include the use of:

- Automatically expanding text objects

- Tabs to reduce the amount of information in a single frame

- Alternative layout styles that optimize the use of the screen. For example, Cascading Style Sheets (CSS) describe how documents are presented on screen or in print.

The CSS2 specification can be accessed at http://www.w3.org/tr/rec-css2/
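The expansion figures can be turned into a simple space budget during design. The Python sketch below assumes that the percentages quoted in this paper describe growth over the original length (so 30% growth turns 100 characters into 130); the function names `expanded_length` and `fits` are hypothetical, and character counts are only a rough proxy for rendered width.

```python
def expanded_length(source_chars, growth_percent):
    """Characters to reserve after translation, rounded up."""
    return (source_chars * (100 + growth_percent) + 99) // 100


def fits(source_text, field_width, growth_percent):
    """Check whether a translated label is likely to fit in field_width."""
    return expanded_length(len(source_text), growth_percent) <= field_width


print(expanded_length(100, 30))   # running text with 30% growth
print(fits("Save", 8, 200))       # short label with 200% growth
```

A fixed-width field that fails this check is a candidate for an automatically expanding text object or an alternative layout.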

· Be aware of localization issues.

A website offered in more than one language can be properly translated and still confuse its users if it is not well “localized”. Localization entails tailoring the content to the specific requirements and needs of the local audience. Elements to consider include currency, time and date formatting, measurements, writing style, and color and image selection.

A good reference for localization issues is chapter 5 of [7].
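Date formatting is a convenient example of a localization issue. The Python sketch below formats one date in the numeric style customary in four locales. The table `DATE_PATTERNS` and the function `format_date` are assumptions made for this illustration and cover only a few common conventions; a production site would normally rely on a locale library instead of a hand-written table.

```python
from datetime import date

# Illustrative numeric date patterns for a few locales (not exhaustive).
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
    "de-DE": "%d.%m.%Y",
    "ja-JP": "%Y/%m/%d",
}


def format_date(day, locale_tag):
    """Format day using the numeric pattern registered for locale_tag."""
    return day.strftime(DATE_PATTERNS[locale_tag])


accessed = date(2001, 4, 16)
for tag in DATE_PATTERNS:
    print(tag, format_date(accessed, tag))
```

The same date appears as 04/16/2001 for a reader in the United States and as 16.04.2001 for a reader in Germany, which shows how an untranslated date format can mislead users.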

· Choose the translation method carefully.

When designing websites for more than one language, it is important to choose the text translation method carefully. Traditionally, translation has been done by trained professionals, ideally native speakers of the target language. This process produces the best results, but it can be time-consuming and expensive. Machine translation, on the other hand, is inexpensive and fast but less accurate, to a degree that depends on the text being translated. It works best for technical documents whose text follows standard English rules, with no ambiguities [4].

Beware of posting machine-translated text on your site. You risk alienating your audience by providing what is perceived as a sloppy, laughable translation. Still, machine translation may prove useful for projects with a limited budget. Some useful free web-based translators are:

-Systran http://www.systransoft.com/    

-Transparent Language http://www.freetranslation.com/

· Maintain your site.

If the website contains more than one language and the content in one language is updated, the content in the other languages should be updated as well. Uniscape provides one solution. Its translation management system tracks changes made to the original HTML files and then sends just the changes to the designated translators.

- Uniscape can be accessed at http://www.uniscape.com
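The idea of tracking changes to the original files can be sketched as follows. This is not a description of how Uniscape works; it is a minimal Python illustration, under the assumption that a fingerprint of each source page is recorded when the page is translated. The names `fingerprint` and `stale_pages` and the sample pages are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Return a stable fingerprint of a page's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_pages(source_pages, translated_from):
    """List pages whose source changed since they were last translated.

    source_pages maps a page name to its current source text;
    translated_from maps a page name to the fingerprint of the source
    version that the existing translation was made from.
    """
    return sorted(
        name for name, text in source_pages.items()
        if translated_from.get(name) != fingerprint(text)
    )


pages = {"index.html": "<h1>Welcome</h1>", "about.html": "<p>About us</p>"}
recorded = {
    "index.html": fingerprint("<h1>Welcome</h1>"),
    "about.html": fingerprint("<p>About</p>"),   # translated from an older version
}
print(stale_pages(pages, recorded))
```

Pages reported as stale, including pages that have never been translated, are the ones to send to the translators.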


Websites

1. Yahoo.com: http://fr.yahoo.com/ and http://de.yahoo.com/ (Accessed on April 16, 2001)

These are two examples of Yahoo versions, implemented in French and German. Neither language poses a display problem, since both are European languages based on Latin character sets and both are written from left to right. Localization issues are taken into consideration, as is clear from the date and time formats.

2. Amazon.com: http://www.amazon.com (Accessed on April 16, 2001)

Amazon.com is a good example of a multilingual website, offering its site in several languages (Japanese, French, German, Spanish and English). Displaying the Japanese version requires downloading a plugin, as most browsers do not support the character set used.

3. Ahram.org: http://www.ahram.org.eg (Accessed on April 16, 2001)

Ahram is an Egyptian daily newspaper. Its site is a good example of an Arabic (right-to-left language) website. The language is specified in the header of the HTML file by <META NAME="MS.LOCALE" CONTENT="AR">. The character set used is windows-1256, specified by <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=windows-1256'>. Localization issues are taken into consideration, as is clear from the date and currency formats.

4. Online Computer Library Center (OCLC) FirstSearch: http://www.oclc.org/firstsearch/ (Accessed on April 16, 2001)

FirstSearch allows users to search for bibliographic and full-text records in over 80 online databases. The search can be done in many languages. The site also includes a listing of recommended browsers, viewers, and plugins.


Conclusions

To make the web universally usable and to remove language barriers, researchers should devote more effort to the process of developing websites in different languages. The current work of the W3C on internationalization [9] and the Unicode Standard [10] are two examples of current efforts toward making the web universally usable. Both remain open areas for research.


Resources

[1] B. F. Lavoie, E. T. O’Neill, “How World Wide Is the Web? Trends in Internationalization of Web Sites,” http://www.oclc.org/oclc/research/publications/review99/oneill/lavoie.htm (Accessed on April 16, 2001).

[2] G. Nicol, “The Multilingual World Wide Web,” http://www.oasis-open.org/cover/nicol-multwww.html (Accessed on April 16, 2001).

[3] R. Ishida, “Challenges in designing International User Information,” http://www.xerox-emea.com/globaldesign/paper/paper2.htm (Accessed on April 16, 2001).

[4] C. K. Merrill, M. Shanoski, “Internationalizing Online Information,” ACM Tenth International Conference on Systems Documentation, 1992, pp. 19-25, http://www.acm.org/pubs/articles/proceedings/doc/147001/p19-merrill/p19-merrill.pdf (Accessed on April 16, 2001).

[5] L. K. Yong, T. T. Wee, N. Govindasamy, L. T. Chee, “Multiple Language Support over the World Wide Web,” http://www.isoc.org/isoc/whatis/conferences/inet/96/proceedings/a5/a5_2.htm (Accessed on April 16, 2001).

[6] M. Lerner, “Building Worldwide Web sites,” IBM developerWorks: Web architecture library, September 1999, http://www-106.ibm.com/developerworks/web/library/web-localization.html?dwzone=web (Accessed on April 16, 2001).

[7] T. Fernandes, “Global Interface Design,” Academic Press, 1995.

[8] A. Cunningham, “Multilingual Unicode Web Page Development,” http://members.ozemail.com.au/~andjc/papers/cn99.html (Accessed on April 16, 2001).

[9] World-Wide Character Sets, Languages, and Writing Systems, http://www.w3.org/International/ (Accessed on April 16, 2001).

[10] The Unicode Standard, http://www.unicode.org (Accessed on April 16, 2001).
