Nagia M. Ghanem (ghanem@cfar.umd.edu)
Department of Computer Science
University of Maryland
College Park, MD 20742 USA
April 2001
Today, the Internet is positioned to be an international mechanism for communication and information exchange, the precursor of a global information superhighway. For this vision to be realized, one important requirement is to make all languages technically available via the Internet, so that when a society is ready to absorb Internet technology, the language capability comes prepackaged. This is a nontrivial multilingual information-processing problem. To appreciate its extent, consider that a few years ago English was the native language of 80% of web users, whereas today it is the mother tongue of less than half of them. Nevertheless, statistics show that about 80% of websites are in English and only about 8% can be classified as multilingual [1].
These numbers show that making a website universally usable is an important issue. Ignoring it may leave groups of users isolated, rather than enjoying the true interoperability alluded to by the very name of the World Wide Web. However, designing websites in languages other than English, or multilingual websites, confronts designers with many requirements. These generally fall into three categories: data representation, data display and data input requirements [2]. This paper studies these requirements, gives general recommendations for meeting them and provides a list of guidelines for web page designers. It also gives examples of successful websites implemented in different languages.
Designing websites in languages other than English, or multilingual websites, poses three challenges for designers: data representation requirements, data display requirements and data input requirements. This section addresses these requirements in detail and gives general recommendations for meeting them.
Data representation requirements:
The English alphabet consists of only 26 letters, so the 7-bit ASCII code, with its 128 code points, is sufficient to encode English, and 8-bit extensions of ASCII can cover the Latin alphabet in general. However, some languages are ideographic and use a unique character to represent each different idea. Ideographic languages comprise several thousand characters, far more than the 256 unique eight-bit combinations available to represent them [2]. So, to encode web pages in these languages, or web pages containing multiple languages, an encoding method other than ASCII must be used. This encoding method should be able to represent multilingual plain text, to overcome the difficulty of exchanging text files internationally.
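The arithmetic behind this argument can be checked with a short Python sketch. An 8-bit code offers only 256 values. The basic CJK Unified Ideographs block of Unicode (U+4E00 to U+9FFF) alone holds over twenty thousand code points. A Unicode encoding such as UTF-8 can carry English, Chinese and Arabic in one string. The helper names are illustrative, not taken from any particular tool.

```python
# Sketch: an 8-bit code has room for only 256 characters, far fewer than an
# ideographic script needs, while a Unicode encoding such as UTF-8 can hold
# English, Chinese and Arabic in the same text.

EIGHT_BIT_CODES = 2 ** 8                  # 256 distinct byte values
CJK_BASIC_BLOCK = 0x9FFF - 0x4E00 + 1     # code points in Unicode's basic CJK Unified Ideographs block


def fits_in_ascii(text: str) -> bool:
    """Return True if every character of `text` has an ASCII code."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def utf8_round_trip(text: str) -> str:
    """Encode `text` as UTF-8 bytes and decode it again."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    mixed = "Hello, 世界, مرحبا"                 # English, Chinese and Arabic
    print(EIGHT_BIT_CODES, CJK_BASIC_BLOCK)       # 256 20992
    print(fits_in_ascii("Hello"), fits_in_ascii(mixed))  # True False
    print(utf8_round_trip(mixed) == mixed)        # True
```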
Data display requirements:
·Text direction: English is among the languages written from left to right. However, Arabic and Hebrew are written from right to left, and other languages such as Chinese and Japanese may be written from top to bottom. To display these different text directions, a method is needed to specify the direction, for example an HTML attribute or a style.
·Text expansion: English is a very compact language, and text almost inevitably expands in translation. Running text may expand by around 30% on average in European languages. Short labels or single words, however, can easily expand by 200-300%. Designers need to consider, early in the design phase, how they will deal with text expansion. Possible approaches include the use of automatically expanding text objects, the use of tabs to reduce the amount of information in a single frame and alternative layout styles which optimize the use of the screen [3,4].
·Glyph rendering: It is a very common misconception that a one-to-one mapping from each character code to a glyph image is all that is required. This is not true: in many languages a single character may have several glyph images depending on its context, and this happens even in languages written in the Roman alphabet.
·Localization issues: Besides text, things like dates, money, telephone numbers, weights and measures should be displayed in the users’ native format.
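The glyph-rendering point can be illustrated with Python's standard `unicodedata` module. The Arabic letter BEH is stored as one character (U+0628). Unicode also lists four positional shapes of it as compatibility "presentation forms", and NFKC normalization folds each of them back to the single letter. In this sketch it is assumed that text is stored as base characters and that the browser's font rendering picks the shape at display time. The names `BEH_FORMS` and `base_character` are illustrative.

```python
import unicodedata

# The Arabic letter BEH is a single character (U+0628), but in running text
# it takes one of four shapes depending on its position in a word. Unicode
# keeps these shapes only as compatibility "presentation forms".
BEH = "\u0628"
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def base_character(glyph_char: str) -> str:
    """Fold a compatibility presentation form back to its base character(s)."""
    return unicodedata.normalize("NFKC", glyph_char)


if __name__ == "__main__":
    for position, form in BEH_FORMS.items():
        print(position, unicodedata.name(form), "->", unicodedata.name(base_character(form)))
    # Latin text has the same issue: the "fi" ligature is one glyph for two characters.
    print(base_character("\uFB01"))  # fi
```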
Data input requirements:
As with display, users should be able to input data in the manner they are accustomed to, and text rendering should follow the rules of the language in which it is written. For languages with large character sets, there is no one-to-one mapping from keystrokes to characters, so helper applications or front-end processor software must be used. Such software interprets keystrokes on either a specialized or a standard keyboard. A conversion dictionary then displays a list of candidate characters, from which users select the ideographic characters corresponding to their keyboard input [5].
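The conversion step of a front-end processor can be sketched as a dictionary lookup followed by a user choice. In this Python sketch the dictionary is a tiny hypothetical sample that maps pinyin (romanized Chinese) keystrokes to candidate characters. Real input methods use far larger dictionaries and rank candidates by frequency and context. `CONVERSION_DICT`, `candidates` and `choose` are illustrative names.

```python
# Tiny hypothetical conversion dictionary: romanized (pinyin) keystrokes
# mapped to candidate Chinese characters.
CONVERSION_DICT = {
    "ma": ["妈", "麻", "马", "骂"],
    "zhong": ["中", "种", "重"],
    "wen": ["文", "问", "闻"],
}


def candidates(keystrokes: str) -> list[str]:
    """Return the candidate characters for a keystroke sequence (may be empty)."""
    return CONVERSION_DICT.get(keystrokes.lower(), [])


def choose(keystrokes: str, index: int) -> str:
    """Simulate the user picking candidate number `index` (1-based) from the list."""
    options = candidates(keystrokes)
    if not 1 <= index <= len(options):
        raise ValueError(f"no candidate {index} for {keystrokes!r}")
    return options[index - 1]


if __name__ == "__main__":
    print(candidates("ma"))                       # ['妈', '麻', '马', '骂']
    print(choose("zhong", 1) + choose("wen", 1))  # 中文 ("Chinese language")
```

One keystroke sequence yields several candidates, and the user's selection, not the keyboard, decides which character is entered.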
· Plan ahead.
It is very important to clearly identify the purpose of the site and the language(s) into which it will be converted before beginning to design it, so that a set of resources and standards can be prepared for implementation.
· Standardize your site.
It is best to adhere to standards when designing a site. The site should be universal and work on any browser that supports the languages and encodings you are working with. A decision should also be made about which version of HTML to use as the base. HTML 4 is better suited to non-English web pages because it adopts ISO/IEC 10646 (Unicode) as its document character set and adds the lang and dir attributes for language and text direction. Some useful resources are:
- HTML 4.0 Specification can be accessed at: http://www.w3.org/TR/REC-html40/
- Tango Creator (http://www.alis.com/). Tango Creator is an HTML editor. It is not as powerful as most current HTML editors, but it is a fully multilingual HTML editor.
· Identify the language of each document.
This can be done using HTML tags. It is recommended that the primary language(s) of the document be specified in a META element. For example:
<META HTTP-EQUIV="Content-Language" Content="en">
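To check that pages actually declare their language, a small script can scan them. This hypothetical sketch uses Python's standard `html.parser`. It records the Content-Language META element and also the lang attribute on the <html> element, which HTML 4 provides for the same purpose. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class LanguageDeclarationFinder(HTMLParser):
    """Collect the languages a page declares via <html lang> and the
    Content-Language META element."""

    def __init__(self) -> None:
        super().__init__()
        self.html_lang = None
        self.meta_language = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not values.
        attributes = dict(attrs)
        if tag == "html":
            self.html_lang = attributes.get("lang")
        elif tag == "meta":
            http_equiv = (attributes.get("http-equiv") or "").lower()
            if http_equiv == "content-language":
                self.meta_language = attributes.get("content")


def declared_languages(page: str) -> dict:
    """Return the declared languages; None means 'not declared'."""
    finder = LanguageDeclarationFinder()
    finder.feed(page)
    return {"html_lang": finder.html_lang, "meta": finder.meta_language}


if __name__ == "__main__":
    page = '<html lang="en"><head><META HTTP-EQUIV="Content-Language" Content="en"></head></html>'
    print(declared_languages(page))  # {'html_lang': 'en', 'meta': 'en'}
```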
· Decide which character set will be used.
It is important that the character set you choose is widely used by your target audience or is freely available for download. Unicode is a character set that can represent most of the world’s languages, and it thus overcomes ASCII’s limitation to the Latin alphabet. Some useful resources are:
- The World Wide Web Consortium has a list of some languages and the character sets commonly used: http://www.w3.org/International/O-charset-lang.html.
- The Unicode Standard is accessible at http://www.unicode.org
· Identify the character set of each document.
The character set of the document should be specified in a META element so that the browser can use the correct encoding to display the document. For example, for a Unicode document encoded in UTF-8:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
Some useful resources are:
- SC UniPad (http://www.sharmahd.com/unipad). This is a Unicode-compliant text editor that can recognize multiple Unicode encodings. UniPad only supports European languages.
- Unitype Global Writer (http://www.unitype.com/). Global Writer is a Unicode-compliant word processor that supports a wide variety of languages.
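The declared charset is what a client uses to turn a page's bytes back into text. This Python sketch reads the charset parameter from a Content-Type value with the standard `email.message.Message` class and then decodes with it. It falls back to ISO-8859-1 when no charset is given, which is the HTTP/1.1 (RFC 2616) default for text types. The sample uses windows-1256, an Arabic charset. The function names are hypothetical.

```python
from email.message import Message


def charset_from_content_type(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value, lower-cased.

    ISO-8859-1 is the fallback because HTTP/1.1 (RFC 2616) made it the
    default for text/* media types when no charset is given.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_page(raw: bytes, content_type: str) -> str:
    """Decode the bytes of a page using the charset it declares."""
    return raw.decode(charset_from_content_type(content_type))


if __name__ == "__main__":
    arabic = "مرحبا"
    raw = arabic.encode("windows-1256")  # bytes as served by a windows-1256 page
    print(decode_page(raw, "text/html; charset=windows-1256") == arabic)  # True
    print(raw.decode("iso-8859-1") == arabic)  # False: wrong charset, garbled text
```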
· Identify the direction of the text.
In addition to specifying the language of a document, the base directionality (left-to-right or right-to-left) of the document's text should be identified. This is done with the dir attribute. For example:
<Q lang="ar" dir="rtl">...an Arabic quotation...</Q>
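When pages are generated, the dir value can be derived from the text itself. This Python sketch uses a simplified form of rules P2-P3 of the Unicode Bidirectional Algorithm (UAX #9): the first strong character decides the base direction. Unlike the full algorithm, it does not skip isolated runs. The result is then emitted with lang and dir attributes, as in the example above. Vertical (top-to-bottom) writing is not covered. `base_direction` and `tagged` are hypothetical helpers.

```python
import unicodedata
from html import escape


def base_direction(text: str, default: str = "ltr") -> str:
    """Guess a paragraph's base direction from its first strong character.

    Simplified form of rules P2-P3 of the Unicode Bidirectional Algorithm
    (UAX #9); unlike the full algorithm it does not skip isolated runs.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":           # strong left-to-right (Latin, CJK, ...)
            return "ltr"
        if bidi_class in ("R", "AL"):   # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return default                      # only digits, spaces, punctuation


def tagged(text: str, lang: str, tag: str = "q") -> str:
    """Wrap `text` in an HTML element carrying lang and dir attributes."""
    direction = base_direction(text)
    return f'<{tag} lang="{escape(lang)}" dir="{direction}">{escape(text)}</{tag}>'


if __name__ == "__main__":
    print(tagged("مرحبا", "ar"))        # <q lang="ar" dir="rtl">مرحبا</q>
    print(tagged("Hello", "en", "p"))   # <p lang="en" dir="ltr">Hello</p>
```

The language code could also drive this choice through a list of right-to-left languages. Inspecting the text has the advantage of also handling mixed content correctly.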
· Restrict font solutions to those available for free from the Internet.
If you use a solution that requires your target audience to obtain commercial software, your website is unlikely to be used. To help users access the site, provide a link to software and fonts that can be downloaded for free. It is also useful to provide instructions in the users' own language. Such instructions are usually a set of images of text rather than actual text, since users may not yet be able to display the language. A good resource is:
- Port Phillip Library Service: http://home.vicnet.net.au/~ppls/sling/lotemain.htm
A language is selected from the menu to begin; instructions in the chosen language then explain how to view web pages written in that language.
· Plan for text expansion.
When working with embedded text, text expansion and contraction should be considered. English does not translate into other languages on a 1:1 ratio. With European languages the target text often expands by as much as 30%, whereas Asian target text often contracts. How to deal with text expansion should be decided early in the design phase. Possible approaches include the use of:
- Automatically expanding text objects
- Tabs to reduce the amount of information in a single frame
- Alternative layout styles, which optimize the use of the screen. For example, Cascading Style Sheets (CSS) describe how documents are presented on screen or in print.
- CSS2 specification can be accessed at http://www.w3.org/tr/rec-css2/
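A rough expansion budget can be computed before translation. This Python sketch measures expansion in characters, which is only a crude proxy for rendered width with proportional fonts. It also applies a hypothetical rule of thumb based on the figures above: short labels (assumed here to be 10 characters or fewer) may grow by up to 300%, and running text by about 30%. The German UI terms are ordinary examples, and the threshold and function names are assumptions.

```python
def expansion_pct(source: str, translation: str) -> float:
    """How much longer `translation` is than `source`, as a percentage of
    the source length (characters, a rough proxy for rendered width)."""
    return (len(translation) - len(source)) / len(source) * 100


def reserved_chars(english: str, short_label_limit: int = 10) -> int:
    """Characters to reserve for the translation of an English string.

    Hypothetical rule of thumb: short labels may grow by up to 300%,
    running text by about 30%.
    """
    growth_pct = 300 if len(english) <= short_label_limit else 30
    # Integer ceiling of len * (100 + growth_pct) / 100, avoiding float rounding.
    return (len(english) * (100 + growth_pct) + 99) // 100


def fits(translation: str, english: str) -> bool:
    """Return True if the translation fits the space reserved for the English text."""
    return len(translation) <= reserved_chars(english)


if __name__ == "__main__":
    print(expansion_pct("Edit", "Bearbeiten"))  # 150.0
    print(expansion_pct("File", "Datei"))       # 25.0
    print(reserved_chars("Edit"), fits("Bearbeiten", "Edit"))  # 16 True
```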
· Be aware of localization issues.
A website designed in more than one language can be properly translated and still confuse users if it is not well “localized”. Localization entails tailoring the content to the specific requirements and needs of the local audience. Elements to consider include currency, time and date formatting, measurements, writing style, and color and image selection.
A great reference for localization issues can be found in chapter 5 of [7].
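Date and number formatting can be driven by the visitor's language tag, with a fallback to a more general tag when no exact match exists. This Python sketch uses a small hypothetical conventions table and a truncation-based lookup that follows the idea of the "lookup" scheme in RFC 4647, so that 'de-AT' falls back to 'de'. A production site would take its conventions from a locale database instead. All names are illustrative.

```python
from datetime import date

# Hypothetical conventions table keyed by language tag; a production site
# would take these from a locale database instead of hard-coding them.
CONVENTIONS = {
    "en":    {"date": "{d:02d}/{m:02d}/{y}", "group": ",", "decimal": "."},
    "en-US": {"date": "{m:02d}/{d:02d}/{y}", "group": ",", "decimal": "."},
    "de":    {"date": "{d:02d}.{m:02d}.{y}", "group": ".", "decimal": ","},
    # French typography separates thousands with a (narrow) no-break
    # space; a plain space is used here for simplicity.
    "fr":    {"date": "{d:02d}/{m:02d}/{y}", "group": " ", "decimal": ","},
}


def lookup(tag: str, table: dict, default: str = "en") -> dict:
    """Return the entry for the most specific matching language tag.

    Drops subtags from the right ('de-AT' -> 'de') until a key matches,
    as in the 'lookup' scheme of RFC 4647; comparison ignores case.
    """
    lowered = {key.lower(): value for key, value in table.items()}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in lowered:
            return lowered[candidate]
        subtags.pop()
    return lowered[default]


def format_date(day: date, tag: str) -> str:
    conv = lookup(tag, CONVENTIONS)
    return conv["date"].format(d=day.day, m=day.month, y=day.year)


def format_number(value: float, tag: str, places: int = 2) -> str:
    conv = lookup(tag, CONVENTIONS)
    # Format with US-style separators first, then swap in the local ones;
    # the placeholder keeps the two substitutions from interfering.
    us_style = f"{value:,.{places}f}"
    return (us_style.replace(",", "\x00")
                    .replace(".", conv["decimal"])
                    .replace("\x00", conv["group"]))


if __name__ == "__main__":
    day = date(2001, 4, 16)
    print(format_date(day, "en-US"), format_date(day, "de-AT"))  # 04/16/2001 16.04.2001
    print(format_number(1234.5, "de"))                           # 1.234,50
```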
· Choose the translation method carefully.
When designing websites for more than one language, it is important to choose the text translation method carefully. Traditionally, translation has been done by trained professionals, ideally native speakers of the target language. This process produces the best results, but it can be time-consuming and expensive. Machine translation, on the other hand, is less accurate, to a degree that depends on the text being translated, but it is inexpensive and rapid. Machine translation works best for technical documents whose text follows standard English rules and contains no ambiguities [4].
Beware of posting machine-translated text on your site.
You risk alienating your audience by providing what is perceived as a sloppy or laughable translation. Still, machine translation may prove useful for some projects with a limited budget. Some useful free web-based translators are:
- Systran: http://www.systransoft.com/
- Transparent Language: http://www.freetranslation.com/
· Maintain your site.
If the website contains more than one language and the content in one language is updated, the content in the other languages should be updated as well. Uniscape provides one solution: its translation management system tracks changes made to the original HTML files and then sends just the changes to the designated translators.
- Uniscape can be accessed at http://www.uniscape.com
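The idea of sending translators only what changed can be sketched without any particular product. This is not how Uniscape works internally, which is not described here. It is a hypothetical Python sketch of one simple approach: record a SHA-256 fingerprint of each source segment when its translation is delivered, and later flag every segment whose current text no longer matches its fingerprint, including newly added segments. The segment IDs and function names are illustrative.

```python
import hashlib


def fingerprint(segment: str) -> str:
    """SHA-256 digest of a source-language segment."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


def stale_segments(source: dict, translated_from: dict) -> list:
    """Return IDs of source segments whose text changed since translation.

    source: segment id -> current source-language text
    translated_from: segment id -> fingerprint of the source text the
    translation was made from (recorded when the translation was delivered)
    """
    return sorted(
        seg_id for seg_id, text in source.items()
        if translated_from.get(seg_id) != fingerprint(text)
    )


if __name__ == "__main__":
    old = {"title": "Welcome", "hours": "Our store opens at 9."}
    recorded = {seg_id: fingerprint(text) for seg_id, text in old.items()}
    new = {"title": "Welcome", "hours": "Our store opens at 10.", "footer": "Contact us"}
    print(stale_segments(new, recorded))  # ['footer', 'hours']
```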
1. Yahoo.com: http://fr.yahoo.com/ and http://de.yahoo.com/ (Accessed on April 16, 2001)

These are two examples of Yahoo versions implemented in French and German. Both languages display without problems because they are European languages based on Latin character sets, and both are written from left to right. The date and time formats show that localization issues have been taken into consideration.
2. Amazon.com: http://www.amazon.com (Accessed on April 16, 2001)
Amazon.com is a good example of a multilingual website, with versions of its site in Japanese, French, German, Spanish and English. Displaying the Japanese version requires downloading a plugin, as most browsers do not support the character set used.
3. Ahram.org: http://www.ahram.org.eg (Accessed on April 16, 2001)

Ahram is an Egyptian daily newspaper, and its website is a good example of an Arabic (right-to-left) website. The language is specified in the header of the HTML file by <META NAME="MS.LOCALE" CONTENT="AR">. The character set used is windows-1256, specified by <META HTTP-EQUIV='Content-Type' CONTENT='text/html; charset=windows-1256'>. The date and currency formats show that localization issues have been taken into consideration.
4. Online Computer Library Center (OCLC) FirstSearch: http://www.oclc.org/firstsearch/ (Accessed on April 16, 2001)
FirstSearch allows users to search for bibliographic and full-text records in over 80 online databases. Searches can be done in many languages. The site also includes a list of recommended browsers, viewers, and plugins.
To make the web universally usable and remove language barriers, researchers should devote more effort to developing websites in different languages. The W3C's current work on internationalization [9] and the Unicode Standard [10] are two examples of research directions toward making the web universally usable. Both remain open areas of research.
[1] B. F. Lavoie, E. T. O’Neill, “How World Wide Is the Web? Trends in Internationalization of Web Sites,” http://www.oclc.org/oclc/research/publications/review99/oneill/lavoie.htm (Accessed on April 16, 2001).
[2] G. Nicol, “The Multilingual World Wide Web,” http://www.oasis-open.org/cover/nicol-multwww.html (Accessed on April 16, 2001).
[3] R. Ishida, “Challenges in Designing International User Information,” http://www.xerox-emea.com/globaldesign/paper/paper2.htm (Accessed on April 16, 2001).
[4] C. K. Merrill, M. Shanoski, “Internationalizing Online Information,” ACM Tenth International Conference on Systems Documentation, 1992, pp. 19-25.
[5] L. K. Yong, T. T. Wee, N. Govindasamy, L. T. Chee, “Multiple Language Support over the World Wide Web,” http://www.isoc.org/isoc/whatis/conferences/inet/96/proceedings/a5/a5_2.htm (Accessed on April 16, 2001).
[6] M. Lerner, “Building Worldwide Web Sites,” IBM developerWorks: Web architecture library, September 1999, http://www-106.ibm.com/developerworks/web/library/web-localization.html?dwzone=web (Accessed on April 16, 2001).
[7] T. Fernandes, “Global Interface Design,” Academic Press, 1995.
[8] A. Cunningham, “Multilingual Unicode Web Page Development,” http://members.ozemail.com.au/~andjc/papers/cn99.html (Accessed on April 16, 2001).
[9] “World-Wide Character Sets, Languages, and Writing Systems,” http://www.w3.org/International/ (Accessed on April 16, 2001).
[10] The Unicode Standard, http://www.unicode.org (Accessed on April 16, 2001).