Re: Designing a multilingual web site

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Jul 18 2000 - 14:27:37 EDT


Munzir Taha wrote:
> I opened notepad, write arabic, and saved the file as filename.htm with
> Encoding UTF-8.

note that saving as utf-8 requires the windows 2000 notepad. windows nt notepad can save only in the system codepage and in utf-16le, and win9x notepad does not support unicode at all.

> Opening the page, I found that view -> Encoding shows
> Unicode (UTF-8) with auto-select enabled. My question is where this info
> lies - In my box?.

notepad always starts unicode-encoded files with the appropriate signature byte sequence, as do most other microsoft applications and many other well-behaved applications.
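to see what such a signature looks like on disk, here is a small python sketch (not what notepad itself runs, just an illustration). python's "utf-8-sig" codec writes the utf-8 signature ef bb bf before the text and strips it again on reading. the function names and the file name are made up for the example.

```python
import codecs
import os
import tempfile

def save_with_signature(path: str, text: str) -> None:
    # hypothetical helper: "utf-8-sig" prepends the utf-8 signature (ef bb bf)
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)

def has_utf8_signature(path: str) -> bool:
    # hypothetical helper: compare the first 3 raw bytes with codecs.BOM_UTF8
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8

path = os.path.join(tempfile.mkdtemp(), "filename.htm")
save_with_signature(path, "مرحبا")  # arabic sample text
print(has_utf8_signature(path))  # True
```

reading the file back with encoding="utf-8-sig" returns the text without the leading U+FEFF. plain "utf-8" would keep it as the first character.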

the signature is U+FEFF encoded in the file's encoding scheme, so it takes up the first 2 to 4 bytes of the file:

utf-8: ef bb bf
utf-16be: fe ff
utf-16le: ff fe
utf-32be: 00 00 fe ff
utf-32le: ff fe 00 00 (check before utf-16le!)
scsu: 0e fe ff (unfortunately rather rarely used)
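a reader can sniff these signatures by comparing the first bytes of a file against each one. the python sketch below (the names are made up for the example) tests the longer signatures first, which is exactly why utf-32le has to be checked before utf-16le: ff fe is a prefix of ff fe 00 00.

```python
from typing import Optional, Tuple

# signatures from the table, ordered so that no entry is shadowed by a
# shorter one that is its prefix (utf-32le before utf-16le)
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32be"),
    (b"\xff\xfe\x00\x00", "utf-32le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\x0e\xfe\xff", "scsu"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

def sniff_signature(data: bytes) -> Tuple[Optional[str], int]:
    """return (encoding name, signature length), or (None, 0) if none matches."""
    for sig, name in SIGNATURES:
        if data.startswith(sig):
            return name, len(sig)
    return None, 0

print(sniff_signature("\ufeffhi".encode("utf-32-le")))  # ('utf-32le', 4)
print(sniff_signature("\ufeffhi".encode("utf-16-le")))  # ('utf-16le', 2)
```

this is only a heuristic: a utf-16le file whose text begins with U+0000 right after the signature looks exactly like utf-32le. also, python has no built-in scsu codec, so the sketch only recognizes that signature.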

> Suppose I publish the page, how can people know that I
> told notepad to save as Unicode ;-)

the best way for html really is the one michael described in his reply: a meta tag. it is good practice, recommended by html 4.0 and required by xhtml 1.0, to write the element and attribute names in michael's line in lowercase, because xhtml and xml are case-sensitive.
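michael's exact line is not quoted in this message, so the python sketch below assumes the common form of the charset meta tag, written in lowercase. the helper name is made up. the trailing " />" keeps the empty meta element well-formed for xhtml while still working in html browsers.

```python
import html

def html_head(title: str, charset: str = "utf-8") -> str:
    # hypothetical helper: lowercase element and attribute names, as xhtml 1.0
    # requires; the meta tag tells the browser how the page bytes are encoded
    return (
        "<head>\n"
        f'<meta http-equiv="content-type" content="text/html; charset={charset}" />\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
    )

page = "<html>\n" + html_head("test") + "<body>مرحبا</body>\n</html>\n"
data = page.encode("utf-8")  # the declared charset must match the actual encoding
```

the point of the design is that the charset is declared inside the page itself, so it travels with the file regardless of whether the editor wrote a signature.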

for xml and xhtml, specify the encoding in the xml declaration. without one, xml assumes utf-8 (or utf-16 when the file starts with a signature).
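as an illustration, python's xml.etree.ElementTree can write the declaration for you. passing xml_declaration=True (python 3.8 or later) forces an explicit line naming the encoding even for utf-8, where xml would assume it anyway. the element and text are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("p")
root.text = "مرحبا"  # arabic sample text

# xml_declaration=True always emits <?xml ...?> with the chosen encoding
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(data.decode("utf-8").splitlines()[0])  # <?xml version='1.0' encoding='utf-8'?>

# a parser reads the declaration and decodes the bytes accordingly
assert ET.fromstring(data).text == "مرحبا"
```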

markus



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT