On 19 June 1998, Cristina Mateo wrote:
> I need to write Japanese characters.
> How do I do that in HTML for both Mac and Windows?

On 1998-06-23, I wrote:
> HTML 4.0 has adopted the UCS
> (ISO/IEC 10646-1:1993 + TAs, or Unicode 2.0 and above) as the document
> character set for HTML. This means that you can, in principle, use any
> UCS character in any WWW page that includes a HTML 4.0 version declaration,
> see .
>
> Of course, what characters the reader of your page will be able to see,
> depends on the browser, and fonts, installed by him/her.

Today, I conducted a little test with three browsers and the HTML
source quoted below (which was stored locally on my system). The browsers
I tested are:
- Alis Tango v3.1.1 c1.0
- Microsoft Internet Explorer 4.0 Version 4.71.1712.6
- Netscape Communicator 4.05
all running under
- Microsoft Windows 95 4.00.950.B

The sad result: none of the browsers in my sample was fully compatible
with the HTML 4.0 specification!

The best results across all browsers are obtained with decimal NCRs
(numeric character references); the most popular browsers do not
understand hexadecimal NCRs. (HTML authors, on the other hand, will
understand decimal NCRs only with the help of a hex-to-decimal
calculator!)

Best wishes,
  Otto Stolz

--------------- Appendix ----------

碁-Test

碁 — my favourite game

In HTML 4.0, I can code the character for Go (aka Baduk or Wei-ch'i) in the following ways:

Character encoding          | in UTF-8    | E7 A2 81 |
Numeric character reference | decimal     |          | 碁
                            | hexadecimal |          | 碁
                            |             |          | 碁
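The encodings listed above can be derived mechanically from the character's UCS code point. The following is only a minimal sketch in Python, not part of the original test; the function name `encodings` and the dictionary keys are hypothetical names chosen for illustration.

```python
# Illustrative sketch: derive the UTF-8 bytes and the numeric character
# references (NCRs) for a single character. All names are hypothetical.

def encodings(ch):
    """Return the UTF-8 byte values and the NCR spellings of one character."""
    cp = ord(ch)  # UCS code point of the character
    return {
        # UTF-8 byte sequence, written as space-separated hex values
        "utf8_hex": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        # decimal NCR
        "ncr_dec": f"&#{cp};",
        # hexadecimal NCR with lower-case "x" and lower-case digits
        "ncr_hex_lower": f"&#x{cp:x};",
        # hexadecimal NCR with upper-case "X" and upper-case digits
        "ncr_hex_upper": f"&#X{cp:X};",
    }

if __name__ == "__main__":
    # U+7881 is the Go character used throughout this test page.
    for name, value in encodings("\u7881").items():
        print(f"{name}: {value}")
```

For U+7881 this yields the bytes E7 A2 81 and the decimal value 30849 (hexadecimal 7881).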

What does your browser display for each? Note also the document title in your browser window's title bar, or in the "Document Info" window.

Beware: the last cell in the first row of the table contains three non-ASCII bytes, which will be MIME quoted-printable encoded in this letter. If this cell does not display properly on your system, it may have been distorted by the mail-transfer process. In this case, insert three characters with the hex values given in the 3rd column of the same row. The title, the header, and the last row of the table should contain the same han/kanji character.

The following table (please use a mono-pitch font to display it)
summarizes how these browsers display the various encodings of a han/kanji
character in various contexts. The encodings are denoted by the
abbreviations
  UTF  for UTF-8,
  dec  for a decimal numeric character reference,
  hex  for a hexadecimal, lower-case NCR,
  HEX  for a hexadecimal, upper-case NCR.

Browser      | enc | text window | title bar  | menus      | source
-------------+-----+-------------+------------+------------+------------
Alis         | UTF | ok          | ok         | ok         | ok
Tango        | dec | ok          | not tested | not tested | N/A
             | hex | ok          | not tested | not tested | N/A
             | HEX | missing     | not tested | not tested | N/A
-------------+-----+-------------+------------+------------+------------
Microsoft    | UTF | ok          | N/A        | N/A        | wrong
Internet     | dec | ok          | N/A        | not tested | N/A
Explorer     | hex | wrong       | N/A        | not tested | N/A
             | HEX | wrong       | N/A        | not tested | N/A
-------------+-----+-------------+------------+------------+------------
Netscape     | UTF | ok          | repl.char. | repl.char. | repl.char.
Communicator | dec | ok          | not tested | not tested | N/A
             | hex | repl.char.  | not tested | not tested | N/A
             | HEX | repl.char.  | not tested | not tested | N/A

where
  wrong:      encoding not recognized, byte values interpreted as
              Latin-1 characters
  N/A:        not applicable
  repl.char.: encoding recognized but character not available; a
              question mark or an open box is displayed instead
  missing:    blank space is displayed rather than the character
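Since decimal NCRs were the only reference form displayed correctly by all three browsers in this test, an author who prefers to think in hexadecimal could convert the references mechanically before publishing. The following Python sketch is one possible way to do that and was not part of the test; the function name `hex_ncrs_to_decimal` is hypothetical, and the pattern is deliberately naive (it also rewrites references inside comments or scripts).

```python
import re

# Illustrative sketch: rewrite hexadecimal NCRs (&#x7881; or &#X7881;)
# as decimal NCRs (&#30849;). The helper name is hypothetical.

# Matches "&#x" or "&#X", one or more hex digits, and the closing ";".
_HEX_NCR = re.compile(r"&#[xX]([0-9A-Fa-f]+);")

def hex_ncrs_to_decimal(html):
    """Return html with every hexadecimal NCR replaced by its decimal form."""
    return _HEX_NCR.sub(lambda m: f"&#{int(m.group(1), 16)};", html)

if __name__ == "__main__":
    sample = "<TITLE>&#x7881;-Test</TITLE>"
    print(hex_ncrs_to_decimal(sample))
```

Decimal references already present in the input are left untouched, so the function can be applied to a whole page source repeatedly.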