Re: Multilingual Documents [was: HTML forms and UTF-8]

From: Jonathan Rosenne
Date: Fri Dec 03 1999 - 00:57:43 EST

Please note that according to HTML 4 the document character set is always
the full UCS, independent of the character encoding used. Even if the
document is encoded in ISO-8859-1 or US-ASCII, it can use any character in
the UCS by means of numeric character references (NCRs). I have seen pages
consisting entirely of NCRs.
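
To make the NCR point concrete, here is a minimal sketch in Python. The
function names (to_ascii_with_ncrs, from_ncrs) and the sample string are
made up for this illustration; the sketch relies only on the standard
"xmlcharrefreplace" error handler and on html.unescape. It does not escape
markup characters such as "&" or "<", which a real page generator would
also have to handle.

```python
import html


def to_ascii_with_ncrs(text: str) -> bytes:
    """Encode text as US-ASCII, writing every character outside ASCII
    as a decimal numeric character reference (&#NNNN;)."""
    return text.encode("ascii", errors="xmlcharrefreplace")


def from_ncrs(data: bytes) -> str:
    """Decode US-ASCII bytes and resolve character references back to
    the characters they stand for, roughly as a browser would."""
    return html.unescape(data.decode("ascii"))


# Hypothetical sample: a Latin-script label followed by a Hebrew word,
# written with escapes so this source file itself stays pure ASCII.
sample = "Shalom: \u05e9\u05dc\u05d5\u05dd"

encoded = to_ascii_with_ncrs(sample)
print(encoded)  # b'Shalom: &#1513;&#1500;&#1493;&#1501;'

# The round trip recovers the original text, although the bytes sent
# over the wire contained nothing outside US-ASCII.
print(from_ncrs(encoded) == sample)  # True
```

The encoded bytes could be labelled US-ASCII, ISO-8859-1 or UTF-8 and
would still denote the same characters, since the references are resolved
against the document character set and not against the encoding.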

For some multilingual documents this would be the natural way to go - for
example, if the page is mainly in one language with short phrases in
another script.

Thus, whether you serve UTF-8 or UTF-16 or any other encoding is in
principle irrelevant to the issue.


At 12:20 02/12/99 -0800, A. Vine wrote:
>Michael Everson wrote:
>> Ar 18:18 -0800 1999-12-01, scríobh A. Vine:
>> >The bottom line was then and is now, how much are folks willing to pay for
>> >multilingual capability, and how many folks are willing to pay it?
>> >companies are for-profit organizations. Multilingual support is not
>> >It costs a tremendous amount of money to garner the expertise, evaluate
>> >product(s), design, and code, for multilingual.
>> Well, Andrea, it depends _how_ multilingual. In Unicode terms, if it has to
>> do with representing many complex scripts, that costs lots in terms of
>> rendering. But what I don't want to see is a limitation of character
>> repertoire, say, in the Latin script.
>Neither do I. But the point is that if I serve up UTF-8 or UTF-16 on HTML
>pages, most people will not see data outside of ASCII correctly. We're not
>there yet. I _can_ serve UTF-8 to folks in HTML pages should they choose it
>as their preferred charset (providing our customers gave their customers that
>option). But as it turns out, very few folks actually _do_ choose UTF-8, or
>request multilingual capabilities which results in our serving them UTF-8.

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:56 EDT