From: Brian Doyle (firstname.lastname@example.org)
Date: Thu Sep 25 2003 - 13:49:15 EDT
Forgive my density. I'm not sure that I understand. Are you arguing that an
ASCII encoding scheme (ISO-8859-1) is not a limitation because,
semantically, all of the characters (a, b, c, etc.) also exist in the
It makes sense to me that ASCII is not a limitation for those documents that
are limited to that character set. But, your own message, "which contains
U+10DB მ GEORGIAN LETTER MAN and U+092E म DEVANAGARI LETTER MA" triggers an
error message in my own email client (Entourage X), namely:
"Some text in this message is in a language that your computer cannot
I'm not certain if I'm seeing this because I don't possess a font to display
those characters or some other reason. I suspect that this is the reason
because, when I try to look up those characters in OS X's Character
Palette, the Georgian and Devanagari Unicode blocks show up blank.
The observation that I, the "Irish (American) colleague," made to Michael
was that there is a sentence in the NYT article displayed in my browser that
dropped the U+00E7 LATIN SMALL LETTER C WITH CEDILLA (e.g., in François).
There's nothing in the paragraph in question to indicate that there is a
missing character--nor is there a numeric code displayed for a savvy user to
Surely in this context, we would agree that the semantic content was
On 9/25/03 11:54 AM, "Eric Muller" <email@example.com> wrote:
> Michael Everson wrote:
>> An Irish colleague here said he liked the article but noted that the Times'
>> web directors don't use Unicode....
>>> <meta http-equiv="charset" content="iso-8859-1">
> There is an alternative point of view, which says that charset declared in an
> HTML (or XML) document is no more than an encoding scheme, and that all
> characters in those documents are fundamentally Unicode characters (i.e. they
> start in life with the full semantic of Unicode, they don't inherit it on the
> occasion of character set conversion). That view is supported by the XML spec
> itself, and by the infoset definition. And because we have numeric character
> entities, using an iso-8859-1 encoding scheme is not really a limitation:
> witness this message, which contains U+10DB მ GEORGIAN LETTER MAN and U+092E म
> DEVANAGARI LETTER MA.
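
The point in the quoted message about numeric character references can be illustrated with a minimal sketch, assuming Python 3 and its standard codecs. The variable names are illustrative only and do not come from the thread. The sample text mixes U+00E7, which ISO-8859-1 can encode directly, with U+10DB and U+092E, which it cannot:

```python
import html

# Sample text: U+00E7 is representable in ISO-8859-1; U+10DB and U+092E are not.
text = "Fran\u00e7ois \u10db \u092e"

# The "xmlcharrefreplace" error handler writes any character the target
# encoding lacks as a decimal numeric character reference (&#NNNN;).
encoded = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# A consumer decodes the bytes as ISO-8859-1 and then resolves the references,
# which recovers the original Unicode characters.
decoded = html.unescape(encoded.decode("iso-8859-1"))

print(encoded)
print(decoded == text)
```

Whether the recovered characters can actually be displayed still depends on the fonts available to the receiving client, which is a separate matter from the encoding.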
This archive was generated by hypermail 2.1.5 : Thu Sep 25 2003 - 14:43:15 EDT