Re: About that alphabetician...

From: Brian Doyle (brian@gael-image.com)
Date: Thu Sep 25 2003 - 13:49:15 EDT

    Eric,

    Forgive my density. I'm not sure that I understand. Are you arguing that an
    ASCII encoding scheme (ISO-8859-1) is not a limitation because,
    semantically, all of the characters (a, b, c, etc.) also exist in the
    Unicode scheme?

    It makes sense to me that ASCII is not a limitation for those documents that
    are limited to that character set. But, your own message, "which contains
    U+10DB მ GEORGIAN LETTER MAN and U+092E म DEVANAGARI LETTER MA" triggers an
    error message in my own email client (Entourage X), namely:

    "Some text in this message is in a language that your computer cannot
    display."

    I'm not certain whether I'm seeing this because I don't possess a font to
    display those characters or for some other reason. I suspect the missing
    font is the cause because, when I try to look up those characters in OS X's
    Character Palette, the Georgian and Devanagari Unicode blocks show up blank.

    The observation that I, the "Irish (American) colleague," made to Michael
    was that there is a sentence in the NYT article displayed in my browser that
    dropped the U+00E7 LATIN SMALL LETTER C WITH CEDILLA (e.g., François).

    There's nothing in the paragraph in question to indicate that there is a
    missing character--nor is there a numeric code displayed for a savvy user to
    look up.

    Surely in this context, we would agree that the semantic content was
    distorted, yes?

    Sincerely,
    Brian Doyle
    Unicode newbie

    On 9/25/03 11:54 AM, "Eric Muller" <emuller@adobe.com> wrote:

    >
    >
    > Michael Everson wrote:
    >> An Irish colleague here said he liked the article but noted that the Times'
    >> web directors don't use Unicode....
    >>
    >>
    >>> ...
    >>> <meta http-equiv="charset" content="iso-8859-1">
    >>> ...
    >>>
    >>>
    > There is an alternative point of view, which says that charset declared in an
    > HTML (or XML) document is no more than an encoding scheme, and that all
    > characters in those documents are fundamentally Unicode characters (i.e. they
    > start in life with the full semantic of Unicode, they don't inherit it on the
    > occasion of character set conversion). That view is supported by the XML spec
    > itself, and by the infoset definition. And because we have numeric character
    > entities, using an iso-8859-1 encoding scheme is not really a limitation:
    > witness this message, which contains U+10DB მ GEORGIAN LETTER MAN and U+092E म
    > DEVANAGARI LETTER MA.
    >
    > Eric.
    >
    >
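
    Eric's point about numeric character references can be illustrated with a
    minimal sketch, assuming Python 3 and its standard library only. The function
    names below are hypothetical and chosen for illustration; this is not how
    any particular mail client or browser does it. Characters that ISO-8859-1
    can represent (such as U+00E7) are written as single bytes, and everything
    else is written as a decimal reference such as &#4315; for U+10DB.

```python
import html


def to_latin1_with_ncrs(text: str) -> bytes:
    """Encode text as ISO-8859-1 bytes (hypothetical helper).

    Characters outside ISO-8859-1 are replaced by decimal numeric
    character references, e.g. U+10DB becomes b"&#4315;".
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def from_latin1_with_ncrs(data: bytes) -> str:
    """Decode ISO-8859-1 bytes and resolve character references.

    Assumption: the text holds no literal ampersands that should be
    kept as-is, because html.unescape also resolves named references
    such as "&amp;".
    """
    return html.unescape(data.decode("iso-8859-1"))


message = "Fran\u00e7ois, \u10db, \u092e"
wire_bytes = to_latin1_with_ncrs(message)
print(wire_bytes)
print(from_latin1_with_ncrs(wire_bytes) == message)
```

    Whether the recovered characters are then displayed is a separate matter
    that depends on the fonts installed on the reader's machine.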



    This archive was generated by hypermail 2.1.5 : Thu Sep 25 2003 - 14:43:15 EDT