RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

From: D. Starner
Date: Tue Dec 09 2003 - 16:37:50 EST

    > Just imagine what would be created with your assumption with this source:
    > const wchar_t c = L'?';
    > where ? is a combining character.

    The programmer would get bitten. For one thing, there's no reason to assume that
    every compiler accepts UTF-8; for another, you can't assume that the compiler
    or some intermediary step doesn't normalize the text. That's why Unicode
    escapes exist, and partly why Java, as a general rule, translates source into
    a form that uses Unicode escapes for non-ASCII characters.

    Even if you assume the compiler can accept Unicode text in whatever UTF you
    choose, it still seems needlessly dangerous to use a bare combining character
    instead of a Unicode escape or a numeric entity. Despite your distinction, there's
    no clear line between programming editors and non-programming editors. Any editor
    that gives you variable names in Hindi or Arabic is likely to have the sophistication
    needed to combine that ? with that ', and I see no reason it wouldn't; quite possibly,
    the underlying system won't give it the option of handling Hindi or Arabic without
    combining that ? with that '. Emacs, to name one notorious programming editor, fully
    plans to have that sophistication.
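
    For illustration, here is a minimal sketch of the escape-based spelling in C,
    assuming a compiler that supports C99 universal character names. U+0301
    (COMBINING ACUTE ACCENT) is only a stand-in, since the quoted source doesn't say
    which combining character was meant, and the identifiers are made up for the
    example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* U+0301 COMBINING ACUTE ACCENT is an arbitrary stand-in for the
   unnamed combining character in the quoted source. Written as an
   escape, it cannot visually attach to the opening quote in an editor,
   and the source file stays pure ASCII. */
static const wchar_t combining_mark = L'\u0301';

/* An apostrophe followed by the same mark. The escape keeps the two
   code points visibly separate in the source text. */
static const wchar_t quoted_mark[] = L"'\u0301";

/* Number of wide characters in quoted_mark, excluding the terminator. */
static size_t quoted_mark_length(void)
{
    return wcslen(quoted_mark);
}

/* Sanity checks: the escape yields exactly the intended code point,
   and the string holds two separate characters. */
static void check_escaped_literals(void)
{
    assert(combining_mark == 0x0301);
    assert(quoted_mark[0] == L'\'');
    assert(quoted_mark[1] == combining_mark);
    assert(quoted_mark_length() == 2);
}
```

    Because the file contains only ASCII, nothing between the editor and the
    compiler has a combining sequence to render or normalize.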
