Re: (Very) plain 7-bit ASCII in US placenames

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Oct 17 2005 - 07:24:11 CST


    On Mon, 17 Oct 2005, Stephane Bortzmeyer wrote:

    > It is quite common in France (where the composed characters are much
    > more common) to have *some* official bodies unable to deal with
    > anything else than US-ASCII.

    I'm afraid such restrictions, and variation in them, are rather common,
    even in countries where people use a substantially richer character
    repertoire in everyday e-mail, text processing, etc. What's worse,
    the restrictions are often undocumented or poorly documented, and
    what happens when data exceeds the limitations may be unpredictable.
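One way to keep such limits from being silent is to check input up front and report exactly which characters fall outside the allowed repertoire. The following is a minimal sketch. It assumes the restriction in question is plain US-ASCII (code points 0 to 127), and the function name `non_ascii_chars` is hypothetical and not taken from any existing system:

```python
import unicodedata


def non_ascii_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each character outside US-ASCII.

    Assumption: "US-ASCII" means code points 0x00..0x7F; anything above is reported.
    """
    return [
        (i, ch, unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


# Hypothetical example: a French placename submitted to an ASCII-only system.
for index, ch, name in non_ascii_chars("Saint-Étienne"):
    print(f"position {index}: {ch!r} {name}")
```

Reporting the position and Unicode name, rather than silently dropping or mangling the character, makes the limitation visible at the point where data enters the system.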

    I don't know what could be done about this in general, but the
    "exemplar characters" definitions in CLDR come to mind.
    They are currently limited to letters, unfortunately, and they
    are meant to describe the use of letters in a language, rather
    than the common practice of character repertoire in a country or
    other territory.
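CLDR writes exemplar characters as a bracketed set, for example `[a b c ...]`. The following is a hypothetical sketch of how such a set could be used to check text. The parser handles only the plain space-separated form; ranges, escapes and `{multi-character}` elements found in real CLDR data are not supported. The sample set is an abbreviated illustration and not the actual CLDR data for French:

```python
def parse_exemplar_set(spec: str) -> set[str]:
    """Parse a simple bracketed, space-separated list such as "[a b c é]".

    Assumption: only the plain form is handled (no ranges, escapes or {braces}).
    """
    inner = spec.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a bracketed set")
    return set(inner[1:-1].split())


def letters_outside(text: str, exemplars: set[str]) -> set[str]:
    """Return the letters of text (lowercased) that are not in the exemplar set."""
    return {ch for ch in text.lower() if ch.isalpha() and ch not in exemplars}


# Hypothetical, abbreviated exemplar set for illustration only (not real CLDR data).
french_like = parse_exemplar_set(
    "[a à â b c ç d e é è ê ë f g h i î ï j k l m n o ô p q r s t u ù û v w x y z]"
)
print(letters_outside("Saint-Étienne", french_like))
print(letters_outside("Zürich", french_like))
```

As the post notes, such a set describes the letters of a language, not what a given country's systems can actually handle, so a check like this answers a different question from the ASCII check.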

    It would be nice if we had a definition of "commonly available characters"
    for each country, describing the _typical_ repertoire. But I'm afraid the
    situation varies too much, as seen from the examples presented. In
    addition to technical limitations, which may still exist, there are
    restrictions on the repertoire due to political decisions and to
    assumptions that, for example, non-ASCII characters cannot be typed
    or might simply break something.

    So it's perhaps more constructive to look forward and try to specify the
    character _requirements_ for writing different languages correctly,
    perhaps at several levels.

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

