Re: New to Unicode

From: Doug Ewell
Date: Tue Jul 25 2006 - 09:34:51 CDT

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    >> The basic requirement is that the script in question must be used
    >> "overwhelmingly" to write the language in question -- not necessarily
    >> 100% of the time, but certainly 51% would not qualify.
    > Do you have even one example of Breton being written in anything
    > other than the Latin script or one of its variants (such as Celtic,
    > Gothic, or Germanic Fraktur)? (I exclude Braille, which is really a
    > different medium: it is normally not read on paper with the eyes, nor
    > printed or drawn with normal inks and tools.)
    > Breton is written in the Latin script nearly 100% of the time, without
    > ambiguity (there may exist some transcriptions in the Latin or Cyrillic
    > script, but they would be used only by scholarly linguists working on
    > comparative linguistic and phonetic features).

    You are almost certainly correct about Breton. There are almost 500
    languages specified in the Language Subtag Registry and there are surely
    dozens that should have a Suppress-Script but don't, at least not yet.

    Please submit a registration form to the ietf-languages list AFTER
    reading the approved draft at
    <>, and
    especially Section 3.5.

    > I cited Breton because I don't know how the initial set of languages
    > was chosen. Some of those that have a designated script are very
    > well known and have global status, but others are rare, endangered
    > languages that are most likely known only in English-speaking areas.

    One criterion for which languages should initially be given a
    Suppress-Script was to consider the most commonly used languages first.
    Suppress-Script exists because there are RFC 3066 processors that
    perform matching using strict truncation, and will not match
    "fr-Latn-FR" with "fr-FR" because the script subtag would get in the way.
    The amount of Breton text on the Internet that would be affected by this
    problem is much smaller than for English or Arabic or Hindi, so those
    languages were considered first.

    There is, once again, no claim that the current Suppress-Script data is
    complete and comprehensive. Contributions are welcome.

    > It looks as if this field was an attempt to codify a proposal made
    > by only one or a few persons: the field was accepted, but its values
    > were left to be defined later, so the few contributions came from a
    > few people who had little knowledge in that area or did not invest
    > enough effort in seeking information, and the content of this field
    > was not widely reviewed. I would not recommend using it for now, as it
    > is too far from being a stable draft; we'll need to review this list
    > and match it against the better language<->script mappings in the
    > CLDR, whose content has certainly been reviewed by many more people
    > with different cultures and origins. (Note that even the CLDR extra
    > data lacks information for some language<->script mappings, but this
    > affects languages that are much less well known.)

    Idle speculation is often an enjoyable pastime, but those of us who have
    worked on this project for the past two years -- including 18 months in
    the Language Tag Registry Update (LTRU) Working Group of the IETF --
    know exactly how it came to be the way it is.

    Informed contributions are welcome from all cultures and origins.
    Please consider contributing to the ietf-languages list I mentioned
    earlier. This is a whole project by itself, but is not relevant to the
    Unicode list and I will not discuss it further here.

    Doug Ewell
    Fullerton, California, USA
    Editor, draft-ietf-ltru-initial

    This archive was generated by hypermail 2.1.5 : Tue Jul 25 2006 - 09:39:27 CDT