Re: CLDR: Bad exemplar chars for some locales [ar,fa]

From: Peter Edberg (pedberg@apple.com)
Date: Thu Apr 06 2006 - 21:43:16 CST


    Further investigation reveals that according to the restrictions for
    main (standard) and auxiliary exemplar sets in
    <http://unicode.org/cldr/data_formats.html>,
    characters with the Default_Ignorable_Code_Point property (which
    includes U+200C ZWNJ and U+200D ZWJ) are not allowed in the main
    exemplar set, period. They are only allowed in the auxiliary exemplar
    set.
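As a rough illustration of this rule, here is a minimal Python sketch that flags default-ignorable characters in a proposed main exemplar set. The range table is a hand-copied, partial subset of Default_Ignorable_Code_Point (an assumption; the authoritative list is DerivedCoreProperties.txt in the Unicode Character Database), and the function names are hypothetical, not part of any CLDR tool.

```python
import unicodedata
from collections.abc import Iterable

# Partial, hand-copied subset of Default_Ignorable_Code_Point ranges.
# Assumption: consult DerivedCoreProperties.txt for the complete list.
DEFAULT_IGNORABLE_RANGES = [
    (0x00AD, 0x00AD),  # SOFT HYPHEN
    (0x034F, 0x034F),  # COMBINING GRAPHEME JOINER
    (0x061C, 0x061C),  # ARABIC LETTER MARK
    (0x200B, 0x200F),  # ZWSP, ZWNJ, ZWJ, LRM, RLM
    (0x202A, 0x202E),  # bidi embedding/override controls
    (0x2060, 0x206F),  # WORD JOINER, invisible operators, etc.
    (0xFE00, 0xFE0F),  # variation selectors
    (0xFEFF, 0xFEFF),  # ZERO WIDTH NO-BREAK SPACE (BOM)
]


def is_default_ignorable(ch: str) -> bool:
    """True if the code point falls in one of the (partial) ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_RANGES)


def check_exemplars(main: Iterable[str]) -> list[str]:
    """Hypothetical checker: report default-ignorable characters in a main set.

    An empty result means the main set obeys the restriction; flagged
    characters would belong in the auxiliary set instead.
    """
    problems = []
    for ch in sorted(set(main)):
        if is_default_ignorable(ch):
            name = unicodedata.name(ch, "<unnamed>")
            problems.append(
                f"U+{ord(ch):04X} {name} is default-ignorable; "
                "move it to the auxiliary set"
            )
    return problems


if __name__ == "__main__":
    # A toy main set containing two Arabic letters plus ZWNJ and ZWJ.
    for msg in check_exemplars({"\u0627", "\u0628", "\u200C", "\u200D"}):
        print(msg)
```

Run against a set like the one in the "ar"/"fa" data, the checker reports ZWNJ and ZWJ and leaves the letters alone, which matches the restriction quoted above.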

    -Peter E

    On Apr 6, 2006, at 9:58 AM, Peter Edberg wrote:

    > OK, it sounds like ZWNJ "is necessary for writing Persian where
    > certain affixes and compound words do not join" (per the first
    > website below).
    >
    > The need for ZWJ in Persian seems more specialized; I am not sure
    > it demonstrates that ZWJ is required for most uses of Persian.
    >
    > The Arabic examples for ZWJ and ZWNJ all seem to be for special
    > display cases (and partly to work around browser-specific display
    > issues), and again I would argue that these are not cases that
    > should be covered by the standard exemplar set. Perhaps for Arabic,
    > ZWJ and ZWNJ should be in an auxiliary characters set.
    >
    > The non-Arabic and non-Persian language examples (e.g. for Urdu and
    > Sindhi) do not apply to the "ar" or "fa" locales.
    >
    > All of this hinges on the definition of what the exemplar set is
    > supposed to cover. From UTS #35 (LDML): "The exemplar character set
    > contains the commonly used letters for a given modern form of a
    > language... It is not a complete set of letters used for a
    > language, nor should it be considered to apply to multiple
    > languages in a particular country. Punctuation and other symbols
    > should not be included. In general, the test to see whether or not
    > a letter belongs in the set is based on whether it is acceptable in
    > that language to always use spellings that avoid that character."
    >
    > -Peter E
    >
    > On Apr 6, 2006, at 12:41 AM PDT, Andreas Prilop wrote:
    >> They are! See
    >> http://www.unics.uni-hannover.de/nhtcapri/bidirectional-text.html#zwnj
    >> http://www.laits.utexas.edu/persian/persianword/zwnj.htm
    >>
    >> http://www.unics.uni-hannover.de/nhtcapri/bidirectional-text.html#zwj
    >> http://www.laits.utexas.edu/persian/persianword/zwj.htm
    >
    > On Apr 5, 2006, at 5:40 PM PDT, Asmus Freytag wrote:
    >> I believe that extensive discussion on the bidi list has
    >> established that ZWNJ is indeed needed for Persian.
    >
    > On Apr 5, 2006, at 5:25 PM PDT, Michael Everson wrote (somewhat
    > cryptically):
    >> Please consult our Iranian colleagues on this question.
    >
    > On Apr 5, 2006, at 5:03 PM PDT, Peter Edberg wrote:
    >> 1. Arabic (ar) & Persian (fa):
    >> - Both of these include 200C and 200D (ZWNJ and ZWJ). I would argue
    >> that these characters are not required in order to write Arabic or
    >> Persian.



    This archive was generated by hypermail 2.1.5 : Thu Apr 06 2006 - 21:57:17 CST