Re: Hebrew script in IDN (was Exemplar Characters)

From: Neil Harris (neil@tonal.clara.co.uk)
Date: Fri Nov 18 2005 - 08:14:28 CST


    Mark Davis wrote:
    > It is not that clear-cut. Identifiers by their nature cannot include
    > all words and phrases valid in all languages. For IDN, for example,
    > one can't express the perfectly reasonable English word "can't", or a
    > word like "I.B.M.".
    >
    > I did introduce a proposal in March for considering the status of some
    > word characters, which turned into a discussion in the UTC of
    > whether to add certain items to the identifier definition.
    >
    > http://www.unicode.org/L2/L2005/05083-wordprops.txt
    >
    > I'll copy that section here for those without access:
    >
    > 0027 ; # Po APOSTROPHE
    > 002D ; # Pd HYPHEN-MINUS
    > 002E ; # Po FULL STOP
    > 003A ; # Po COLON
    > 00B7 ; # Po MIDDLE DOT
    > 058A ; # Pd ARMENIAN HYPHEN
    > 05F3 ; # Po HEBREW PUNCTUATION GERESH
    > 05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM
    > 200C ; # Cf ZERO WIDTH NON-JOINER // for Indic?
    > 200D ; # Cf ZERO WIDTH JOINER // for Indic?
    > 2010 ; # Pd HYPHEN
    > 2019 ; # Pf RIGHT SINGLE QUOTATION MARK
    > 2027 ; # Po HYPHENATION POINT
    > 30A0 ; # Pd KATAKANA-HIRAGANA DOUBLE HYPHEN
    >
    >
    > The UTC decided against adding them to the identifier definition.
    > If we were to change that for the Hebrew punctuation, we would have to
    > see a documented case for it.
    >
    > Mark
    >
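Mark's point that IDN cannot express "can't" or "I.B.M." can be illustrated with the traditional letter-digit-hyphen (LDH) hostname rule. The sketch below assumes a simplified version of that rule (ASCII letters, digits and interior hyphens, 1 to 63 characters per label); `ldh_label_ok` and `check_name` are hypothetical helpers for illustration, not part of any IDN library.

```python
import re

# Simplified LDH rule (assumption): 1-63 ASCII letters/digits/hyphens,
# not starting or ending with a hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def ldh_label_ok(label: str) -> bool:
    """Hypothetical helper: True if `label` satisfies the simplified LDH rule."""
    return LDH_LABEL.fullmatch(label) is not None


def check_name(name: str) -> list[tuple[str, bool]]:
    """Split on FULL STOP (the label separator) and test each resulting label."""
    return [(label, ldh_label_ok(label)) for label in name.split(".")]


if __name__ == "__main__":
    for word in ["can't", "I.B.M."]:
        print(word, check_name(word))
```

Under this rule "can't" is rejected because of the apostrophe, while "I.B.M." is not one label at all: the full stops split it into three single-letter labels plus an empty trailing label.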
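The code points and General Category annotations in the quoted list can be cross-checked against the Unicode Character Database. A minimal sketch using Python's standard `unicodedata` module follows; the values it reports come from whatever Unicode version is bundled with the interpreter, and `describe` is a hypothetical helper name.

```python
import unicodedata

# Code points from the quoted word-characters list (L2/05-083).
CANDIDATES = [
    0x0027, 0x002D, 0x002E, 0x003A, 0x00B7, 0x058A, 0x05F3,
    0x05F4, 0x200C, 0x200D, 0x2010, 0x2019, 0x2027, 0x30A0,
]


def describe(cp: int) -> tuple[str, str, str]:
    """Return (U+XXXX, General_Category, character name) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.category(ch), unicodedata.name(ch))


if __name__ == "__main__":
    # Prints one line per code point in the same "cp ; Gc NAME" shape as the list.
    for cp in CANDIDATES:
        print("%s ; %s %s" % describe(cp))
```

For example, U+2010 HYPHEN reports Pd, the same category as HYPHEN-MINUS, and the two joiners report Cf.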

    Mark,

    I think you might meet some opposition to including the following in IDNs:

    APOSTROPHE (possibly a protocol character)
    FULL STOP (it is the label separator, so it cannot be used within IDN labels)
    COLON (a definite protocol character in URLs)
    ZWNJ and ZWJ (unless Indic experts can make a _very_ good case for these
    being used only in contexts where they cause _visible_ and _unambiguous_
    rendering changes)
    RIGHT SINGLE QUOTATION MARK (spoof of APOSTROPHE)
    HYPHENATION POINT (spoof of MIDDLE DOT)
    KATAKANA-HIRAGANA DOUBLE HYPHEN (spoof of EQUALS SIGN, possibly a protocol character)

    which leaves only

    00B7 ; # Po MIDDLE DOT
    058A ; # Pd ARMENIAN HYPHEN
    05F3 ; # Po HEBREW PUNCTUATION GERESH
    05F4 ; # Po HEBREW PUNCTUATION GERSHAYIM

    as characters which I would consider possible uncontroversial candidates
    for IDN.
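Several of the objections above are about look-alikes: a candidate character that is visually confusable with one that is already meaningful. A common way to detect that is to map each character to a canonical "skeleton" before comparing labels, in the spirit of the confusable detection in Unicode Technical Standard #39. The sketch below assumes a hypothetical three-entry table built only from the spoofing pairs named in this message; a real check would draw on the full confusables data rather than this table.

```python
# Hypothetical confusable table, built only from the three spoofing pairs
# named in the message: look-alike -> character it imitates.
CONFUSABLE = {
    "\u2019": "\u0027",  # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
    "\u2027": "\u00B7",  # HYPHENATION POINT -> MIDDLE DOT
    "\u30A0": "\u003D",  # KATAKANA-HIRAGANA DOUBLE HYPHEN -> EQUALS SIGN
}
_SKELETON_TABLE = str.maketrans(CONFUSABLE)


def skeleton(label: str) -> str:
    """Replace each look-alike character with the character it imitates."""
    return label.translate(_SKELETON_TABLE)


def confusable(a: str, b: str) -> bool:
    """True if two distinct labels share a skeleton, i.e. one could spoof the other."""
    return a != b and skeleton(a) == skeleton(b)


if __name__ == "__main__":
    print(confusable("l\u2027l", "l\u00B7l"))  # HYPHENATION POINT vs MIDDLE DOT
    print(confusable("a\u30A0b", "a=b"))       # DOUBLE HYPHEN vs EQUALS SIGN
```

The table points in one direction only: a label containing the genuine MIDDLE DOT maps to itself, so the check flags the look-alike rather than the original.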

    -- Neil



    This archive was generated by hypermail 2.1.5 : Fri Nov 18 2005 - 08:19:58 CST