Re: Hebrew script in IDN (was Exemplar Characters)

From: Neil Harris (neil@tonal.clara.co.uk)
Date: Sat Nov 19 2005 - 09:31:13 CST

    Richard Wordingham wrote:
    > Neil Harris wrote:
    >
    >> I think you might meet some opposition to including the following in
    >> IDNs:
    >>
    > ...
    >> ZWNJ and ZWJ (unless Indic experts can make a _very_ good case for
    >> these being used only in contexts where they cause _visible_ and
    >> _unambiguous_ rendering changes)
    > ...
    >
    > Well, that rules out about half the words in Burmese! I suppose
    > there's the work around of replacing the virama - U+1039 U+200C
    > ('VIRAMA' ZWNJ) - by U+1039 U+005F ( 'VIRAMA' LOW LINE) -
    > extremely unnatural for a language that doesn't have spaces between
    > words.
    >
    > Richard.
    >
    >
    >
    >
    Well, that's a problem for IDN in its present form, because Nameprep
    (RFC 3491) uses table B.1 of Stringprep (RFC 3454), which maps ZWNJ to
    nothing.
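
    As a rough illustration of that mapping step (a minimal sketch, not
    a full Nameprep implementation), Python's standard `stringprep`
    module exposes table B.1 as a membership test. The helper name
    `strip_b1` is made up for this example; normalization, case folding
    and the prohibited-character checks are deliberately left out.

```python
import stringprep

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def strip_b1(label: str) -> str:
    """Drop every character listed in Stringprep table B.1.

    Hypothetical helper: it performs only the "map to nothing" step,
    not the rest of the Nameprep profile.
    """
    return "".join(ch for ch in label if not stringprep.in_table_b1(ch))


# Both joiners are in table B.1, so they vanish from the label.
burmese = "\u1000\u1039" + ZWNJ + "\u1000"  # KA, VIRAMA, ZWNJ, KA
stripped = strip_b1(burmese)
print([f"U+{ord(c):04X}" for c in stripped])
```

    Under this step the label with ZWNJ and the label without it become
    the same string, so the distinction Richard describes cannot survive
    in an IDN as currently specified.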

    ZWNJ also appears to be used for a similar purpose in Bengali. See
    http://www.unicode.org/faq/indic.html#21

    From my perspective, it would seem that ZWNJ should be usable in
    identifiers if, and only if, it is used in a context where it makes a
    visible difference to the rendered output. This raises some questions:

    * what to do if the rendering engine does not support the script in
    question?
    * how to phrase the rules for acceptable use of ZWNJ in an unambiguous
    way that can be coded as an algorithm?
    * how to get these rules implemented properly in user agents? Otherwise
    ZWNJ is going to be a very simple way of generating vast numbers of
    spoofs, since it renders as nothing in most text rendering engines.
    (Remember, IDNs should resist spoofing in all of their labels, not
    just those issued by registrars.)
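
    To make the second question concrete, here is one possible shape
    such a rule could take, sketched under an assumption that is not
    established anywhere in this thread: ZWNJ is accepted only when it
    immediately follows a virama, identified by canonical combining
    class 9. The function name `zwnj_allowed` is hypothetical, and the
    check says nothing about whether a given rendering engine actually
    shows a visible difference.

```python
import unicodedata

ZWNJ = "\u200c"   # ZERO WIDTH NON-JOINER
VIRAMA_CCC = 9    # canonical combining class assigned to virama signs


def zwnj_allowed(label: str) -> bool:
    """Return True if every ZWNJ in the label directly follows a virama.

    Hypothetical rule for illustration only; a real rule set would need
    review by experts in each affected script.
    """
    for i, ch in enumerate(label):
        if ch != ZWNJ:
            continue
        # A ZWNJ at the start, or after a non-virama, is rejected.
        if i == 0 or unicodedata.combining(label[i - 1]) != VIRAMA_CCC:
            return False
    return True


# Richard's Burmese pattern: U+1039 (virama) followed by ZWNJ.
print(zwnj_allowed("\u1000\u1039" + ZWNJ + "\u1000"))
# A ZWNJ hidden inside a Latin label, the spoofing case.
print(zwnj_allowed("pay" + ZWNJ + "pal"))
```

    A purely contextual test like this can at least be coded
    unambiguously, but it leaves the first and third questions open.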

    -- Neil


