Re: IDNA2008 Contextual rules clarification

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Oct 29 2010 - 16:36:09 CDT


    Nagesh Chigurupati asked:

    > I have a question regarding some of the contextual rules in RFC5892. For
    > example, the contextual rule in Appendix A.4, Greek Lower Numeral Sign
    > (U+0375), states the following:
    >
    > If Script(After(cp)) .eq. Greek Then True;
    >
    > If the Greek Lower Numeral Sign (U+0375) is the last code point in the
    > IDN, should it be allowed? There are statements in the RFC5892 as
    > follows:
    >
    > Before(FirstChar) evaluates to Undefined.
    > After(LastChar) evaluates to Undefined.
    >
    > Can I assume that "Undefined" is not equal to "Greek", and therefore
    > input sequences with a trailing Greek Lower Numeral Sign are always
    > disallowed by the specification?

    Correct.
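A minimal Python sketch of the A.4 rule, showing why a trailing U+0375 fails. After(LastChar) is modelled as None (standing in for Undefined), and None never compares equal to "Greek", so the rule cannot evaluate to True. The script table is a hypothetical, deliberately tiny stand-in covering only a few Greek code points; a real checker would take Script values from the Unicode Character Database (Scripts.txt). The function names are illustrative only, not from any library.

```python
from typing import Optional

# Hypothetical, deliberately tiny script table for illustration only;
# real code should use the Script property from Unicode's Scripts.txt.
_GREEK_RANGES = [
    (0x0375, 0x0375),  # GREEK LOWER NUMERAL SIGN
    (0x0391, 0x03A9),  # basic Greek capital letters
    (0x03B1, 0x03C9),  # basic Greek small letters
]


def script(cp: Optional[str]) -> Optional[str]:
    """Simplified Script(cp); None models RFC 5892's Undefined."""
    if cp is None:
        return None
    o = ord(cp)
    if any(lo <= o <= hi for lo, hi in _GREEK_RANGES):
        return "Greek"
    return "Other"


def after(label: str, i: int) -> Optional[str]:
    """After(cp): the following code point, or None (Undefined) at the end."""
    return label[i + 1] if i + 1 < len(label) else None


def greek_lower_numeral_sign_ok(label: str, i: int) -> bool:
    """RFC 5892 A.4: If Script(After(cp)) .eq. Greek Then True;
    any other outcome leaves the rule False."""
    return script(after(label, i)) == "Greek"


print(greek_lower_numeral_sign_ok("\u0375\u03b1", 0))  # True: followed by alpha
print(greek_lower_numeral_sign_ok("\u03b1\u0375", 1))  # False: trailing sign
```

Returning None for Undefined, rather than a sentinel string, makes the "Undefined is not Greek" outcome fall out of ordinary equality.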
     
    > The Hebrew Punctuation Geresh (U+05F3), Hebrew Punctuation Gershayim
    > (U+05F4), etc. also pose a similar question. The rule set for these
    > contextual rules states the following:
    >
    > If Script(Before(cp)) .eq. Hebrew Then True;
    >
    > So, if the first code point is U+05F3, then should it be disallowed

    Correct.

    > as
    > there is no code point before this one to assert that it belongs to the
    > Hebrew script.

    Your reasoning there is incorrect, though. The script of
    U+05F3 and U+05F4 is already Hebrew; it isn't a matter of a
    missing previous character to assert this. Rather, the RFC 5892
    specification simply states that U+05F3 and U+05F4
    are only allowed immediately following a(nother) Hebrew character
    in a label, so a label-initial U+05F3 fails the rule.
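A minimal Python sketch of the A.5/A.6 rule for Geresh and Gershayim. It illustrates that Script(U+05F3) is itself Hebrew, yet a label-initial U+05F3 still fails, because Before(FirstChar) is Undefined (modelled here as None). The script table is a hypothetical, simplified stand-in covering only the Hebrew letters and these two marks; a real checker would use the Script property from the Unicode Character Database. The function names are illustrative only.

```python
from typing import Optional

# Hypothetical, deliberately tiny script table for illustration only;
# real code should use the Script property from Unicode's Scripts.txt.
_HEBREW_RANGES = [
    (0x05D0, 0x05EA),  # HEBREW LETTER ALEF .. HEBREW LETTER TAV
    (0x05F3, 0x05F4),  # GERESH and GERSHAYIM are themselves Hebrew
]


def script(cp: Optional[str]) -> Optional[str]:
    """Simplified Script(cp); None models RFC 5892's Undefined."""
    if cp is None:
        return None
    o = ord(cp)
    if any(lo <= o <= hi for lo, hi in _HEBREW_RANGES):
        return "Hebrew"
    return "Other"


def before(label: str, i: int) -> Optional[str]:
    """Before(cp): the preceding code point, or None (Undefined) at the start."""
    return label[i - 1] if i > 0 else None


def geresh_rule_ok(label: str, i: int) -> bool:
    """RFC 5892 A.5/A.6: If Script(Before(cp)) .eq. Hebrew Then True;
    any other outcome leaves the rule False."""
    return script(before(label, i)) == "Hebrew"


print(script("\u05f3"))                   # Hebrew: the mark's own script
print(geresh_rule_ok("\u05d0\u05f3", 1))  # True: follows ALEF
print(geresh_rule_ok("\u05f3\u05d0", 0))  # False: nothing precedes it
```

The rule only inspects the preceding code point; the mark's own script never enters into it.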

    --Ken



    This archive was generated by hypermail 2.1.5 : Fri Oct 29 2010 - 16:38:44 CDT