Re: ZWNJ in IDN (Burmese Issues)

From: Javier SOLA (lists@khmeros.info)
Date: Sun Nov 27 2005 - 19:16:24 CST

    >>> How appropriate would ZWSP be in the middle of words like 'Myanma(r)'
    >>> and 'Yangon'?
    >>
    ZWSP indicates a breaking opportunity. This would be inappropriate if
    the word should not be broken at the end of a line,
    as in Myan
    mar.
    (which is probably the case).

    I am not an expert in Myanmar (even though I am trying to make it render in
    ICU). I would tend to see ZWNJ and ZWJ as part of a cluster, and not as
    word separators. A ZWNJ could be the last character of a cluster... and
    this signals that the cluster is finished... but it is not a word
    separator. A ZWNJ at the end of the first cluster of a two-cluster word
    would not be a separator (if the word should not be divided).

    ZWNJ is an element used in the standard order of components; ZWSP could
    never be.

    I would assume that two different renderings (with and without ZWNJ)
    would lead to different IDNs. IDNs are first expanded (character by
    character) and then compared byte by byte, and this would lead to two
    strings not matching if one of them has an extra character (the ZWNJ). I
    do not think that the BIND program used for DNS resolution can do any
    type of normalisation... and I agree that - as it is contemplated in the
    standard order of components - ZWNJ should be usable in IDNs.
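
    As a rough illustration of the comparison described above, here is a
    minimal Python sketch. It assumes the "expansion" step is a plain
    Punycode encoding of the code points, with no mapping or normalisation
    applied beforehand; the function names and the sample Myanmar-script
    string are only illustrative, and a pipeline that stripped or mapped
    the ZWNJ before encoding would behave differently.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def to_wire(label: str) -> bytes:
    """Hypothetical stand-in for the expansion step.

    Encodes the label code point by code point with Python's built-in
    "punycode" codec. No mapping or normalisation is applied first
    (that is an assumption of this sketch).
    """
    return label.encode("punycode")


def same_name(a: str, b: str) -> bool:
    """Byte-by-byte comparison of the two encoded forms."""
    return to_wire(a) == to_wire(b)


# Illustrative Myanmar-script string, and the same string with one
# extra ZWNJ inserted between two of its characters.
plain = "\u1019\u103c\u1014\u103a\u1019\u102c"
with_zwnj = plain[:4] + ZWNJ + plain[4:]

print(same_name(plain, plain))
print(same_name(plain, with_zwnj))
```

    Under these assumptions the second comparison fails, because the
    extra code point changes the encoded bytes even though the two
    strings may look the same on screen.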

    In Khmer this would be more problematic, as the ZWNJ is mostly used to
    break font ligatures (such as LETTER UO + VOWEL I in moul style fonts),
    but the word is exactly the same.
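
    To make the Khmer case concrete, here is a small Python sketch of a
    comparison that treats ZWNJ as ignorable, so that the ligated and
    non-ligated spellings count as the same word. The helper name is
    hypothetical, and the sample uses KHMER LETTER KA + KHMER VOWEL SIGN I
    purely for illustration; it is not the pair mentioned above.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def fold_zwnj(text: str) -> str:
    """Hypothetical helper: drop every ZWNJ before comparing."""
    return text.replace(ZWNJ, "")


# KHMER LETTER KA (U+1780) + KHMER VOWEL SIGN I (U+17B7),
# without and with a ZWNJ between them.
ka_i = "\u1780\u17b7"
ka_zwnj_i = "\u1780" + ZWNJ + "\u17b7"

# Raw code point comparison: the two spellings differ.
print(ka_i == ka_zwnj_i)

# Comparison after folding: the two spellings are the same word.
print(fold_zwnj(ka_i) == fold_zwnj(ka_zwnj_i))
```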

    Javier


