Fwd: ZWNJ in IDN (Burmese Issues)

From: Ngwe Tun (ngwestar@gmail.com)
Date: Sun Nov 20 2005 - 18:02:20 CST

    Dear Richard & Groups

    You said that this is not an issue. I don't agree with that.

    Windows XP Service Pack 2, or the VOLT User Community's updated Uniscribe
    engine (usp10.dll), supports both Khmer and Burmese, so we tried it with the
    Burmese language. But it is not yet perfect for Burmese.

    1) I agree that virama + ZWNJ + consonant is distinct from virama +
    consonant. But we have some problem cases, as follows:
    a) in one syllable, consonant consonant virama ZWNJ various sign DOT BELOW
    (1001 1014 1039 200C 1037)
    b) in one syllable, consonant consonant virama ZWNJ various sign VISARGA
    (1000 1014 1039 200C 1038)
    There is a problem with adding ZWNJ between the visible virama and the
    various sign. I have tried to get these sequences to form one syllable, but
    ZWNJ breaks the syllable before the various sign. The various sign should
    not be split off into a syllable of its own.
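
To make the two sequences above easier to inspect, here is a minimal Python sketch that decodes the hex code points exactly as listed in (a) and (b) and prints the Unicode name of each character. The helper names (decode, describe, zwnj_positions) are made up for illustration. It only shows where ZWNJ sits in the stored text; it says nothing about how Uniscribe or any other engine segments or renders the syllable.

```python
import unicodedata

# Sequences (a) and (b), copied from the message as hex code points.
SEQ_A = "1001 1014 1039 200C 1037"
SEQ_B = "1000 1014 1039 200C 1038"

ZWNJ = "\u200c"


def decode(hex_seq):
    """Turn a space-separated list of hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in hex_seq.split())


def describe(text):
    """Return (U+XXXX, Unicode name) pairs, one per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def zwnj_positions(text):
    """Return the indexes at which ZWNJ occurs in the string."""
    return [i for i, ch in enumerate(text) if ch == ZWNJ]


if __name__ == "__main__":
    for label, seq in (("a", SEQ_A), ("b", SEQ_B)):
        text = decode(seq)
        print(f"case {label}: ZWNJ at index {zwnj_positions(text)}")
        for code_point, name in describe(text):
            print(f"  {code_point}  {name}")
```

In both cases ZWNJ is the fourth character, directly between the virama and the final sign, which is the position the problem report is about.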

    2) Another issue is the kinzi problem:
    a) In Unicode 4.0, Chapter 10, kinzi is encoded as nga + virama (1004 1039)
    b) and the medials ya, ra, wa, ha are likewise encoded as virama +
    [ya ra wa ha] (1039 [101A 101B 101D 101F])

    So we have a problem with the combination of nga and a medial. When nga
    (1004) is combined with medial wa (1039 101D), it may appear/render as
    wa(101D) + kinzi(1004

    I think ZWJ should be added after the virama. That might be safer against
    collisions or ambiguity in selection. Am I right?

    I would like to get responses on these issues.


    Ngwe Tun
    On 11/20/05, Richard Wordingham <richard.wordingham@ntlworld.com> wrote:
    > Neil Harris wrote:
    > > Richard Wordingham wrote:
    > >> Neil Harris wrote:
    > >>
    > >>> I think you might meet some opposition to including the following in
    > >>> IDNs:
    > >>> ZWNJ and ZWJ (unless Indic experts can make a _very_ good case for
    > >>> these being used only in contexts where they cause _visible_ and
    > >>> _unambiguous_ rendering changes)
    > >>
    > >> Well, that rules out about half the words in Burmese! I suppose there's
    > >> the work around of replacing the virama - U+1039 U+200C ('VIRAMA' ZWNJ) -
    > >> by U+1039 U+005F ('VIRAMA' LOW LINE) - extremely unnatural for a
    > >> language that doesn't have spaces between words.
    > > Well, that's a problem for IDN in its present form, because Nameprep
    > > (RFC 3491) uses table B.1 of Stringprep (RFC 3454), which maps ZWNJ to
    > > nothing.
    > At what point does the ZWNJ disappear? If it remains in what is entered
    > and displayed by the user, but is ignored when comparing names, then
    > there is no problem.
    > > ZWNJ also appears to be used for a similar purpose in Bengali. See
    > > http://www.unicode.org/faq/indic.html#21
    > >
    > > From my perspective, it would seem that ZWNJ should be usable in
    > > identifiers, if, and only if, it is used in a context where it makes a
    > > visible difference to the rendered output. This begs some questions:
    > >
    > > * what to do if the rendering engine does not support the script in
    > > question?
    > Probably not an issue. The Uniscribe that comes with Windows XP supports
    > neither Burmese nor Khmer, but I can still interpret what it produces. A
    > more significant issue is the lack of font support - Uniscribe supports
    > kana, but I don't have a font for the Katakana Phonetic Extensions. In
    > this instance, can we be sure that font mixing will not be a problem?
    > With my mix of fonts, underdotted Latin letters often come from a font
    > with a larger x-height than the normal letters.
    > > * how to phrase the rules for acceptable use of ZWNJ in an unambiguous
    > > way that can be coded as an algorithm?
    > Some cases may just have to be unsupported. What stops Unicode viramas
    > spoofing one another? If you require that viramas be consistent with the
    > script-specific characters on either side, then you may allow certain
    > combinations of virama, ZWNJ and consonants of a specific script.
    > Devanagari virama + ZWNJ may be too unsafe to allow as distinct from plain
    > virama, while Burmese virama + ZWNJ + consonant is always distinct from
    > just virama + consonant.
    > Richard.
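
The point quoted above, that table B.1 of Stringprep maps ZWNJ to nothing, can be checked with Python's standard stringprep module. The sketch below is only an illustration of that single mapping step, not a full Nameprep implementation (it skips case folding, normalization and the prohibited-character checks), and the helper name strip_mapped_to_nothing is made up for this example.

```python
import stringprep

VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
NA = "\u1014"      # MYANMAR LETTER NA


def strip_mapped_to_nothing(label):
    """Drop every character that RFC 3454 table B.1 maps to nothing."""
    return "".join(ch for ch in label if not stringprep.in_table_b1(ch))


with_zwnj = VIRAMA + ZWNJ + NA
without_zwnj = VIRAMA + NA

if __name__ == "__main__":
    print("ZWNJ in table B.1:", stringprep.in_table_b1(ZWNJ))
    print("labels equal after B.1 step:",
          strip_mapped_to_nothing(with_zwnj) == without_zwnj)
```

Under this step the label with virama + ZWNJ + consonant and the label with plain virama + consonant become the same string, which is why the two spellings cannot be registered as distinct names if the mapping is applied before comparison.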
