Re: Questions on Myanmar encoding

From: Maung TunTunLwin (maungttlwin@myanmar.com.mm)
Date: Fri Sep 19 2003 - 21:15:28 EDT

    Hello Mr. Eric Muller,

    I am a newcomer to Myanmar Unicode encoding. Although I am not fully
    satisfied with the current encoding, I will try to answer your questions.

    > 1. What is encoded by the sequence of characters
    >
    > U+1004 င MYANMAR LETTER NGA
    > U+1039 ◌္ MYANMAR SIGN VIRAMA
    > U+1004 င MYANMAR LETTER NGA
    >
    > is it kinzi + consonant NGA or consonant NGA+ subscript consonant NGA?
    > Should we add some words to Table 10.3 to clarify that?

    It is always kinzi, and that is always true for the Myanmar and Pali
    scripts. The character Nga absolutely cannot stand as a subscript. Although
    there is no word with Nga over Nga, it is perfectly legal to imagine one,
    and there is only one way to read it: as kinzi over Nga.
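
    As a rough illustration of that reading, here is a minimal Python sketch
    that treats every NGA + VIRAMA followed directly by a consonant as kinzi.
    The function names and the choice of U+1000..U+1021 as "consonant" are my
    own assumptions for the example, not something taken from the standard.

```python
NGA = "\u1004"     # MYANMAR LETTER NGA
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA


def is_myanmar_consonant(ch):
    # Assumption for this sketch: letters KA (U+1000) through A (U+1021).
    return "\u1000" <= ch <= "\u1021"


def find_kinzi(text):
    """Return the indexes where NGA + VIRAMA is directly followed by a consonant."""
    hits = []
    for i in range(len(text) - 2):
        if (text[i] == NGA and text[i + 1] == VIRAMA
                and is_myanmar_consonant(text[i + 2])):
            hits.append(i)
    return hits


# The sequence from question 1 is read as kinzi over Nga.
print(find_kinzi("\u1004\u1039\u1004"))
```

    A sequence such as 1004 1039 200C is not reported, because the character
    after the virama is not a consonant.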

    > 2. Does consonant + subscript consonant NGA ever appear? If so, how is
    > it rendered? If not, should we remove U+1004 from the third row of Table
    > 10.3?

    Nga will never occur as a subscript consonant. If such a sequence does have
    to be rendered, I think the best thing is to show an illegal-character
    mark, meaning a rectangular box or similar. Or, if you wish, you can show a
    small Nga below and let the user decide whether what he typed is wrong or
    not. That is the developer's choice, and I think it is absolutely legal.
    I don't know what you mean by the third row of Table 10.3. As I said, I am
    a newcomer; please tell me where to look.

    > 3. About Table 10.3: it is true that *in the encoding model* a cluster
    > is always made of one element of each row, with row 2 (consonant)
    > mandatory and the other rows optional?

    I don't know exactly what that means, but I will explain our Pasint rule;
    maybe it will help you.

    Under our Pasint rule, two consonants can form a consonant + subscript
    pair. One thing to understand is that the upper character is not truly the
    consonant of the current word. (I use "word" here to mean a group of
    characters with a single sound.) It is the killer of the front word. Of
    course, in sorting it goes with the front word, and the subscript character
    becomes the true consonant of the current word.

    So that means, if there is no word concerting, the upper character is the
    killer of the front word (optional) and the lower character is the
    consonant (mandatory).

    I think that will be a big problem for you. If you still have some
    difficulty, ask me and I will try my best to explain.

    > 4. Is that model realistic, or are there some exceptions, that is real
    > life situations that it does not capture? Of cases where the encoding is
    > possible, but not intuitive (e.g. two clusters in the encoding instead
    > of one)?

    I agree that the model still has some exceptions, especially in old Pali
    usage and in some word concerting. Apart from that, I would say this model
    is good for glyph display but not good enough for user-friendliness and
    sorting.

    > 5. Is it "correct" to view the kinzi as a medial form of NGA, which just
    > happens to be encoded at the front of the cluster? For what values of
    > "correct"?

    No, absolutely wrong. As I explained, kinzi is the killer of the front
    word, and the same is true for any other character that has a subscript.

    > 6. Finally, I have tried to encode various strings I have seen in print
    > (or rather as pictures of printed stuff). I would really appreciate if
    > somebody could check my encodings. By the way, I found the introduction
    > to the Burmese script on that site very interesting. In particular, not
    > having to consider encoding made the presentation more accessible (i.e.
    > it provides the level of expertise needed to understand the "Composite
    > Characters" subhead in section 10.3).

    With the current model:
    1018 102C 1039 101B 1031 102C 1010 102C 101C 1032 0020 1001 1004 1039 200C
    1017 1039 101A 102C 1038 => "What you said?"

    Correct. But it uses Space (0020), while the current rules say to use ZWSP.
    I also agree with using Space, because ZWSP does not show a visible break,
    especially when the text is not aligned. We need a visible break to prevent
    misreading.

    1019 102C 1010 102D 1000 102C=>"Contents"

    Correct.
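
    For anyone who wants to check these hex listings, here is a small Python
    sketch that turns a space-separated list of code points into a string and
    prints the character names. The helper names decode_hex and describe are
    made up for this example; unicodedata is the standard Python module.

```python
import unicodedata


def decode_hex(seq):
    """Turn a string like '1019 102C' into the characters it lists."""
    return "".join(chr(int(tok, 16)) for tok in seq.split())


def describe(seq):
    """Return one 'U+XXXX NAME' line per code point in the listing."""
    return [
        "U+%04X %s" % (ord(c), unicodedata.name(c, "<unnamed>"))
        for c in decode_hex(seq)
    ]


# The "Contents" example from this message.
for line in describe("1019 102C 1010 102D 1000 102C"):
    print(line)
```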

    1021 1013 1031 101B 102D 1000 1014 1039 200C 1012 1031 102C 1039 200C 101C
    102C 0020 1042 1048 0020 1042 002C 1040 1040 1040 0020 1000 1030 100A 102E=>
    "US$28 2,00 ...?" I think help? 1000 1030 100A 102E 1015 102C

    Just one character is wrong: the 1031 in third place should be 1012. And
    there should be no space between 28 2,00.

    1021 1039 101D 1014 1039 200C 101C 102F 102D 1004 1039 200C 1038 1017 102E
    1007 102C 101C 1039 101A 1039 101F 1031 102C 1000 1039 200C 1014 102F 102D
    1004 1039 200C=> "Can apply VISA with on line"

    Correct.

    1010 102D 101B 1005 1039 1006 102C 1014 1039 200C 1025 101A 1039 101A 102C
    1025 1039 200C 1018 102F 102D 1037 0020 2018 1015 1004 1039 200C 1012 102C
    1014 102E 2019 0020 101B 1031 102C 1000 1039 200C => " 'PandaNi' for zoo..."

    I don't know what red panda means, but the flow is correct. There is just
    one big mistake: there is no 1025 1039 200C, Letter U with killer. The
    characters after 1021, up to 102A, cannot be used as the character under a
    killer.

    It is 1009 1039 200C. Character 1009 NYA has two glyphs. What you see in
    the Unicode chart is the normal-form glyph. The other glyph is similar to,
    you could say the same as, 1025 U. It is used only under a killer and when
    the character has a subscript, which is also another kind of killer.
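
    The rule above could be checked mechanically. Here is a minimal Python
    sketch, assuming that "after 1021 to 102A" means U+1022..U+102A and that a
    visible killer is written as 1039 200C, as in the listings in this
    message. The function names are hypothetical.

```python
VIRAMA = "\u1039"  # MYANMAR SIGN VIRAMA
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER


def decode_hex(seq):
    """Turn a string like '1025 1039 200C' into the characters it lists."""
    return "".join(chr(int(tok, 16)) for tok in seq.split())


def bad_killer_bases(text):
    """Return (index, char) for each U+1022..U+102A letter followed by VIRAMA + ZWNJ."""
    bad = []
    for i in range(1, len(text) - 1):
        base = text[i - 1]
        # Assumption: U+1022..U+102A is the range that cannot carry a killer.
        if text[i] == VIRAMA and text[i + 1] == ZWNJ and "\u1022" <= base <= "\u102a":
            bad.append((i - 1, base))
    return bad


# Start of the last sample: only the second 1025 sits under a killer.
sample = decode_hex(
    "1010 102D 101B 1005 1039 1006 102C 1014 1039 200C "
    "1025 101A 1039 101A 102C 1025 1039 200C"
)
print(bad_killer_bases(sample))
```

    Replacing the flagged 1025 with 1009 makes the check pass, while the first
    1025 stays as it is.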

    Nice to meet you. I also wish to change some encoding rules, but everybody
    says it is TOO LATE.

    Bye...
    Maung TunTunLwin
    maungttlwin@myanmar.com.mm


