Re: Myanmar Ordering of Syllable Components vs. canonical order

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Fri, 6 Sep 2013 23:33:37 +0100

On Thu, 5 Sep 2013, in two posts,
Markus Scherer <markus.icu_at_gmail.com> wrote:

> Unicode 6.2 chapter 11.3 Myanmar
> <http://www.unicode.org/versions/Unicode6.2.0/ch11.pdf>,
> Table 11-3. Myanmar Syllabic Structure, shows that 103A
> asat sign comes before 1037 dot below. However, 1037 has ccc=7
> which comes before (in canonical order) 103A which has ccc=9.

> Is it correct that Unicode normalization of Myanmar text moves
> characters out of the order in table 11-3?
> If so, should there be a note about this in the text? (Sorry if I
> just missed it.)

I would say so.
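
The reordering itself is easy to confirm. Here is a minimal sketch in
Python, assuming the standard unicodedata module (its data version
depends on the Python release) and using KA (U+1000) as an arbitrary
base consonant chosen purely for illustration:

```python
import unicodedata

KA = "\u1000"         # MYANMAR LETTER KA, arbitrary base chosen for illustration
ASAT = "\u103A"       # MYANMAR SIGN ASAT
DOT_BELOW = "\u1037"  # MYANMAR SIGN DOT BELOW

# Canonical combining classes as reported by the local Unicode data.
ccc_asat = unicodedata.combining(ASAT)
ccc_dot_below = unicodedata.combining(DOT_BELOW)

# Order given in Table 11-3: asat before dot below.
table_order = KA + ASAT + DOT_BELOW

# Canonical reordering sorts adjacent non-starters by ascending ccc,
# so normalization should swap the two marks.
normalized = unicodedata.normalize("NFC", table_order)

print(ccc_asat, ccc_dot_below)
print([f"U+{ord(c):04X}" for c in table_order])
print([f"U+{ord(c):04X}" for c in normalized])
```

If the combining classes are as quoted above, the normalized string
has DOT BELOW before ASAT, i.e. not the order shown in Table 11-3.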

> I also just found Unicode Technical Note #11
> <http://www.unicode.org/notes/tn11/UTN11_4.pdf>, which describes the
> "Unicode 5.1 Model", with a similar table on pp. 6 & 7,
> but that table is both more detailed and apparently partially in
> conflict with what's in Unicode 6.2. Still, the first occurrence of
> 103A comes before 1037.

> Is UTN #11 out of date?

It's truer to say that the Unicode standard is out of date. TUS ought
to define the correct orders, rather than leaving one to guess for
Myanmar script languages other than Burmese. By correct, I mean that
anything in a canonically inequivalent form is liable to be
misinterpreted. In some cases, the simplest exposition will yield
sequences that are not in canonical form; I think 'Keep it simple,
stupid' should be the guiding principle for comprehensive definitions.

The order <ASAT, DOT BELOW> for Burmese appears to depend on what can
be inserted between them in other languages.

Richard.
Received on Fri Sep 06 2013 - 17:36:15 CDT
