Re: Myanmar script, Pali language and other unencoded conjuncts or punctuations

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jan 04 2005 - 10:44:54 CST

    ----- Original Message -----
    From: "Edward H. Trager" <ehtrager@umich.edu>
    To: <unicode@unicode.org>
    Sent: Monday, January 03, 2005 4:42 PM
    Subject: Re: Myanmar script, Pali language and other unencoded conjuncts or
    punctuations

    >
    > On Monday 2005.01.03 09:13:59 +0100, Antoine Leca wrote:
    >> On Friday, December 31st, 2004 11:52Z, Philippe Verdy va escriure:
    >> >
    >> > Are there sources available for an exhaustive list of contextual
    >> > forms and conjuncts used in the Myanmar script?
    >>
    >> You should know that "exhaustive lists" are kind of impossible with
    >> virama-based scripts.
    >>
    >>
    >> > I can easily find fonts for the Myanmar/Pali script, (none of them
    >> > mapped to Unicode),
    >>
    >> Look after MyaZedi (http://www.myazedi.com/downloads/).

    The Myazedi website is now effectively empty: a page with graphics and no
    active links. The Myazedi font can still be found on other sites, but it
    is too defective to be useful, and it contains no Pali characters.

    What interests me more, and what I am looking for, is a report about the
    encoding of the various conjuncts used in the Myanmar script (and its
    extensions). I know that some new characters are in the Unicode pipeline,
    but the list of conjuncts is documented nowhere.

    On the SIL site, I saw a listing of some of the conjuncts, not as a
    descriptive document but as Graphite source. This is tricky to work with,
    and it still does not show the conjuncts themselves. To make a document
    out of it, I would have to decipher the source and use the SIL font to see
    the glyphs to which they are mapped.

    I'm less interested in the fonts themselves than in the list of conjuncts
    for linguistic analysis (notably for plain-text searches). Other missing
    information: the collation rules, and guidance on how the script is used
    in various languages (not only Burmese) and in several countries (not
    only Myanmar).

    And if new characters are proposed, which are they? Are there new
    punctuation marks such as those I have read about in some sources? Are
    there distinctive variants or additions, notably among the dependent
    vowels? And how will they work in the Myanmar grapheme cluster model?

    I am also looking for the rules governing the use of "kinzi" (a *logically
    combining* character, encoded as a *base* consonant plus a modifier
    *before* the default grapheme cluster, which itself starts with another
    *base* consonant), notably how it is disambiguated with respect to the
    Unicode joiners. How should this feature be handled in input methods or
    keyboard drivers, and how does it affect the general encoding of other
    "normal" clusters (with or without the kinzi)? If there are similar
    variants of kinzi in other languages written in the Myanmar script, it
    would be useful to know that early, because it strongly affects the way
    Myanmar-encoded texts will be parsed, rendered, or indexed for plain-text
    searches. For now, Unicode says only that the behavior of kinzi is similar
    to that of the Devanagari RA, but in my experience it is much trickier to
    handle.
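
    To make the indexing concern concrete, here is a minimal sketch of how
    kinzi might be located in a string. It assumes kinzi is encoded as NGA
    (U+1004) followed by VIRAMA (U+1039) in front of a base consonant, as
    described above, and it also accepts a form with ASAT (U+103A) between
    them, a character that Unicode added only after this message was written.
    The names KINZI and find_kinzi are hypothetical, and the consonant range
    U+1000..U+1021 is an assumption that covers only the basic Myanmar
    letters, not any extensions.

```python
import re

# Assumed kinzi encodings:
#   NGA (U+1004) + VIRAMA (U+1039)                 older form
#   NGA (U+1004) + ASAT (U+103A) + VIRAMA (U+1039) later form
# The lookahead requires a basic Myanmar consonant (U+1000..U+1021) to
# follow, so NGA + VIRAMA + ZWNJ (a visible virama) is not counted.
KINZI = re.compile("\u1004\u103A?\u1039(?=[\u1000-\u1021])")


def find_kinzi(text):
    """Return the start offsets of every kinzi sequence found in text."""
    return [m.start() for m in KINZI.finditer(text)]
```

    A search index could use these offsets to treat the kinzi as part of the
    cluster that follows it rather than as a free-standing NGA.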

    The complexity of the Myanmar script is not sufficiently documented by
    Unicode, and unfortunately relevant and accurate resources about it are
    quite hard to find on the web. There may be sources in local libraries,
    but I can't get to them, and free communication with Myanmar, the country,
    is so severely controlled by its government that local contacts are
    impossible to find. If resources exist elsewhere, written in English or
    French by Burmese emigrants, I would be happy to find them.


