11.3 Myanmar

Myanmar: U+1000–U+109F

The Myanmar script is used to write Burmese, the majority language of Myanmar (formerly called Burma). Variations and extensions of the script are used to write other languages of the region, such as Mon, S'gaw Karen, Western and Eastern Pwo Karen, Geba Karen, Kayah, Shan, and Rumai Palaung, as well as Pali and Sanskrit. The Myanmar script was formerly known as the Burmese script, but the term “Myanmar” is now preferred.

The Myanmar writing system derives from a Brahmi-related script borrowed from South India in about the eighth century to write the Mon language. The first inscription in the Myanmar script dates from the eleventh century and uses an alphabet almost identical to that of the Mon inscriptions. Aside from rounding of the originally square characters, this script has remained largely unchanged to the present. It is said that the rounder forms were developed to permit writing on palm leaves without tearing the writing surface of the leaf.

Because of its Brahmi origins, the Myanmar script shares the structural features of its Indic relatives: consonant symbols include an inherent “a” vowel; various signs are attached to a consonant to indicate a different vowel; ligatures and conjuncts are used to indicate consonant clusters; and the overall writing direction is from left to right. Thus, despite great differences in appearance and detail, the Myanmar script follows the same basic principles as, for example, Devanagari.

Standards. There is not yet an official national standard for the encoding of Myanmar/Burmese. The current encoding was prepared with the consultation of experts from the Myanmar Information Technology Standardization Committee (MITSC) in Yangon (Rangoon). The MITSC, formed by the government in 1997, consists of experts from the Myanmar Computer Scientists’ Association, Myanmar Language Commission, and Myanmar Historical Commission.

Encoding Principles. As with Indic scripts, the Myanmar encoding represents only the basic underlying characters; multiple glyphs and rendering transformations are required to assemble the final visual form for each syllable. Even some single characters may assume variant forms depending on the other characters with which they combine. Conversely, characters and combinations that may appear visually identical in some fonts, such as U+101D ဝ MYANMAR LETTER WA and U+1040 ၀ MYANMAR DIGIT ZERO, are distinguished by their underlying encoding.
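The distinction between look-alike characters can be checked from their character properties. The following is a minimal sketch using Python's standard unicodedata module; the helper name describe is an assumption made for illustration.

```python
import unicodedata

WA = "\u101D"    # MYANMAR LETTER WA
ZERO = "\u1040"  # MYANMAR DIGIT ZERO

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# The two characters may look alike in some fonts, but one is a letter (Lo)
# and the other is a decimal digit (Nd).
wa_info = describe(WA)
zero_info = describe(ZERO)
```

Software that sorts, searches, or parses numbers can rely on these properties rather than on glyph shape.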

Composite Characters. As is the case in many other scripts, some Myanmar letters or signs may be analyzed as composites of two or more other characters and are not encoded separately. The following are examples of Myanmar letters represented by combining character sequences:

myanmar vowel sign o
U+1000 က ka + U+1031 ◌ေ vowel sign e + U+102C ◌ာ vowel sign aa → ကော

myanmar vowel sign au
U+1000 က ka + U+1031 ◌ေ vowel sign e + U+102C ◌ာ vowel sign aa + U+103A ◌် asat → ကော် kau

U+1001 ခ kha + U+1031 ◌ေ vowel sign e + U+102B ◌ါ vowel sign tall aa + U+103A ◌် asat → ခေါ် khau

myanmar vowel sign ui
U+1000 က ka + U+102D ◌ိ vowel sign i + U+102F ◌ု vowel sign u → ကို kui

myanmar vowel sign oun
U+1000 က ka + U+102F ◌ု vowel sign u + U+1036 ◌ံ anusvara → ကုံ koun

myanmar vowel sign ein
U+1010 တ ta + U+102D ◌ိ vowel sign i + U+1036 ◌ံ anusvara → တိံ tein
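Such sequences are ordinary strings of code points. As a minimal sketch, the asat-based sequence for au and the sequence for oun can be assembled and spelled out as follows; the helper names compose and spell_out are assumptions made for illustration.

```python
import unicodedata

def compose(*code_points):
    """Build a string from code points given in logical (encoded) order."""
    return "".join(chr(cp) for cp in code_points)

def spell_out(text):
    """List each character of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# ka + vowel sign e + vowel sign aa + asat
kau = compose(0x1000, 0x1031, 0x102C, 0x103A)
# ka + vowel sign u + anusvara
koun = compose(0x1000, 0x102F, 0x1036)

for line in spell_out(kau):
    print(line)
```

Each composite is stored as several characters, so len(kau) is 4 even though a reader sees a single written syllable.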

Encoding Subranges. The basic consonants, independent vowels, and dependent vowel signs required for writing the Myanmar language are encoded at the beginning of the Myanmar range. Extensions of each of these categories for use in writing other languages, such as Pali and Sanskrit, are appended at the end of the range. In between these two sets lie the script-specific signs, punctuation, and digits.

Conjuncts. As in other Indic-derived scripts, conjunction of two consonant letters is indicated by the insertion of a virama, U+1039 ◌္ MYANMAR SIGN VIRAMA, between them. It causes the second consonant to be displayed below the first, in a smaller form; the virama itself is not rendered visibly.

Kinzi. The conjunct form of U+1004 င MYANMAR LETTER NGA is rendered as a superscript sign called kinzi. Kinzi is encoded in logical order as a conjunct consonant before the syllable to which it applies; this is similar to the treatment of the Devanagari ra. (See Section 9.1, Devanagari, rule R2.) For example, kinzi applied to U+1000 က MYANMAR LETTER KA would be written via the following sequence:

U+1004 င nga + U+103A ◌် asat + U+1039 ◌္ virama + U+1000 က ka → င်္က vka
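Because kinzi is stored as a fixed prefix in logical order, it can be located with a plain substring search. The following is a minimal sketch that assumes the nga + asat + virama sequence; the function name kinzi_positions and the sample string are hypothetical.

```python
# Kinzi prefix: nga + asat + virama, stored before the consonant it sits on.
KINZI = "\u1004\u103A\u1039"

def kinzi_positions(text):
    """Return the index of every kinzi sequence found in text."""
    positions = []
    start = text.find(KINZI)
    while start != -1:
        positions.append(start)
        start = text.find(KINZI, start + len(KINZI))
    return positions

# Kinzi applied to ka, as in the example sequence.
kinzi_ka = KINZI + "\u1000"
# Hypothetical sample: letter a, then kinzi applied to ga.
sample = "\u1021" + KINZI + "\u1002"
```

The search works on encoded order only; where the kinzi glyph appears visually is left to the rendering system.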

Medial Consonants. The Myanmar script traditionally distinguishes a set of subscript “medial” consonants: forms of ya, ra, wa, and ha that are considered to be modifiers of the syllable’s vowel. Graphically, these medial consonants are sometimes written as subscripts, but sometimes, as in the case of ra, they surround the base consonant instead. In the Myanmar encoding, the medial consonants are encoded separately. For example, the word krwe ကြွေ [kjwei] (“to drop off”) would be written via the following sequence:

U+1000 က ka + U+103C ◌ြ medial ra + U+103D ◌ွ medial wa + U+1031 ◌ေ vowel sign e → ကြွေ krwe
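With separately encoded medials, identifying them in a syllable is a simple lookup. The following is a minimal sketch; the table name MEDIALS and the function medials_in are assumptions made for illustration.

```python
import unicodedata

# Dedicated medial consonant signs and the consonant each one corresponds to.
MEDIALS = {
    "\u103B": "ya",
    "\u103C": "ra",
    "\u103D": "wa",
    "\u103E": "ha",
}

def medials_in(syllable):
    """Return the medial consonants of a syllable in encoded order."""
    return [MEDIALS[ch] for ch in syllable if ch in MEDIALS]

# krwe: ka + medial ra + medial wa + vowel sign e
krwe = "\u1000\u103C\u103D\u1031"
found = medials_in(krwe)
```

For krwe the function reports ra followed by wa, matching the order of the encoded sequence.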

In Pali and Sanskrit texts written in the Myanmar script, as well as in older orthographies of Burmese, the consonants ya, ra, wa, and ha are sometimes rendered in subjoined form. For those cases, U+1039 ◌္ MYANMAR SIGN VIRAMA and the regular form of the consonant are used:

U+1000 က ka + U+1039 ◌္ virama + U+101B ရ ra → က္ရ kra

Asat. The killer sign asat is a visible sign, used either to indicate that a vowel sound is suppressed or to combine with other characters to form a vowel sign, as in the au example above. Regardless of its use, this visible sign is always represented by the character U+103A ◌် MYANMAR SIGN ASAT, not by U+1039 ◌္ MYANMAR SIGN VIRAMA followed by U+200C ZERO WIDTH NON-JOINER.

Contractions. In a few words, the repetition of a consonant sound is written with a single occurrence of the letter for the consonant sound together with an asat sign. This asat sign is placed immediately after this double-acting consonant:

U+101A ya + U+1031 vowel sign e + U+102C vowel sign aa + U+1000 ka + U+103A asat + U+103B medial ya + U+102C vowel sign aa + U+1038 visarga → ယောက်ျား man, husband

U+1000 ka + U+103B medial ya + U+103D medial wa + U+1014 na + U+103A asat + U+102F vowel sign u + U+1015 pa + U+103A asat → ကျွန်ုပ် I (first person singular)

Great sa. The great sa ဿ is encoded as U+103F MYANMAR LETTER GREAT SA. Although this letter can be viewed as a conjunct form of two sa, it should be represented with <U+103F>; the sequence <U+101E, U+1039, U+101E> should be used for the regular conjunct form of two sa.
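The two representations are distinct strings, and normalization does not unify them. The following is a minimal sketch using Python's unicodedata module; the function name same_after_normalization is an assumption made for illustration.

```python
import unicodedata

GREAT_SA = "\u103F"                # MYANMAR LETTER GREAT SA
SA_CONJUNCT = "\u101E\u1039\u101E" # sa + virama + sa

def same_after_normalization(a, b, form="NFKC"):
    """Compare two strings after applying the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Great sa has no decomposition mapping, so the comparison is False.
equivalent = same_after_normalization(GREAT_SA, SA_CONJUNCT)
```

Text processing that should treat the two alike, such as a search function, would therefore need its own explicit mapping.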

Tall aa. Although the alternation between ◌ာ and ◌ါ is predictable in writing the Burmese language, and both signs are used to write the same sound /aa/, the S'gaw Karen orthography does not use this alternation. For this reason, two characters are encoded: U+102B ◌ါ MYANMAR VOWEL SIGN TALL AA and U+102C ◌ာ MYANMAR VOWEL SIGN AA. Both characters are to be used in Burmese texts.

Ordering of Syllable Components. Dependent vowels and other signs are encoded after the consonant to which they apply, except for kinzi, which precedes the consonant. Characters occur in the relative order shown in Table 11-3.

Table 11-3. Myanmar Syllabic Structure

Name                          Encoding
kinzi                         <U+1004, U+103A, U+1039>
consonant and vowel letters   [U+1000..U+102A, U+103F, U+104E]
asat sign (for contractions)  U+103A
subscript consonant           <U+1039, [U+1000..U+1019, U+101C, U+101E, U+1020, U+1021]>
medial ya                     U+103B
medial ra                     U+103C
medial wa                     U+103D
medial ha                     U+103E
vowel sign e                  U+1031
vowel sign i, ii, ai          [U+102D, U+102E, U+1032]
vowel sign u, uu              [U+102F, U+1030]
vowel sign tall aa, aa        [U+102B, U+102C]
anusvara                      U+1036
asat sign                     U+103A
dot below                     U+1037
visarga                       U+1038

U+1031 ◌ေ MYANMAR VOWEL SIGN E is encoded after its consonant (as in the earlier example), although in visual presentation its glyph appears before (to the left of) the consonant form.
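The relative order in Table 11-3 can be approximated by a regular expression over a single syllable. The following is a sketch of one reading of the table, assuming the dedicated asat and medial characters; the names SYLLABLE and is_well_ordered are hypothetical, and the pattern is not a complete validator for Myanmar text.

```python
import re

# Each optional group follows the order of the rows in Table 11-3.
SYLLABLE = re.compile(
    r"""
    (?P<kinzi>\u1004\u103A\u1039)?             # kinzi: nga + asat + virama
    (?P<base>[\u1000-\u102A\u103F\u104E])      # consonant or vowel letter
    (?P<contraction>\u103A)?                   # asat sign (for contractions)
    (?P<subscript>\u1039[\u1000-\u1019\u101C\u101E\u1020\u1021])?  # subscript consonant
    (?P<medials>\u103B?\u103C?\u103D?\u103E?)  # medial ya, ra, wa, ha
    (?P<e>\u1031)?                             # vowel sign e
    (?P<upper>[\u102D\u102E\u1032])?           # vowel sign i, ii, ai
    (?P<lower>[\u102F\u1030])?                 # vowel sign u, uu
    (?P<aa>[\u102B\u102C])?                    # vowel sign tall aa, aa
    (?P<anusvara>\u1036)?                      # anusvara
    (?P<asat>\u103A)?                          # asat sign
    (?P<dot_below>\u1037)?                     # dot below
    (?P<visarga>\u1038)?                       # visarga
    """,
    re.VERBOSE,
)

def is_well_ordered(syllable):
    """True if the whole string matches the table order for one syllable."""
    return SYLLABLE.fullmatch(syllable) is not None

# ka + medial ra + medial wa + vowel sign e follows the table order.
print(is_well_ordered("\u1000\u103C\u103D\u1031"))
# vowel sign e placed before medial ra does not.
print(is_well_ordered("\u1000\u1031\u103C"))
```

The pattern checks logical order only. Vowel sign e is accepted after the consonant and medials, as encoded, even though its glyph is displayed to the left.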

Spacing. Myanmar does not use any whitespace between words. If word boundary indications are desired—for example, for the use of automatic line layout algorithms—the character U+200B ZERO WIDTH SPACE should be used to place invisible marks for such breaks. The zero width space can grow to have a visible width when justified.
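Marking word boundaries in this way amounts to inserting U+200B between words. The following is a minimal sketch; the function names are hypothetical, and the word segmentation itself is assumed to come from elsewhere, such as a dictionary-based segmenter.

```python
ZWSP = "\u200B"  # ZERO WIDTH SPACE

def mark_word_breaks(words):
    """Join already-segmented words with invisible break opportunities."""
    return ZWSP.join(words)

def break_opportunities(text):
    """Return the indices at which a line layout algorithm may break."""
    return [i for i, ch in enumerate(text) if ch == ZWSP]

def strip_word_breaks(text):
    """Remove the invisible marks, for example before comparing strings."""
    return text.replace(ZWSP, "")

# Two sample words taken from the composite character examples.
words = ["\u1000\u1031\u102C\u103A", "\u1000\u102D\u102F"]
marked = mark_word_breaks(words)
```

Here marked.split(ZWSP) recovers the original word list, and strip_word_breaks(marked) yields the unmarked text.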