Myanmar Scripts and Languages

Q: How is the Myanmar script encoded?

A: The Myanmar script was added to the Unicode Standard in Version 3.0 (September 1999). Version 5.2 significantly extended the script in 2009. Unicode now has three blocks for characters of this script:

  • Myanmar (U+1000-U+109F)
  • Myanmar Extended-A (U+AA60-U+AA7F)
  • Myanmar Extended-B (U+A9E0-U+A9FF)

These code points support several languages written in the script, including Myanmar (Burmese), Pali, Sanskrit, Shan, Mon, Karen, Kayah, and others.

Code points include letters (consonants and independent vowels), vowel signs, medial signs, digits, various signs, and punctuation. Medial and vowel signs, anusvara, visarga, virama, asat and others combine with letters.
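
As a small illustration of the block layout, the following Python sketch reports which of the three Myanmar blocks a character falls in. The ranges are those of the Myanmar, Myanmar Extended-A, and Myanmar Extended-B blocks; the names `MYANMAR_BLOCKS` and `myanmar_block` are made up for this example.

```python
from typing import Optional

# Code point ranges of the three Myanmar blocks (first, last), inclusive.
MYANMAR_BLOCKS = {
    "Myanmar": (0x1000, 0x109F),
    "Myanmar Extended-A": (0xAA60, 0xAA7F),
    "Myanmar Extended-B": (0xA9E0, 0xA9FF),
}


def myanmar_block(ch: str) -> Optional[str]:
    """Return the Myanmar block name for a single character, or None."""
    cp = ord(ch)
    for name, (first, last) in MYANMAR_BLOCKS.items():
        if first <= cp <= last:
            return name
    return None


print(myanmar_block("\u1000"))  # Myanmar
print(myanmar_block("A"))       # None
```

Block membership only shows that a character belongs to the script. It says nothing about whether a piece of text is Unicode or Zawgyi, because both use values in the range 0x1000-0x109f.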

Q: What other encodings are commonly used for the Myanmar language?

A: There are several ad hoc font encodings in common use, all needing specific fonts to render text. ZawgyiOne, Zawgyi 2008, and Myazedi are most commonly used.

Q: What are the differences between Unicode and the ad hoc encodings, such as Zawgyi?

A: Unicode's Myanmar script provides:

  • Compatibility across platforms, operating systems, and programming languages
  • Unique code points for each consonant, vowel, and modifier, regardless of visual appearance
  • Efficient use of code space
  • The ability to support all languages that can be written with the script
  • A unique ordering of code points comprising a Myanmar syllable (consonants, vowels, and so on), where vowels always follow the consonant
  • Consistent implementation of text comparison, search, and other language processing
  • Font-independent representation, allowing rendering with any Unicode-compliant font installed on a device

The ad hoc font encodings such as Zawgyi have many serious problems:

  • No compatibility across platforms, operating systems, or programming languages
  • Incompatible with Unicode, the widely supported international standard
  • Use of multiple code points for characters and combined renderings, leading to interchange chaos
  • Inefficient use of the code range, requiring twice as many code points to represent only a subset of the script
  • Incomplete language coverage, making it impossible to show text in languages other than Myanmar (Burmese) that use this script
  • Vowel code points that may appear before or after a consonant, so the same visual rendering can have several different representations, leading to search and comparison problems
  • Inconsistent text comparison, searching, and other language processing, often within a single document
  • Lack of font support. Because the appearance of a syllable depends on the specific code points selected, text in ad hoc encodings such as Zawgyi can only be rendered if the specific font is installed on the target device.
  • No support in standard software offerings

Q: What are some of the visible differences between Unicode and the ad hoc encodings such as Zawgyi?

A: Wikipedia:Font shows code point differences between Unicode and Zawgyi, the most commonly used ad hoc scheme.

For each combining character, the Unicode Standard defines a single code point that is rendered appropriately for the base character. For example, U+103C, the medial ra, surrounds the associated consonant with a line. A Unicode font generates the right shape at display time.

Non-Unicode fonts define as many as 8 code points for different parts of the same ra glyph. Typing is cumbersome because the user must select the right form for each context.

An incorrect match between font and text shows "dotted" characters or overlapping lines, and also incorrect characters, as shown in the following table.

Encoding | With Unicode (Padauk) font | With ZawgyiOne font | Code points
Unicode text | [image: Unicode text rendered correctly with Unicode font] | [image: Unicode text showing rendering errors with Zawgyi font] | U+1015 U+103C U+102F U+101C U+102F U+1015 U+103A U+1019 U+103E U+102C
Zawgyi-encoded text | [image: Zawgyi-encoded data rendered incorrectly with Unicode font Padauk] | [image: Zawgyi-encoded text rendered with Zawgyi font] | 0x1018 0x101a 0x1039

Unicode text can be displayed using any Unicode-compliant font. However, non-Unicode text can only be displayed with its encoded font.

Unicode also defines a unique order of code points for base letters and combining characters.

Q: Isn't there a universal font that will display Unicode and Zawgyi text together?

A: No. Since the code points for Zawgyi and Unicode use the same range (0x1000-0x109f), no font can automatically apply the right character shapes. A universal font is impossible.

Zawgyi should be converted to Unicode before adding to a web page or other display.

If absolutely necessary, HTML can explicitly specify a non-Unicode font for a tagged region if the encoding of the text is non-Unicode.

Q: Doesn't "UTF-8" indicate Unicode?

A: Yes, if properly used with Unicode code points. "UTF-8" technically does not apply to ad hoc font encodings such as Zawgyi.
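
To see why, the Python sketch below treats UTF-8 as a byte serialization of code point numbers. It encodes the three Zawgyi values from the table earlier in this FAQ (0x1018 0x101a 0x1039) without complaint, which suggests that a "UTF-8" label alone cannot show whether the numbers were meant as Unicode characters or as Zawgyi glyph codes. The helper name `utf8_hex` is an assumption for illustration.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{byte:02X}" for byte in text.encode("utf-8"))


# U+1000 MYANMAR LETTER KA becomes three bytes in UTF-8.
print(utf8_hex("\u1000"))  # E1 80 80

# The same encoder accepts values taken from Zawgyi-encoded text,
# because it only sees numbers in the range 0x1000-0x109f.
zawgyi_values = "\u1018\u101a\u1039"
encoded = zawgyi_values.encode("utf-8")
print(utf8_hex(zawgyi_values))  # E1 80 98 E1 80 9A E1 80 B9

# Decoding returns the same numbers; no font or encoding intent is recorded.
assert encoded.decode("utf-8") == zawgyi_values
```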

Q: How can I tell what encoding is used for a particular website or piece of text?

A: Almost all text in a given encoding will render correctly only when displayed with a compatible font. For example, Zawgyi text will appear incorrectly with a Unicode font, and text encoded as Unicode will look wrong with the ZawgyiOne font. However, some strings look identical in both encodings because all these fonts have a common subset of characters.

Some online tools are available that will help determine the encoding of text.

The Unicode Consortium does not guarantee that these tools are accurate or complete, however.

Q: Isn't Unicode just another font for Myanmar?

A: No, Unicode is neither a font nor a font encoding. It defines character code points and also requires a specific ordering of code points for consistent text rendering. Unicode-compliant fonts already expect this ordering to display characters correctly.

As a published standard, Unicode describes each code point, including characteristics used for other text processing functions such as collation, combining status, and combining order. These data are available via the ICU C++ and Java libraries.
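
The text above points to ICU for these data. As a lighter-weight sketch, Python's standard `unicodedata` module (a choice made for this example, not something this FAQ names) exposes a few of the same properties: character name, general category, and canonical combining class. It does not cover collation. The helper `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


# A base letter, a vowel sign, asat, and virama.
for ch in "\u1000\u102C\u103A\u1039":
    print(describe(ch))
```

The letter KA (U+1000) reports general category `Lo`, a letter, while virama (U+1039) reports `Mn`, a nonspacing mark, with combining class 9.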

Q: Are there recommended Unicode fonts for Myanmar text? Where can I find them?

A: Many Unicode-compatible fonts are available for Myanmar text. These can be found with an online search. Important: proper display of characters depends on both the font and the rendering software (engine) used. Some rendering engines do not fully support all Unicode sequences.

Q: What languages can be written with Unicode Myanmar characters?

A: Characters for these languages are supported in the three Unicode Myanmar blocks noted above:

  • Myanmar (Burmese)
  • Mon
  • S'gaw Karen
  • Western Pwo Karen
  • Eastern Pwo Karen
  • Geba Karen
  • Kayah
  • Shan
  • Rumai Palaung
  • Khamti Shan
  • Aiton
  • Phake
  • Pa'o Karen
  • Shwe Palaung
  • Shan Pali
  • Tai Laing

Q: Are there any tools that can help me detect Zawgyi-encoded text and convert it to Unicode?

A: Yes. Because some strings are valid in both Zawgyi and Unicode, it is not always possible to achieve 100% accuracy in distinguishing the two. The library Myanmar Tools uses a machine learning model to estimate whether a string is represented in Zawgyi or in Unicode. The tools are available from the Google i18n team in JavaScript, Java (Android), and other programming environments. See: https://github.com/googlei18n/myanmar-tools

Note: Detectors that use hand-coded rules are susceptible to flagging content in other languages like Shan and Mon as Zawgyi when it is actually Unicode, so are not generally recommended.

Q: Is it possible to convert text in other encodings to Unicode?

A: Yes, several converters are publicly available.

The Unicode Consortium does not guarantee the quality of these solutions, however.

Q: Should I support both Unicode and Zawgyi on my site? If so, how do I do that?

A: Because many platforms do not yet have Unicode fonts, it is helpful to provide a way for all users to view content. The preferred technique is to detect the encoding of user-entered text, then convert to Unicode. Display the converted text.
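
The detect-then-convert flow can be sketched in Python as follows. This is only an outline of the control flow under stated assumptions: `zawgyi_probability` and `zawgyi_to_unicode` are placeholder callables standing in for whatever detector and converter a site adopts, and the 0.95 threshold is an arbitrary example value, not a recommendation from this FAQ.

```python
from typing import Callable


def to_unicode(
    text: str,
    zawgyi_probability: Callable[[str], float],
    zawgyi_to_unicode: Callable[[str], str],
    threshold: float = 0.95,
) -> str:
    """Return text for display, converting it when it is judged to be Zawgyi.

    zawgyi_probability and zawgyi_to_unicode are placeholders for a real
    detector and converter; threshold is an arbitrary example value.
    """
    if zawgyi_probability(text) >= threshold:
        return zawgyi_to_unicode(text)
    return text


# Stand-in detector and converter, only to show the control flow.
def fake_probability(text: str) -> float:
    return 1.0 if text.startswith("ZG:") else 0.0


def fake_convert(text: str) -> str:
    return text[len("ZG:"):]


print(to_unicode("ZG:hello", fake_probability, fake_convert))  # hello
print(to_unicode("hello", fake_probability, fake_convert))     # hello
```

Passing the detector and converter in as arguments keeps the sketch independent of any particular library, so either piece can be swapped without touching the display logic.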

Another option is to use a web font on your site and apply it with CSS in any HTML block that displays text, for example, a <div class="myfontclass"> tag. The font is loaded along with the text, allowing modern browsers to display text in the loaded font. This works well in most cases for either Zawgyi or Unicode text. However, transmitting the font increases load time for such content.

A third option is to provide a prominent control on the page that lets users select either Unicode or another encoding. Then use this setting to load pages in the selected encoding. An automatic converter may be used to prepare text as needed. This has the advantage of avoiding font download, but adds complexity to both the client and server.

As a final option, don't worry about it. Provide content only in Zawgyi or only in Unicode and let users determine whether to use your site based on the encoding. This limits the usability of your content, of course, because either the Zawgyi or the Unicode content will appear garbled depending on the user's installed font.

Remember that search engines may not understand all text encodings. Unicode text on your site can be consistently interpreted.

Q: My site has content entered by users in both Zawgyi and Unicode text. How can my users read both?

A: It's great that you want your users to be able to read all messages! There are at least two ways to enable this, similar to the methods described above for websites. Each requires detecting the encoding of each message posted.

The preferred method is to convert all postings to Unicode form. Set CSS to use Unicode-compatible fonts.

An alternative method is to use web fonts for the site. Make sure each posting is in its own tagged block such as div. Set the CSS for each post to either a Unicode font or Zawgyi, depending on what was detected for the individual posting. Note that this will result in an inconsistent look to the text due to different font styles.
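
A minimal Python sketch of this per-post tagging, assuming the detection result is already available as a boolean. The class names `post-zawgyi` and `post-unicode` are hypothetical; the site's CSS would map each one to the matching web font.

```python
from html import escape


def render_post(text: str, is_zawgyi: bool) -> str:
    """Wrap one post in its own div with a class that selects the font."""
    # Class names are hypothetical; the site's CSS would map each one
    # to a Zawgyi or a Unicode web font.
    css_class = "post-zawgyi" if is_zawgyi else "post-unicode"
    return f'<div class="{css_class}">{escape(text)}</div>'


print(render_post("a < b", is_zawgyi=False))
# <div class="post-unicode">a &lt; b</div>
```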

You may also consider educating your users on using Unicode fonts.

Q: How is Myanmar handled on mobile devices?

A: Most mobile devices do not allow the user to change or replace the installed fonts. An application may "bundle" a font, but that will only be used within the application, not for other tools or apps.

Many devices already include a Unicode-compliant font that is used by default for any Myanmar text. Any Unicode text will appear correctly in the system-installed applications. Zawgyi text will look wrong unless the particular application has included the Zawgyi font within the application.

Some device vendors have installed ZawgyiOne in place of a Unicode font. In this case, Zawgyi will look right, but Unicode text in messages and web sites will look wrong.

It is also possible for an application or device to detect and convert text to match the installed fonts.

Q: How can I tell if my system is using a Unicode font or Zawgyi by default?

A: Examine the appearance of the Myanmar character at code point U+104E here:

၎

If the character above looks like this, you have a Unicode font: [image: Myanmar character U+104E]
If it looks like this, your browser is using Zawgyi or Myazedi: [image: Zawgyi character for code point x104e]
If the character above is blank or a box, no Myanmar font was found.

Q: My friends all use Zawgyi in email and texting, but my device only supports Unicode. How can I communicate with them?

A: This is complicated, primarily because fonts cannot be added or changed on most mobile devices. Free Myanmar Unicode keyboards are available for most mobile devices from online sources, so work with your friends to agree on a common way to communicate.

Apps that convert between Zawgyi and Unicode are also available. Copying and pasting text messages into such an interactive converter will let you read any message.

Q: Do I need an input method editor (IME) to properly enter Myanmar text in Unicode?

A: The keyboard arrangement does not determine if the text is Unicode or another form. Keyboard applications that produce Unicode are available on most devices and web apps. Some browsers support extensions that provide virtual keyboards for Myanmar and other scripts.

Q: Is the keyboard arrangement for Unicode different from other fonts?

A: Unicode does not specify a keyboard arrangement, but leaves the keyboard or IME provider free to arrange the keyboard in the most natural way for the users. However, a Unicode keyboard requires far fewer keys, because only one code point is needed for each diacritic.

Q: Where do I find the Unicode characters for Myanmar script?

A: The Myanmar script is documented in Section 16.3, Myanmar in The Unicode Standard. This link presents all Myanmar script characters defined in the Unicode blocks. The detailed character properties for each code point are available, too.

Q: What about collation of Myanmar language data? Is that just a binary sort?

A: Generally, a binary sort is not recommended. Instead, use Unicode Collation. The collation chart for Myanmar is here.

Q: I cannot find the code points for the kinzi in Unicode. What do I do?

A: A kinzi is where the first consonant in a cluster is a non-word-final letter NGA. In this case, NGA rises over the following letter, as described here. This combination appears as a small character similar to a Greek epsilon over the following letter.

Here is an example of a kinzi in text:

[image: Myanmar word with kinzi]

The code points are: 101E 1004 103A 1039 1018 1031 102C.

In Unicode, NGA + ASAT is entered, followed by a VIRAMA to position the character over the next letter. This allows searching for the NGA letter as a character. The Unicode sequence for kinzi is U+1004 NGA, U+103A ASAT, U+1039 VIRAMA, followed by the next letter.

Note that some Unicode keyboards may provide an input key for the kinzi. In this case, the output is the series of code points defined above.
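
The kinzi sequence can be checked with a short Python sketch. It builds the example word from the code points listed above, prints each character's Unicode name, and locates the NGA + ASAT + VIRAMA sequence. The names `KINZI`, `WORD`, and `kinzi_positions` are made up for this illustration.

```python
import unicodedata

# NGA + ASAT + VIRAMA, the kinzi sequence described above.
KINZI = "\u1004\u103A\u1039"

# The example word, built from the code points listed above.
WORD = "".join(
    chr(cp) for cp in (0x101E, 0x1004, 0x103A, 0x1039, 0x1018, 0x1031, 0x102C)
)


def kinzi_positions(text: str) -> list:
    """Return the index of every kinzi sequence in text."""
    positions = []
    start = text.find(KINZI)
    while start != -1:
        positions.append(start)
        start = text.find(KINZI, start + len(KINZI))
    return positions


for ch in WORD:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(kinzi_positions(WORD))  # [1]
print("\u1004" in WORD)       # True: NGA is searchable as a letter
```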

Q: How do I put a virama in my text?

A: Unicode's virama code point is U+1039. This can be entered with an appropriate keyboard.

Q: I want to read content that includes both Unicode and Zawgyi text. Can I do this?

A: This is difficult because it requires either converting the Zawgyi text to Unicode and then displaying it with a Unicode-compliant font, or wrapping each part of the text in markup that applies an appropriate font. If possible, contact the source of the text and encourage them to provide it in Unicode form.

Q: I am using a Unicode-compliant font on Unicode text. However, some characters are rendered incorrectly. What is wrong?

A: Some rendering engines may not properly render with all Unicode fonts. Make sure the font and rendering software are compatible. In some cases, changing to a different Unicode-compliant font may fix the problem. The Display Problems page has some general help that may be useful.

Another possibility is that the Unicode text is malformed, that is, the code points are incorrectly ordered.

Q: Will everyone in Myanmar eventually convert to Unicode?

A: The Unicode Standard was designed to provide consistent and efficient interchange of all textual information. Currently, much Myanmar-language text online still uses ad hoc font encodings. However, Unicode's benefits for the Myanmar language itself, as well as its support for the other languages written in the script, are expected to make Unicode the only way to represent Myanmar script. This is the trend followed for all other scripts supported by Unicode.

Q: I have specific questions about other languages written with the Myanmar script, for example, Pali, Sanskrit, Shan, Mon, Karen, Kayah, and so on. Where can I learn more?

A: The Unicode Standard includes characters to support other languages written with this writing system. To create text, specific keyboards that have the characters for the language may be required, because a standard Burmese keyboard does not have all the characters for Shan, Mon, Karen, and so on. These may be available as web applications and as soft keyboards on mobile devices. Unicode-compliant fonts should have the full range of characters in all three Myanmar code blocks.

Q&A contributed by [SRL] & [CWC]