Myanmar Scripts and Languages
- How is the Myanmar script encoded?
- What other encodings are commonly used for the Myanmar language?
- What are the differences between Unicode and the ad hoc encodings, such as Zawgyi?
- What are some of the visible differences between Unicode and the ad hoc encodings such as Zawgyi?
- Can we use a universal font that will display Unicode and Zawgyi text together?
- Doesn't "UTF-8" indicate Unicode?
- How can I tell what encoding is used for a particular website or piece of text?
- Isn't Unicode just another font for Myanmar?
- Are there recommended Unicode fonts for Myanmar text? Where can I find them?
- What languages can be written with Unicode Myanmar characters?
- Are there any tools that can help me detect Zawgyi encoded text and convert it to Unicode?
- Is it possible to convert text in other encodings to Unicode?
- Should I support both Unicode and Zawgyi on my site? If so, how do I do that?
- My site has content entered by users in both Zawgyi and Unicode text. How can my users read both?
- How is Myanmar handled on mobile devices?
- How can I tell if my system is using a Unicode font or Zawgyi by default?
- My friends all use Zawgyi in email and texting, but my device only supports Unicode. How can I communicate with them?
- Do I need an input method editor (IME) to properly enter Myanmar text?
- Is the keyboard arrangement for Unicode different from other fonts?
- Where do I find the Unicode characters for Myanmar script?
- What about collation of Myanmar language data? Is that just a binary sort?
- I cannot find the code points for the kinzi in Unicode. What do I do?
- How do I put a virama in my text?
- I want to read content that includes both Unicode and Zawgyi text. Can I do this?
- I am using a Unicode-compliant font on Unicode text. However, some characters are rendered incorrectly. What is wrong?
- Will everyone in Myanmar eventually convert to Unicode?
- I have specific questions about other languages written with the Myanmar script, for example, Pali, Sanskrit, Shan, Mon, Karen, Kayah, and so on. Where can I learn more?
Q: How is the Myanmar script encoded?
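A: The Myanmar script was added to the Unicode Standard in Version 3.0 (September 1999).
Version 5.2 significantly extended the script in 2009. Unicode now has three
blocks for characters of this script:
- Myanmar (U+1000..U+109F)
- Myanmar Extended-A (U+AA60..U+AA7F)
- Myanmar Extended-B (U+A9E0..U+A9FF)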
These code points support several languages written in the script,
including Myanmar (Burmese), Pali, Sanskrit, Shan, Mon, Karen, Kayah,
and others.
Code points include letters (consonants and independent vowels), vowel
signs, medial signs, digits, various signs, and punctuation. Medial
and vowel signs, anusvara, visarga, virama, asat and others combine
with letters.
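As a rough illustration of how this encoding looks to a program, the following Python sketch lists the code point, character name, and block for each character of a short Myanmar string. The helper names (`myanmar_block`, `describe`) are made up for this example, and the block ranges are taken from the Unicode code charts.

```python
import unicodedata

# The three Unicode blocks that hold Myanmar-script characters
# (ranges as published in the Unicode code charts).
MYANMAR_BLOCKS = {
    "Myanmar": (0x1000, 0x109F),
    "Myanmar Extended-B": (0xA9E0, 0xA9FF),
    "Myanmar Extended-A": (0xAA60, 0xAA7F),
}


def myanmar_block(ch):
    """Return the name of the Myanmar block containing ch, or None."""
    cp = ord(ch)
    for name, (first, last) in MYANMAR_BLOCKS.items():
        if first <= cp <= last:
            return name
    return None


def describe(text):
    """Return (code point, character name, block) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), myanmar_block(ch))
        for ch in text
    ]


# The word for "Myanmar": letter MA, medial RA, letter NA, ASAT,
# letter MA, vowel sign AA. Each sign follows the letter it modifies.
for row in describe("\u1019\u103C\u1014\u103A\u1019\u102C"):
    print(row)
```

The character names come from Python's bundled copy of the Unicode Character Database, so characters newer than that copy would be reported as `<unnamed>`.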
Q: What other encodings are commonly used for the Myanmar language?
A: There are several ad hoc font encodings in common use,
all needing specific fonts to render text. ZawgyiOne,
Zawgyi 2008, and Myazedi are most commonly used.
Q: What are the differences between Unicode and the ad hoc encodings, such as Zawgyi?
A: Unicode's Myanmar script provides:
- Compatibility across platforms, operating systems, and programming languages
- Unique code points for each consonant, vowel, and modifier, regardless of visual appearance
- Efficient use of code space
- The ability to support all languages that can be written with the script
- A unique ordering of code points comprising a Myanmar syllable (consonants, vowels, and so on), where vowels always follow the consonant
- Consistent implementation of text comparison, search, and other language processing
- Font-independent representation, allowing rendering with any Unicode-compliant font installed on a device
The ad hoc font encodings such as Zawgyi have many serious problems:
- No compatibility across platforms, operating systems, or programming languages
- Incompatibility with Unicode, the widely supported international standard
- Use of multiple code points for characters and combined renderings, leading to interchange chaos
- Inefficient use of the code range, requiring twice as many code points to represent only a subset of the script
- No support for all the languages used in Myanmar, making it impossible to show text in languages using this script other than Myanmar
- Vowel code points may appear before or after a consonant. This results in different representations for each visual rendering, leading to search and comparison problems.
- Inconsistent text comparison, searching, and other language processing, often within a single document
- Lack of font support. Because the appearance of a syllable depends on the specific code points selected, text in ad hoc encodings such as Zawgyi can only be rendered if the specific font is installed on the target device.
- No support in standard software offerings
Q: What are some of the visible differences between Unicode and the ad hoc encodings such as Zawgyi?
A: Wikipedia:Font shows code point differences between Unicode and Zawgyi, the most
commonly used ad hoc scheme.
For each combining character, the Unicode Standard defines a single code point that
is rendered appropriately for the base character. For example, U+103C,
the medial ra, surrounds an associated consonant with a
line. A Unicode font generates the right shape at display time.
Non-Unicode fonts define as many as 8 code points for different parts of
the same ra glyph. Typing is cumbersome because the user must select the
right form for each context.
An incorrect match between font and text shows "dotted" characters or overlapping lines, and also incorrect characters, as shown in the following table.
Encoding | With Unicode (Padauk) font | With ZawgyiOne font | Code points
Unicode text | (image) | (image) | U+1015 U+103C U+102F U+101C U+102F U+1015 U+103A U+1019 U+103E U+102C
Zawgyi-encoded text | (image) | (image) | 0x1018 0x101a 0x1039
Unicode text can be displayed using any Unicode-compliant
font. However, non-Unicode text can only be displayed with its encoded
font.
Unicode also defines a unique order of code points for base letters
and combining characters.
Q: Isn't there a universal font that will display Unicode and Zawgyi text together?
A: No. Since the code points for Zawgyi and Unicode use the same
range (0x1000-0x109f), no font can automatically apply the right
character shapes. A universal font is impossible.
Zawgyi should be converted to Unicode before adding to a web page or
other display.
If absolutely necessary, HTML can explicitly specify a non-Unicode font for
a tagged region if the encoding of the text is non-Unicode.
Q: Doesn't "UTF-8" indicate Unicode?
A: Yes, if properly used with Unicode code points. "UTF-8" technically does not apply to ad hoc font encodings such as Zawgyi.
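To see why a "UTF-8" label alone says nothing about whether Myanmar text is Unicode or Zawgyi, consider this small Python sketch. It takes the code points listed for the Zawgyi-encoded sample in the table above and shows that they serialize as perfectly valid UTF-8. The helper names are invented for this illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def uses_myanmar_block(text: str) -> bool:
    """Return True if any character lies in the range U+1000..U+109F."""
    return any(0x1000 <= ord(ch) <= 0x109F for ch in text)


# Code points given for the Zawgyi-encoded sample: 0x1018 0x101a 0x1039.
zawgyi_sample = "\u1018\u101A\u1039"
encoded = zawgyi_sample.encode("utf-8")

print(is_valid_utf8(encoded))             # the byte stream is well-formed
print(uses_myanmar_block(zawgyi_sample))  # and the code points look like Myanmar
print(encoded.hex(" "))                   # three bytes per code point
```

Both checks succeed even though the text is not Unicode-conformant Myanmar, which is why encoding detection has to look at the character sequences themselves rather than at the byte encoding.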
Q: How can I tell what encoding is used for a particular website or piece of text?
A: Almost all text in a given encoding will render correctly only when displayed with a compatible font. For example, Zawgyi text will appear incorrectly with a Unicode font, and text encoded as Unicode will look wrong with the ZawgyiOne font. However, some strings look identical in both encodings because all these fonts have a common subset of characters.
Some online tools are available that will help determine the encoding of text.
- zawgyi-unicode-test.appspot.com takes text and displays it using several common fonts, including two different Unicode fonts
- Several web sites offer detectors that use JavaScript or other methods to look for non-Unicode patterns.
The Unicode Consortium does not guarantee that these tools are accurate or complete, however.
Q: Isn't Unicode just another font for Myanmar?
A: No, Unicode is neither a font nor a font encoding. It defines
character code points and also requires a specific ordering of code
points for consistent text rendering. Unicode-compliant fonts
already expect this ordering to display characters correctly.
As a published standard, Unicode describes each code point, including
characteristics used for other text processing functions such as
collation, combining status, and combining order. These data are
available via the ICU C++ and
Java libraries.
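ICU is the full-featured route. As a lighter illustration, Python's standard `unicodedata` module exposes a subset of the same per-character data. The sketch below is only an example of looking up such properties; the `properties` helper is a name chosen here, not part of any library.

```python
import unicodedata


def properties(ch):
    """Collect a few Unicode Character Database properties for one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),             # e.g. Lo, Mn, Mc
        "combining_class": unicodedata.combining(ch),     # 0 for non-reordering marks
    }


# A letter, a vowel sign, and three combining signs from the Myanmar block.
for ch in "\u1000\u1031\u1037\u1039\u103A":
    print(properties(ch))
```

Collation data is not available through `unicodedata`; for that, an ICU binding or another implementation of the Unicode Collation Algorithm would be needed.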
Q: Are there recommended Unicode fonts for Myanmar text? Where can I find them?
A: Many Unicode-compatible fonts are available for Myanmar text. These
can be found with an online search. Important: proper display of
characters depends both on the font and the rendering software
(engine) used. Some rendering engines do not fully support all Unicode sequences.
Q: What languages can be written with Unicode Myanmar characters?
A: Characters for these languages are supported in the three Unicode
Myanmar blocks noted above:
- Myanmar (Burmese)
- Mon
- S'gaw Karen
- Western Pwo Karen
- Eastern Pwo Karen
- Geba Karen
- Kayah
- Shan
- Rumai Palaung
- Khamti Shan
- Aiton
- Phake
- Pa'o Karen
- Shwe Palaung
- Shan Pali
- Tai Laing
Q: Are there any tools that can help me detect Zawgyi encoded text and convert it to Unicode?
A: Yes. Because some strings are valid in both Zawgyi and Unicode, it is not always possible to achieve 100% accuracy in distinguishing the two.
The library Myanmar Tools uses a machine learning model to estimate whether a string is represented in Zawgyi or in Unicode. The tools are available from the Google i18n team in JavaScript, Java (Android), and other programming environments.
See: https://github.com/googlei18n/myanmar-tools
Note: Detectors that use hand-coded rules are susceptible to flagging content in other languages like Shan and Mon as Zawgyi when it is actually Unicode, so are not generally recommended.
Q: Is it possible to convert text in other encodings to Unicode?
A: Yes, several converters are publicly available.
The Unicode Consortium does not guarantee the quality of these solutions, however.
Q: Should I support both Unicode and Zawgyi on my site? If so, how do I do that?
A: Because many platforms do not yet have Unicode fonts, it is helpful to provide a way for all users to view content. The preferred technique is to detect the encoding of user-entered text, then convert to Unicode. Display the converted text.
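The detect-then-convert flow might be sketched as follows in Python. This is only an outline under stated assumptions: `zawgyi_probability` and `zawgyi_to_unicode` are hypothetical callables that you would supply from whichever detector and converter you choose, and the 0.95 threshold is an arbitrary illustrative value, not a recommendation.

```python
from typing import Callable


def to_unicode(
    text: str,
    zawgyi_probability: Callable[[str], float],  # hypothetical detector: 0.0..1.0
    zawgyi_to_unicode: Callable[[str], str],     # hypothetical converter
    threshold: float = 0.95,                     # assumed cutoff, tune for your data
) -> str:
    """Return text in Unicode form, converting only when detection is confident."""
    if zawgyi_probability(text) >= threshold:
        return zawgyi_to_unicode(text)
    # Below the threshold the text is treated as already being Unicode.
    return text
```

Keeping the detector and converter as parameters makes it easy to swap implementations. Because some strings are valid in both encodings, leaving low-confidence text unchanged avoids corrupting content that was already Unicode.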
One alternative is to use a web font on your site and apply it with CSS
in any HTML block that displays text, for example, a
<div class="myfontclass"> tag. The font is loaded along with the
text, allowing modern browsers to display text in the loaded
font. This works well in most cases for either Zawgyi or Unicode
text. However, transmitting the font increases load time for such
content.
Another option is to let users switch via a prominent control on the page
to select either Unicode or other encoding. Then use this setting to
load pages in the selected encoding. An automatic converter may be used to
prepare text as needed. This has the advantage of avoiding font download,
but adds complexity to both the client and server.
As a final option, don't worry about it. Provide content in only Zawgyi
or only in Unicode and let users determine whether to use your site based on the
encoding. This limits the usability of your content, of course, because
either the Zawgyi or the Unicode content will appear
garbled depending on the user's installed font.
Remember that search engines may not understand all text
encodings. Unicode text on your site can be consistently interpreted.
Q: My site has content entered by users in both Zawgyi and Unicode text. How can my users read both?
A: It's great that you want your users to be able to read all
messages! There are at least two ways to enable this, similar to the
methods described above for websites. Each requires detecting the
encoding of each message posted.
The preferred method is to convert all postings to Unicode form. Set CSS to use
Unicode-compatible fonts.
An alternative method is to use web fonts for the site. Make sure each posting is
in its own tagged block, such as a div. Set the CSS for each post to either a Unicode font or Zawgyi, depending on what was detected for the individual posting. Note that this will result in an inconsistent look to the text due to different font styles.
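The per-post tagging could look roughly like this on the server side. The Python sketch assumes the encoding of each post has already been detected, and the class names `post-unicode` and `post-zawgyi` are placeholders that your own stylesheet would map to the corresponding web fonts.

```python
from html import escape


def render_post(text: str, is_zawgyi: bool) -> str:
    """Wrap one post in a div whose class selects the matching web font."""
    # Hypothetical class names; the site's CSS would assign a font-family to each.
    css_class = "post-zawgyi" if is_zawgyi else "post-unicode"
    return f'<div class="{css_class}">{escape(text)}</div>'


# Each post is a (text, detected_as_zawgyi) pair.
posts = [
    ("\u1019\u103C\u1014\u103A\u1019\u102C", False),
    ("\u1018\u101A\u1039", True),
]
print("\n".join(render_post(text, flag) for text, flag in posts))
```

Escaping the post text keeps user-entered markup from being interpreted as HTML, independent of which font class is applied.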
You may also consider educating your users on using Unicode fonts.
Q: How is Myanmar handled on mobile devices?
A: Most mobile devices do not allow the user to change or replace the installed fonts.
An application may "bundle" a font, but that will only be used within the application, not for other tools or apps.
Many devices already include a Unicode-compliant font that is used by
default for any Myanmar text. Any Unicode text will appear
correctly in the system-installed applications. Zawgyi text will look
wrong unless the particular application has included the Zawgyi font
within the application.
Some device vendors have installed ZawgyiOne in place of a Unicode
font. In this case, Zawgyi will look right, but Unicode text in
messages and web sites will look wrong.
It is also possible for an application or device to detect and convert
text to match the installed fonts.
Q: How can I tell if my system is using a Unicode font or Zawgyi by default?
A: Examine the appearance of the Myanmar character at code point U+104E here:
၎
If the above looks like this character, you have a Unicode font: (image)
If it looks like this, your browser is using Zawgyi or Myazedi: (image)
If the above is blank or a box, no Myanmar font was found.
Q: My friends all use Zawgyi in email and texting, but my device only supports Unicode. How can I communicate with them?
A: This is complicated, primarily because fonts cannot be added or
changed on most mobile devices. Free Myanmar Unicode keyboards are
available for most mobile devices from online sources, so work with
your friends to agree on a common way to communicate.
Apps that convert between Zawgyi and Unicode are also available. Copying and pasting text messages into such an interactive converter will let you read any message.
Q: Do I need an input method editor (IME) to properly enter Myanmar text in Unicode?
A: The keyboard arrangement does not determine if the text is Unicode or
another form. Keyboard applications that produce Unicode are available
on most devices and web apps. Some browsers support extensions that
provide virtual keyboards for Myanmar and other scripts.
Q: Is the keyboard arrangement for Unicode different from other fonts?
A: Unicode does not specify a keyboard arrangement, but
leaves the keyboard or IME provider free to arrange the keyboard in
the most natural way for the users. However, a Unicode keyboard needs far fewer keys, because only one code point is needed for
each diacritic.
Q: Where do I find the Unicode characters for Myanmar script?
A: The Myanmar script is documented in Section 16.3, Myanmar, in The Unicode Standard. The code charts present all Myanmar script characters defined in the
Unicode blocks. The detailed character properties for each code
point are available, too.
Q: What about collation of Myanmar language data? Is that just a binary sort?
A: Generally, a binary sort is not recommended. Instead, use Unicode Collation. A collation chart for Myanmar is available among the Unicode collation charts.
Q: I cannot find the code points for the kinzi in Unicode. What do I do?
A: A kinzi is where the first consonant in a cluster is a non-word-final letter NGA. In this case, NGA rises over the following letter. This combination appears as a small character similar to a Greek epsilon over the following letter.
Here is an example of a kinzi in text: သင်္ဘော
The code points are: 101E 1004 103A 1039 1018 1031 102C.
In Unicode, NGA + ASAT is entered, followed by a VIRAMA to position
the character over the next letter. This allows searching for the NGA
letter as a character. The Unicode sequence for kinzi
is U+1004 NGA, U+103A ASAT, U+1039 VIRAMA, followed by the next letter.
Note that some Unicode keyboards may provide an input key for the
kinzi. In this case, the output is the series of code points defined
above.
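The kinzi sequence can be checked programmatically. This short Python sketch rebuilds the example word from its parts and prints the resulting code points. `KINZI` and `with_kinzi` are names chosen for this illustration only.

```python
# Kinzi as encoded in Unicode: NGA + ASAT + VIRAMA.
KINZI = "\u1004\u103A\u1039"


def with_kinzi(before: str, after: str) -> str:
    """Place a kinzi between the preceding text and the letter it sits above."""
    return before + KINZI + after


# Letter SA, kinzi, then letter BHA with vowel signs E and AA.
word = with_kinzi("\u101E", "\u1018\u1031\u102C")

print(" ".join(f"{ord(ch):04X}" for ch in word))
# prints: 101E 1004 103A 1039 1018 1031 102C

# Because NGA is stored as an ordinary letter, a plain search finds it.
print("\u1004" in word)  # prints: True
```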
Q: How do I put a virama in my text?
A: Unicode's virama code point is U+1039. This can be entered with an appropriate keyboard.
Q: I want to read content that includes both Unicode and Zawgyi text. Can I do this?
A: This is difficult because it requires either converting the Zawgyi text to Unicode and then displaying it with a Unicode-compliant font, or wrapping each part of the text in markup that applies the appropriate font. If possible, contact the source of the text and encourage them to provide it in Unicode form.
Q: I am using a Unicode-compliant font on Unicode text. However, some characters are rendered incorrectly. What is wrong?
A: Some rendering engines may not properly render with all Unicode fonts. Make sure the font and rendering software are compatible. In some cases, changing to a different Unicode-compliant font may fix the problem.
The Display Problems page has some general help that may be useful.
Another possibility is that the Unicode text is malformed, that is, the
code points are incorrectly ordered.
Q: Will everyone in Myanmar eventually convert to Unicode?
A: The Unicode Standard was designed to provide consistent and efficient
interchange of all textual information. Currently, much Myanmar-language text online still uses ad hoc font
encodings. However, Unicode's benefits for the
Myanmar language itself, as well as the enabling of non-Myanmar
languages, are expected to make Unicode the only way to represent Myanmar
script. This is the trend followed for all other scripts supported by
Unicode.
Q: I have specific questions about other languages written with the Myanmar script, for example, Pali, Sanskrit, Shan, Mon, Karen, Kayah, and so on. Where can I learn more?
A: The Unicode Standard includes characters to support other languages written with this writing system. To create text, specific keyboards that have the characters for the language may be required, because a standard Burmese keyboard does not have all the characters for Shan, Mon, Karen, and so on. These may be available as web applications and as soft keyboards on mobile devices. Unicode-compliant fonts should have the full range of characters in all three Myanmar code blocks.
Q&A contributed by [SRL] & [CWC]