Multiple-Octet Codes



November 27, 2000


Title:         Response to Japanese Query re IPA tone letters

Source:        USA (ANSI)

Status:        U.S. position

Action:        For the consideration of WG2

References:    WG2 N2195

Distribution:  ISO/IEC JTC1/SC2/WG2 members



Document WG2 N2195, “Rationale for non-Kanji characters proposed by JCS committee”, provided the rationale for various non-Kanji characters from JIS X 0213 that had been proposed for encoding in 10646. That document satisfactorily addressed most of the issues raised by the U.S. committee, but one issue remained open: a difference of opinion about whether the tone letters in 10646 (U+02E5..U+02E9) can cover the mapping requirements for the RISING SYMBOL and FALLING SYMBOL from JIS X 0213.

Section 3.3 of document WG2 N2195 cites a comment from a Japanese linguist, taking issue with the analysis of tone letters provided in the Unicode 2.0 book, claiming among other things that, “With this, one cannot make a distinction between high-rising and low-rising, and [so] it becomes impossible to do tonal transcription of languages that maintain such a distinction (e.g. Cantonese).”

This issue was left open, with an action item for the U.S. committee to provide further explanation to satisfy the JCS committee. This document is intended to close that action item by providing the clarification required.


The comment provided in section 3.3 of WG2 N2195 misunderstands the intent of the IPA tone letters, and of the suggestions for their use provided in the Unicode Standard. Tone transcription is not a matter of superimposing a contour on a pitch; it is a matter of transcribing the pitch contour itself. The tone letters provide a mechanism for doing so by indicating the five relative pitch levels, numbered 1 (extra-low) to 5 (extra-high), that phonologists have found sufficient for analyzing tonemic systems. A level tone is indicated by a single tone letter, and any tonal pitch contour can be represented by an appropriate sequence of tone letters.

Thus, to represent a Cantonese “high-rising” tone, the sequence 45 is generally used; this corresponds to the UCS sequence 02E6 + 02E5. To represent a Cantonese “low-rising” tone, the sequence 12 (or 13, or 24, depending on one’s analysis of the exact pitch values) can be used; this corresponds to the UCS sequence 02E9 + 02E8. The Handbook of the International Phonetic Association, 1999 (ISBN 0 521 63751 1), clearly shows tone letters used to transcribe tone in many languages, including Cantonese; see page 24 of that book for details. The contour tone letters are plainly not limited to the five examples shown in the IPA reference chart.
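As an illustration of the digit-to-code-point correspondence, the following Python sketch maps a contour written in the 1-5 notation to IPA tone letters. It assumes only that pitch level 1 corresponds to U+02E9 and level 5 to U+02E5, consistent with the sequences given above; the function names `tone_sequence` and `code_points` are hypothetical helpers, not part of any standard or library.

```python
# Chao-style pitch levels (1 = extra-low ... 5 = extra-high) mapped to the
# five IPA tone letters in 10646 (U+02E9 .. U+02E5).
TONE_LETTERS = {
    "1": "\u02E9",  # MODIFIER LETTER EXTRA-LOW TONE BAR
    "2": "\u02E8",  # MODIFIER LETTER LOW TONE BAR
    "3": "\u02E7",  # MODIFIER LETTER MID TONE BAR
    "4": "\u02E6",  # MODIFIER LETTER HIGH TONE BAR
    "5": "\u02E5",  # MODIFIER LETTER EXTRA-HIGH TONE BAR
}


def tone_sequence(digits: str) -> str:
    """Return the tone-letter sequence for a digit string such as "45"."""
    try:
        return "".join(TONE_LETTERS[d] for d in digits)
    except KeyError as err:
        raise ValueError(f"pitch level must be 1-5, got {err.args[0]!r}") from None


def code_points(seq: str) -> str:
    """Format a string as space-separated UCS code points, e.g. "02E6 02E5"."""
    return " ".join(f"{ord(ch):04X}" for ch in seq)


if __name__ == "__main__":
    # The two Cantonese rising tones discussed in the text.
    for label, digits in [("high-rising", "45"), ("low-rising", "12")]:
        print(label, digits, code_points(tone_sequence(digits)))
```

The same function covers level tones (a single digit) and any longer contour, which is the point of the argument: no contour needs a character of its own.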

The two JIS X 0213 characters in question, RISING SYMBOL and FALLING SYMBOL, match two of the glyphs shown in the official IPA reference chart as representative of tone contours. But these are only two of many dozens of possible tone contours that can be written with tone letters. In particular, RISING SYMBOL represents the 15 tone contour (i.e., <02E9, 02E5>), and FALLING SYMBOL represents the 51 tone contour (i.e., <02E5, 02E9>).

A very illustrative example of a generative implementation of tonal contours with tone letters can be found online at:


In particular, see section 6.8, “Tone Letters”, which shows how TIPA (TeX for IPA) uses a macro to construct tone letters for any pitch contour.

Given the input \tone{214}ma “horse” and \tone{51}ma “scold”, TIPA generates the appropriate constructed tone-letter glyphs in front of ma, marking the tonal distinction in this Mandarin Chinese minimal pair. In fact, the \tone{51} macro sequence in TIPA produces exactly what is intended by the JIS X 0213 FALLING SYMBOL.

These TIPA macros can be directly mapped to sequences of IPA tone letters from 10646. Thus:

\tone{214} → <02E8, 02E9, 02E6>

\tone{51} → <02E5, 02E9>
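The mapping from TIPA input to 10646 is mechanical enough to sketch in a few lines. The Python below is a hypothetical converter, not part of TIPA: it handles only the digit form of the macro shown above (for example \tone{214}), replaces each macro with the corresponding tone letters, and leaves all other text untouched.

```python
import re

# Pitch level 1 (extra-low) .. 5 (extra-high) -> U+02E9 .. U+02E5.
TONE_LETTERS = {str(level): chr(0x02EA - level) for level in range(1, 6)}

# Matches only the digit form of the TIPA macro, e.g. \tone{214}.
TONE_MACRO = re.compile(r"\\tone\{([1-5]+)\}")


def tipa_to_ucs(text: str) -> str:
    """Replace each \\tone{...} macro in text with its tone-letter sequence."""
    return TONE_MACRO.sub(
        lambda m: "".join(TONE_LETTERS[d] for d in m.group(1)), text
    )


if __name__ == "__main__":
    for source in (r"\tone{214}ma", r"\tone{51}ma"):
        converted = tipa_to_ucs(source)
        print(source, "->", " ".join(f"{ord(c):04X}" for c in converted))
```

Running it on the two Mandarin examples places <02E8, 02E9, 02E6> and <02E5, 02E9> in front of ma, matching the mappings listed above.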

Because the tone letters are designed to be used in sequences to indicate pitch contours, it would be a mistake to encode separate characters (two or more) for particular contours. Among other things, doing so would introduce a normalization problem into 10646 for those characters. If FALLING SYMBOL were separately encoded, say as U+02EF, that would immediately raise an equivalence question: is 02EF equivalent to <02E5, 02E9>? Introducing such equivalence problems into the standard is harmful to implementations of its normalized forms.
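The normalization argument can be checked with Python's standard `unicodedata` module. The sketch below, a minimal illustration rather than a conformance test, confirms that the existing tone letters carry no decomposition mappings and that the FALLING SYMBOL sequence <02E5, 02E9> is unchanged by all four normalization forms. The helper name `stable_under_all_forms` is hypothetical, and the results reflect the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# The five IPA tone letters already in 10646 (U+02E5 .. U+02E9).
TONE_LETTERS = [chr(cp) for cp in range(0x02E5, 0x02EA)]

# FALLING SYMBOL expressed as a sequence: extra-high (5) then extra-low (1).
falling = "\u02E5\u02E9"


def stable_under_all_forms(text: str) -> bool:
    """True if NFC, NFD, NFKC and NFKD all leave text unchanged."""
    return all(
        unicodedata.normalize(form, text) == text
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )


if __name__ == "__main__":
    for ch in TONE_LETTERS:
        # An empty decomposition field means no canonical or compatibility mapping.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch),
              repr(unicodedata.decomposition(ch)))
    print("sequence stable:", stable_under_all_forms(falling))
```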


The two characters from JIS X 0213, RISING SYMBOL and FALLING SYMBOL, should simply be mapped to 10646 by the use of the appropriate UCS Sequence Identifiers, as illustrated above. This provides a unique, round-trippable mapping between JIS X 0213 and 10646, and does not introduce normalization problems into the standard.
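A converter table built on this recommendation might look like the following Python sketch. It is hypothetical: entries are keyed by character name because this document does not give the JIS X 0213 code positions, and the sketch maps only whole sequences. It shows that the two entries, 15 and 51, are distinct, so the inverse table is one-to-one and both characters survive a round trip.

```python
# Hypothetical converter table, keyed by character name rather than by
# JIS X 0213 code position (the positions are not given in this document).
JIS_TO_UCS = {
    "RISING SYMBOL": (0x02E9, 0x02E5),   # contour 15: extra-low to extra-high
    "FALLING SYMBOL": (0x02E5, 0x02E9),  # contour 51: extra-high to extra-low
}

# Inverse table used when converting from 10646 back to JIS X 0213.
UCS_TO_JIS = {seq: name for name, seq in JIS_TO_UCS.items()}


def to_ucs(name: str) -> str:
    """Map a JIS X 0213 character (by name) to its UCS tone-letter sequence."""
    return "".join(chr(cp) for cp in JIS_TO_UCS[name])


def from_ucs(text: str) -> str:
    """Map an exact UCS tone-letter sequence back to the JIS X 0213 name."""
    return UCS_TO_JIS[tuple(ord(ch) for ch in text)]


def round_trips() -> bool:
    """True if the mapping is one-to-one and every entry survives a round trip."""
    one_to_one = len(UCS_TO_JIS) == len(JIS_TO_UCS)
    return one_to_one and all(from_ucs(to_ucs(n)) == n for n in JIS_TO_UCS)


if __name__ == "__main__":
    for name in JIS_TO_UCS:
        seq = to_ucs(name)
        print(name, "->", " ".join(f"{ord(ch):04X}" for ch in seq),
              "->", from_ucs(seq))
    print("round trip OK:", round_trips())
```

Sequences outside the table, such as the 45 contour, have no JIS X 0213 counterpart and would simply remain tone letters; a converter working on running text would also need longest-match scanning, which this sketch omits.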

The set of five tone letter characters currently in 10646 is sufficient for the tonal transcription of all languages.