Re: U+0185 in Zhuang and Azeri (was Re: unicode Digest V4 #3)

From: Kenneth Whistler (
Date: Mon Jan 05 2004 - 20:37:30 EST

    [Doing a little cut and pasting here to coalesce the context...]

    > Peter Kirk wrote,
    > >
    > > I note an incorrect glyph for U+0185 in Code2000 and in Arial Unicode
    > > MS; this looks like b with no serif at the bottom but should be much
    > > shorter, like ь, the Cyrillic soft sign.

    James Kass responded:

    > ... With regards to U+0185, could it be
    > said that the informative glyph in TUS 2.0, 3.0 and 4.0 is a bit
    > misleading, or does that glyph represent a variance from the
    > text(s) with which you're familiar?

    > This page uses a scan from THE LANGUAGES OF THE WORLD
    > as its Chuang example:
    > No sample text, no lower case illustration:
    > If the informative glyph in TUS *is* misleading, I'll be happy
    > to make appropriate changes here.

    Peter Kirk responded:

    > Yes, you are right, and using a very British hyperbole [recte: litotes].
    > The TUS 4.0
    > glyph is quite simply incorrect. That is, it is incorrect for the
    > Azerbaijani, Khakass and Nogai letter, and it does not make a proper
    > distinction from the otherwise almost identical "b". The glyph should
    > have the same height as most lower case letters. ... That is, shorter
    > than the reference glyph in TUS 4.0. This reference
    > glyph needs to be changed. I would suggest a form identical to U+0446.

    Before we go charging off to fix all the fonts, we first need
    to have clarity regarding which characters are intended for what.

    Michael Everson has asserted that U+0184/U+0185 *are* the intended
    characters for the Pan-Turkic Latin alphabetic use of the Cyrillic
    soft sign letter. This is at odds with the history of the Unicode
    Standard and with Michael's own prior assertion in:

    "Latin <soft sign> [is] not encoded in the UCS, complicating
    things like monolingual multiscript ordering since the current
    UCS expects Cyrillic <soft sign> to do double duty." [2000-06-02]

    That earlier statement by Michael correctly reflects the intent
    of the standard, I believe. It also correctly reflects Michael's
    observation earlier today:

    > In Pan-Turkic, though, it looks just like CYRILLIC SOFT SIGN in all
    > the sources I have seen. For lots of languages.

    And the Unicode solution for that, to date, has been that since
    it "looks just like" the CYRILLIC SOFT SIGN in all the sources,
    by gum, it *is* the CYRILLIC SOFT SIGN.

    [Now don't pile on all at once regarding mixed scripts for
    alphabets and rehearsing for the umpteenth time the arguments
    about Kurdish Q/W. We've heard all that, and there are
    abiding philosophical differences in the committees regarding
    when letters borrowed from one script into another become
    nativized into that script and require separate encoding.
    That is all for another thread. What I am telling you all
    here is what the *intent* of the standard has been regarding
    this *particular* pair of letters, since 1991.]

    The upshot of that is that the glyphs for U+0184/U+0185 are
    not to be determined by Azeri/Khakass/Nogai typography, but
    by Zhuang typography, for which they were encoded. The
    glyphs for U+042C/U+044C are correct for representing the
    soft sign in the Pan-Turkic alphabet because, well, they
    *are* the soft sign.
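    One practical consequence of this distinction: text keyed with
    U+0185 will never match text keyed with U+044C, even after
    normalization and case folding. The following is a minimal sketch
    in Python's standard unicodedata module that shows this. The helper
    names are made up for illustration. Deriving a "script" from the
    first word of the character name is only a rough heuristic, not the
    real Unicode Script property.

```python
import unicodedata

TONE_SIX_SMALL = "\u0185"   # LATIN SMALL LETTER TONE SIX (encoded for Zhuang)
SOFT_SIGN_SMALL = "\u044C"  # CYRILLIC SMALL LETTER SOFT SIGN


def script_from_name(ch: str) -> str:
    """Hypothetical helper: first word of the character name.

    Only an approximation of the Unicode Script property.
    """
    return unicodedata.name(ch).split()[0]


def same_under_normalization(a: str, b: str) -> bool:
    """True if a and b are identical after NFKC normalization plus casefold."""
    def fold(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return fold(a) == fold(b)


if __name__ == "__main__":
    print(script_from_name(TONE_SIX_SMALL))   # LATIN
    print(script_from_name(SOFT_SIGN_SMALL))  # CYRILLIC
    # Neither character has a decomposition, so they stay distinct.
    print(same_under_normalization(TONE_SIX_SMALL, SOFT_SIGN_SMALL))  # False
```

    By contrast, the capital and small Cyrillic soft signs (U+042C and
    U+044C) do compare equal under this folding, because they are a
    case pair.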

    Now, let's review the intent for Zhuang (aka Chuang) orthography.
    Based on sources such as Katzner (cited in this thread and
    available on the web) and Nakanishi, the 5 Zhuang tone
    letters were encoded in Unicode as:

    Tone 2: U+01A7/U+01A8 (reversed s)
    Tone 3: U+0417/U+0437 (Cyrillic ze)
    Tone 4: U+0427/U+0447 (Cyrillic che)
    Tone 5: U+01BC/U+01BD (roughly 5-shaped letter)
    Tone 6: U+0184/U+0185 (similar to soft-sign, but not identical)
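    If anyone wants to check the table above against an actual
    implementation, here is a small sketch using Python's standard
    unicodedata module, which ships with a copy of the Unicode Character
    Database. The dictionary ZHUANG_TONE_LETTERS and the describe()
    helper are only illustrative. The sketch prints the formal name of
    each tone letter and confirms that each capital/small pair is also
    a case pair.

```python
import unicodedata

# Zhuang tone letters as listed above: tone number -> (capital, small).
# This mapping is an illustration, not a standard data file.
ZHUANG_TONE_LETTERS = {
    2: ("\u01A7", "\u01A8"),  # reversed s
    3: ("\u0417", "\u0437"),  # Cyrillic ze
    4: ("\u0427", "\u0447"),  # Cyrillic che
    5: ("\u01BC", "\u01BD"),  # roughly 5-shaped letter
    6: ("\u0184", "\u0185"),  # similar to soft sign, but not identical
}


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for tone, (cap, small) in ZHUANG_TONE_LETTERS.items():
        # Each capital/small pair is also a case pair in the UCD.
        assert small.upper() == cap and cap.lower() == small
        print(f"Tone {tone}: {describe(cap)} / {describe(small)}")
```

    For tone 6 this reports LATIN CAPITAL/SMALL LETTER TONE SIX, not
    CYRILLIC ... SOFT SIGN.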

    Everyone recognizes that the tone letters were mnemonically
    based on the digits 2, 3, 4, 5, 6, but there was no point in
    actually *using* the digits, as the tone letters are shaped
    differently and their use would interfere with the use of
    normal digits in Zhuang text.

    The Unicode shapes and tone letter identities for Zhuang are
    roughly consonant with those also shown at:

    except that the glyph for Tone 4 there is much less che-like in
    shape, though still not actually a "4". Running text citations,
    as in Katzner, clearly show Cyrillic ze and che in use for those
    tones. The debatable edge case was always tone 6, where you
    could argue that the Zhuang citations merely showed an "off" shape
    for a Cyrillic soft sign that happened to be used in the text.
    But, as with tones 2 and 5, the more conservative approach taken
    at the time, in 1990, was simply to identify Zhuang tone 6 as
    a distinct form, not identical to the soft sign, and so it
    was separately encoded at U+0184/U+0185.

    Note that there are more modern representations of Zhuang that
    dispense with the special tone letters altogether and
    substitute ordinary Latin letters, in a Pinyin-like
    simplification. See:

    with a sign showing the substitution of Latin J, H, Z, X, W(?)
    for the 5 Zhuang tone letters.

    This may reflect an official attempt to establish a new
    Latin orthography for Zhuang. See:

    "The language was not written down until the government
    made an attempt in the early 1950's, but they chose to use a
    Russian script [sic] and it was never accepted by the
    people. A new Latin script was devised in 1986 and the government
    through the Minorities Language Commission has encouraged Zhuang
    to learn this."

    For more background on the political context of Zhuang
    orthography development, see:

    In particular, the about-face by the central government
    regarding minority community policies in the late 50's
    impacted the history of the Zhuang orthography's use:

    "In the middle 1960s, the new Zhuang, Lisu, and Lahu
    written languages were withdrawn from the few schools
    where they had survived the promotion of Chinese in the
    late 1950s."

    I presume that the 1986 orthography is what is shown in the Liuzhou
    sign noted above.

    So in any case we may be talking about the encoding of
    tone letters for a Latin/Cyrillic hybrid orthography whose
    introduction in China failed in the late 1950's and early
    1960's. It is unclear to me whether the revival of the use
    of written Zhuang in the 1980's is based on the original
    Zhuang forms or on a revision of them without the
    Cyrillic-based additions and tone letters.

    Perhaps someone on the list who knows more about the actual
    history of orthographic reform in the Zhuang Autonomous Region
    of Guangxi could chime in with more details.


    This archive was generated by hypermail 2.1.5 : Mon Jan 05 2004 - 21:14:48 EST