Re: Taiwanese: unicode of o with dot right above

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Aug 14 2000 - 22:48:06 EDT


Doug Ewell wrote:

>
> (Summary for the impatient:
> A new character, COMBINING DOT ABOVE RIGHT, should be proposed.)

New summary for the impatient:

A new character, COMBINING DOT ABOVE RIGHT, should *NOT* be encoded.
It was already proposed three years ago, and has been considered and
turned down in the context of the issues raised by Kiatgak.

For the obsessive-compulsives, the relevant WG2 documents were:

WG2 N1593, 1997-06-26, Proposal to add Latin characters required
           by Latinized Taiwanese languages to ISO/IEC 10646,
           coauthored by Te Khai-su and Michael Everson. (= L2/97-148)

WG2 N1712R, 1998-03-18, Comment on N 1593 - Proposal to add Latin
           characters by Latinized Taiwanese languages, authored
           by TCA (Taipei Computer Association). (= L2/98-089)

Here is my analysis of the requirement, and of the encoding solution:

Minnanhua (aka Minnan, aka Taiwanese, aka Holo, aka Ho-lo-oe) has
been written in various Latin-based orthographies. Minnanhua has
a phonological distinction between two mid back rounded vowels,
/o/ and /open-o/, in open syllables. Some of the missionary-designed
orthographies represented that distinction by using an <o> and an
<o-with-dot-above>. This is a direct extension of the typical
usage seen in American English dictionaries of the Webster tradition,
where an <o-with-dot-above> is listed in the pronunciation guides as
the symbol for "aw as in law", etc., i.e. IPA U+0254 (or U+0252,
depending on your dialect).

The problem is that Minnanhua is, of course, a tonal language with
many tonemes, and the missionary orthographies wrote the tones with
accents above the vowels (acute, grave, macron, circumflex, and vertical
tick U+030D). This presented the same orthographic issue as double
accents on the better-known example of a multi-vowel multi-tone
Latin orthography -- namely Vietnamese. Since stacking two accents
directly on top of each other presented typographical problems, the
Bible publishers for Minnanhua apparently innovated in comparable
ways to the side-by-side placement that was developed in Vietnamese
orthography to deal with tonal accent placement on vowels that already
had diacritic circumflexes or breves on top. In the available examples
of the Latin Minnanhua orthography, the typographers seem to have
innovated by shoving the dot above for the o's far over the right
shoulder of the o's, so they could use standard accented o's from
their fonts (o-macron, o-acute, o-grave, and so on) and just typeset
the "dot-above" with a narrow (spacing) raised dot. Typewriters could
handle this, too, by just typing a raised dot after the o.

The modern encoding solution for this is comparable to what we
do for Vietnamese. The basic open-o vowels for Minnanhua are
already encoded:

U+022E LATIN CAPITAL LETTER O WITH DOT ABOVE
U+022F LATIN SMALL LETTER O WITH DOT ABOVE

It happens that these precomposed letters got into 10646 (and hence
Unicode) first for Livonian, but they serve just as well for the
Webster's dictionary pronunciation symbol or the Minnanhua vowels
when written in the missionary orthographies. The tones are then
represented by the appropriate combining characters. (U+0300,
U+0301, U+0302, U+0304, U+030D)
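
The encoding described above can be sketched in a few lines. This is
only an illustration, assuming Python 3 and its standard unicodedata
module; the names OPEN_O, TONE_MARKS, toned_open_o and describe are
made up for this example and come from no standard or product.

```python
import unicodedata

# The already-encoded open-o letter of the missionary orthography.
OPEN_O = "\u022F"  # LATIN SMALL LETTER O WITH DOT ABOVE

# The five tone marks listed above, in the same order.
TONE_MARKS = ["\u0300", "\u0301", "\u0302", "\u0304", "\u030D"]


def toned_open_o(tone_mark):
    """Return the open-o letter followed by one combining tone mark."""
    return OPEN_O + tone_mark


def describe(text):
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for mark in TONE_MARKS:
    sequence = toned_open_o(mark)
    print(describe(sequence))
```

Note that U+022F itself canonically decomposes to U+006F U+0307, so a
fully decomposed spelling (o, dot above, tone mark) is equivalent data.
How the dot and the tone mark are positioned relative to each other is
left to the font.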

A default Unicode rendering system won't do a very good job of rendering
the missionary orthographies, since it will just stack the tones on
top of the dot-above. But specialized Minnanhua fonts, such as the
HOTSYS(r) fonts created by Te Khai-su that started off all this
discussion back in 1997, have ligated glyphs for all the open o's,
with the dot shoved over to the shoulder of the o's.

This is exactly the same as the Vietnamese solution. A default Unicode
rendering system won't do a very good job of rendering decomposed
Vietnamese, but an appropriately designed font will have the mappings
for the base + accent combinations that result in culturally appropriate
rendering of the multiply-accented forms.

So introduction of a "COMBINING DOT ABOVE RIGHT" on the basis
of the Minnanhua orthographic data -- and particularly the HOTSYS(r)
fonts used by the HoloWin and HakkaWin word processors -- would *not*
be the right way to go. It would be encoding a language-specific
double-accent glyph placement variant of an already existing
encoded character.

>
> Kiatgak <kiatgak@pchome.com.tw> wrote:
>
> > 1. U+0186/U+0254 (LATIN CAPITAL/SMALL LETTER OPEN O)
> > with alternative form in font design.
> >
> > This solution is based on the pronunciation and need the help of
> > font design, but it induces different outlooks of OPEN O. Is that
> > allowed or adequate?
>
> Probably not, regardless of the related meaning. The "alternative form"
> is too different from the standard glyph, and IPA glyphs, perhaps more
> than any others except dingbats, need to be constant.

This is clearly not the way to go. The *character* here is LATIN
CAPITAL/SMALL LETTER O WITH DOT ABOVE. It is used to indicate
the vowel that in an IPA-based transcription would
be indicated with LATIN SMALL LETTER OPEN O (note, *not* a capital).

>
> > 2. U+004F/U+006F(O/o) + U+00B7(MIDDLE DOT)
> > with the GSUB to fix the outlooks in font design.
> > The problem is U+00B7 is not a combining character.
>
> That's another problem, yes. This is supposed to be one character, so
> it should not be encoded with two spacing characters.

Surprisingly, option 2 is *also* valid, in my opinion. It would be
an appropriate way to represent the typewriter orthography that
would use a spacing raised dot. What we have then is one *letter*,
represented digraphically. Use of the raised dot as a modifier letter
in this way is rather common, though more often to indicate vowel
length, rather than a difference in vowel quality. But this is a
special case where the dot above has "fallen off" the vowel due to
typographical practice, as described above.

> > Another problem of the same reason:
> > Is it a valid sequence if a combining character follows them,
> > eg. U+004F/U+006F(O/o) + U+00B7(MIDDLE DOT) + U+0301(COMBINING
> > ACUTE ACCENT)
> > Is such a solution allowed or adequate?
>
> You could do it, but as Peter Constable pointed out, the acute accent
> would be centered over the dot rather than the 'o', which is probably
> not what you want.

No, but the solution, of course, when using a raised middle dot, is:

    U+006F U+0301 U+00B7

You apply the tone to the base and then put the middle dot. But this
would only apply to the typewriter orthography anyway, since the
"real" orthography would use the o's with dot above.
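
To see why the order matters, here is a rough sketch, assuming Python 3
and its standard unicodedata module. The helpers clusters and show are
hypothetical, and the grouping rule is a deliberate simplification (any
character of general category M* attaches to the character before it),
not the full text segmentation rules of the standard.

```python
import unicodedata


def clusters(text):
    """Group each character with the combining marks that follow it.

    Simplified rule: a character whose general category starts with "M"
    is attached to the preceding character.
    """
    groups = []
    for ch in text:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


def show(text):
    """Render each group as a string of U+XXXX code points."""
    return [" ".join(f"U+{ord(c):04X}" for c in g) for g in clusters(text)]


tone_on_dot = "o\u00B7\u0301"   # o, MIDDLE DOT, COMBINING ACUTE ACCENT
tone_on_base = "o\u0301\u00B7"  # o, COMBINING ACUTE ACCENT, MIDDLE DOT

print(show(tone_on_dot))   # the acute is grouped with the middle dot
print(show(tone_on_base))  # the acute is grouped with the o
```

In the first sequence the acute accent belongs to the middle dot; in the
second it belongs to the o, which is what the typewriter orthography
wants.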

>
> > 3. U+004F/U+006F(O/o) + U+05C1(HEBREW POINT SHIN DOT).
> >
> > U+05C1 is the only combining character with a dot in the north-
> > east corner which I can find in unicode 3.0.
> > To use it is only based on the outlook.
>
> > 4. U+004F/U+006F(O/o) + U+031B (COMBINING HORN) or precomposed ones
> > U+01A0/U+01A1(LATIN CAPITAL/SMALL LETTER O WITH HORN).
> >
> > This solution is based on the similar outlooks.
> > It has the same problem as 3.: use character by outlooks not by
> > meaning. Even worse: it changes the shape of a horn into a dot.

3 and 4 are bad for many reasons, as others have pointed out. It isn't
a good idea to solve an encoding problem by scrounging around for
something totally unrelated to the problem at hand and using it because
it is reminiscent in shape to what you want to encode.
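
One way to see how unrelated those borrowed characters are is to look at
their properties next to the already encoded dot above. A small sketch,
assuming Python 3 and its standard unicodedata module; CANDIDATES and
properties are names invented for this illustration.

```python
import unicodedata

# The encoded dot above, and the two look-alikes from options 3 and 4.
CANDIDATES = {
    "encoded dot above": "\u0307",
    "option 3": "\u05C1",
    "option 4": "\u031B",
}


def properties(ch):
    """Return (code point, character name, canonical combining class)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))


for label, ch in CANDIDATES.items():
    print(label, properties(ch))
```

The three marks have different canonical combining classes, so besides
carrying the wrong identity, the look-alikes would also be reordered
differently relative to the tone accents during normalization.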

> > 5. To apply a new combining character.
>
> This is the way to go. A new character, COMBINING DOT ABOVE RIGHT
> (analogous to U+0307 COMBINING DOT ABOVE), should be proposed. This
> doesn't seem to be an especially productive diacritic -- so far it only
> appears with 'O' and 'o' -- but a combining character is much more
> likely to be approved than a precomposed character.

Nope. Not in this case -- since the issue has already been discussed
and this proposed solution has been turned down by WG2 and UTC.

--Ken


