Unicode & Taiwan/Tai-oan Hak-fa & Ho-lo-oe

From: Kai-hsu Tai (kaihsu@ugcs.caltech.edu)
Date: Sat Sep 14 1996 - 22:19:51 EDT


Tai5-oan5 Piau1-chun2 Po3-ko3:
Tai5-oan5 Hak6-fa4 kap4 Ho7-lo2-oe7 sou2
iong7 e5 ji7-goan5 kap4 in1 e5 Unicode ho7-be2
**********************************************

Tai-oan Piau-chun Report:
Characters used in Taiwanese Hak-fa
and Ho-lo-oe and their Unicode encodings
****************************************

TE3, Khai2-su7
kaihsu@ugcs.caltech.edu
1996-09-14

Soat4-beng5
===========
Thau5-cheng5 sia2 chit8-e5 "k" e5 si7 tai7-piau2 chit4-e5 ji7-goan5
ai3 the5-chhut4 kng3--jip8-khi3 Unicode lai7-te2 e5 iau1-kiu5; sia2
chit8-e5 "ch" e5 si7 chit4-ma2 to7 e7-sai2 iong7 Unicode e5 cho1-hap8 e5
hong1-hoat4 (combining) sia2--chhut4-lai5 e5 ji7-goan5 (ma7 ai3
the5-chhut4, kng3--jip8-khi3 Unicode).

Na7-si7 u7 sim2-mih8 m7-tioh8 iah8-si7 lau3-kau1 e5 sou2-chai7, chhiaN2
ka7 goa2 kong2.

Introduction
============
Characters marked with "k" are those that should be proposed for
encoding in Unicode; those marked with "ch" can already be written in
Unicode as combining character sequences (but should still be
proposed).

Please tell me if I missed anything or got anything wrong.

Si7 an2-choaN2 beh4 the5-chhut4 cho1-hap8-ho2 e5 ji7-goan5--neh4?
=================================================================
In1-ui7 Ho7-lo2-oe7 kap4 Hak6-fa4 teh4 iong7 siaN1-tiau7 ki3-ho7, m7-si7
kap4 Au1-chiu1 gu2-gian5 kang5-khoan2, sam1-put4-go7-si5 chiah4 iong7,
hoan2-tng3-si7 chha1-put4-to1 10-e5 ji7-goan5 to7 u7 chit8-e5
siaN1-tiau7 ki3-ho7. ChhiuN7 chit4-toaN7 bun5-ji7, tu5-liau2 te7
1 siaN1 kap4 te7 4 siaN1 i2-goa7, long2 ai3 iong7 siaN1-tiau7 ki3-ho7.
Na7-si7 bo5 hou7 cho1-hap8-ho2 e5 ji7-goan5 ho7-be2, kng3 chu1-liau7 e5
sou2-chai7 to7 ai3 cheng1-ka7 khong1-kan1.

Why precomposed characters are proposed for encoding
====================================================
Ho-lo-oe and Hak-fa differ from European languages, in which
diacritics are used only occasionally: they use diacritics to indicate
the tone of every syllable, so roughly one character in ten carries a
tone mark. For example, except for tones 1 and 4, every tone numeral in
the preceding Ho-lo-oe passage stands for a diacritic. Unless
precomposed characters are encoded, the space needed for data storage
will increase considerably.
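
As a rough illustration of the storage argument, the sketch below
compares the size of the same syllables written with precomposed
characters and with combining sequences. The sample text, the helper
name size_report, and the choice of UTF-16 and UTF-8 as encodings are
assumptions made for this illustration, not part of the report.

```python
import unicodedata

# Hypothetical sample: three syllables whose tone marks (circumflex, acute)
# already have precomposed code points.
sample = "T\u00e2i-o\u00e2n g\u00fa"

composed = unicodedata.normalize("NFC", sample)    # precomposed letters
decomposed = unicodedata.normalize("NFD", sample)  # base letter + combining mark

def size_report(text):
    """Return (code points, UTF-16 bytes, UTF-8 bytes) for text."""
    return (len(text), len(text.encode("utf-16-be")), len(text.encode("utf-8")))

print("precomposed:", size_report(composed))
print("combining:  ", size_report(decomposed))
```

Each tone mark written as a combining character adds one code point to
the syllable, which is the overhead a precomposed character avoids.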

------------------------------------------------------------------
Ho7-be2 Mia5
Code Name
------ -----------------------------------------------------------
Combining Diacritical Marks
===========================
U+0301 COMBINING ACUTE ACCENT
U+0300 COMBINING GRAVE ACCENT
U+0302 COMBINING CIRCUMFLEX ACCENT
U+0304 COMBINING MACRON
U+030D COMBINING VERTICAL LINE ABOVE
U+0324 COMBINING DIAERESIS BELOW
 k COMBINING RIGHT DOT ABOVE
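
Whether a letter plus one of the marks above already has a precomposed
code point can be checked mechanically. The sketch below assumes
Python's standard unicodedata module, whose answers reflect the Unicode
version bundled with the interpreter rather than the standard as of
1996; the helper name compose is hypothetical.

```python
import unicodedata

ACUTE = "\u0301"                # COMBINING ACUTE ACCENT
VERTICAL_LINE_ABOVE = "\u030d"  # COMBINING VERTICAL LINE ABOVE

def compose(base, mark):
    """Return the NFC form of base + mark; one code point means precomposed."""
    return unicodedata.normalize("NFC", base + mark)

for base, mark in [("a", ACUTE), ("a", VERTICAL_LINE_ABOVE), ("m", VERTICAL_LINE_ABOVE)]:
    result = compose(base, mark)
    codes = " ".join(f"U+{ord(ch):04X}" for ch in result)
    kind = "precomposed" if len(result) == 1 else "combining sequence"
    print(f"{base} + U+{ord(mark):04X} -> {codes} ({kind})")
```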

Precomposed Characters
======================
U+0000 -> U+007F Basic Latin [some of these are also listed below]

U+0061 LATIN SMALL LETTER A
U+00E1 LATIN SMALL LETTER A WITH ACUTE
U+00E0 LATIN SMALL LETTER A WITH GRAVE
U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
U+0101 LATIN SMALL LETTER A WITH MACRON
 ch LATIN SMALL LETTER A WITH VERTICAL BAR

U+0065 LATIN SMALL LETTER E
U+00E9 LATIN SMALL LETTER E WITH ACUTE
U+00E8 LATIN SMALL LETTER E WITH GRAVE
U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX
U+0113 LATIN SMALL LETTER E WITH MACRON
 ch LATIN SMALL LETTER E WITH VERTICAL BAR

U+0069 LATIN SMALL LETTER I
U+00ED LATIN SMALL LETTER I WITH ACUTE
U+00EC LATIN SMALL LETTER I WITH GRAVE
U+00EE LATIN SMALL LETTER I WITH CIRCUMFLEX
U+012B LATIN SMALL LETTER I WITH MACRON
 ch LATIN SMALL LETTER I WITH VERTICAL BAR

U+006F LATIN SMALL LETTER O
U+00F3 LATIN SMALL LETTER O WITH ACUTE
U+00F2 LATIN SMALL LETTER O WITH GRAVE
U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX
U+014D LATIN SMALL LETTER O WITH MACRON
 ch LATIN SMALL LETTER O WITH VERTICAL BAR

 k LATIN SMALL LETTER O WITH RIGHT DOT ABOVE
 k LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH ACUTE
 k LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH GRAVE
 k LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH CIRCUMFLEX
 k LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH MACRON
 k LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH VERTICAL BAR

U+0075 LATIN SMALL LETTER U
U+00FA LATIN SMALL LETTER U WITH ACUTE
U+00F9 LATIN SMALL LETTER U WITH GRAVE
U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX
U+016B LATIN SMALL LETTER U WITH MACRON
 ch LATIN SMALL LETTER U WITH VERTICAL BAR

U+006D LATIN SMALL LETTER M
U+1E3F LATIN SMALL LETTER M WITH ACUTE
 ch LATIN SMALL LETTER M WITH GRAVE
 ch LATIN SMALL LETTER M WITH CIRCUMFLEX
 ch LATIN SMALL LETTER M WITH MACRON
 ch LATIN SMALL LETTER M WITH VERTICAL BAR

U+006E LATIN SMALL LETTER N
U+0144 LATIN SMALL LETTER N WITH ACUTE
 ch LATIN SMALL LETTER N WITH GRAVE
 ch LATIN SMALL LETTER N WITH CIRCUMFLEX
 ch LATIN SMALL LETTER N WITH MACRON
 ch LATIN SMALL LETTER N WITH VERTICAL BAR

U+1E73 LATIN SMALL LETTER U WITH DIAERESIS BELOW
 ch LATIN SMALL LETTER U WITH DIAERESIS BELOW WITH ACUTE
 ch LATIN SMALL LETTER U WITH DIAERESIS BELOW WITH GRAVE
 ch LATIN SMALL LETTER U WITH DIAERESIS BELOW WITH CIRCUMFLEX
 ch LATIN SMALL LETTER U WITH DIAERESIS BELOW WITH VERTICAL BAR

U+0041 LATIN CAPITAL LETTER A
U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
U+0100 LATIN CAPITAL LETTER A WITH MACRON
 ch LATIN CAPITAL LETTER A WITH VERTICAL BAR

U+0045 LATIN CAPITAL LETTER E
U+00C9 LATIN CAPITAL LETTER E WITH ACUTE
U+00C8 LATIN CAPITAL LETTER E WITH GRAVE
U+00CA LATIN CAPITAL LETTER E WITH CIRCUMFLEX
U+0112 LATIN CAPITAL LETTER E WITH MACRON
 ch LATIN CAPITAL LETTER E WITH VERTICAL BAR

U+0049 LATIN CAPITAL LETTER I
U+00CD LATIN CAPITAL LETTER I WITH ACUTE
U+00CC LATIN CAPITAL LETTER I WITH GRAVE
U+00CE LATIN CAPITAL LETTER I WITH CIRCUMFLEX
U+012A LATIN CAPITAL LETTER I WITH MACRON
 ch LATIN CAPITAL LETTER I WITH VERTICAL BAR

U+004F LATIN CAPITAL LETTER O
U+00D3 LATIN CAPITAL LETTER O WITH ACUTE
U+00D2 LATIN CAPITAL LETTER O WITH GRAVE
U+00D4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
U+014C LATIN CAPITAL LETTER O WITH MACRON
 ch LATIN CAPITAL LETTER O WITH VERTICAL BAR

 k LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE
 k LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH ACUTE
 k LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH GRAVE
 k LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH CIRCUMFLEX
 k LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH MACRON
 k LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH VERTICAL BAR

U+0055 LATIN CAPITAL LETTER U
U+00DA LATIN CAPITAL LETTER U WITH ACUTE
U+00D9 LATIN CAPITAL LETTER U WITH GRAVE
U+00DB LATIN CAPITAL LETTER U WITH CIRCUMFLEX
U+016A LATIN CAPITAL LETTER U WITH MACRON
 ch LATIN CAPITAL LETTER U WITH VERTICAL BAR

U+004D LATIN CAPITAL LETTER M
U+1E3E LATIN CAPITAL LETTER M WITH ACUTE
 ch LATIN CAPITAL LETTER M WITH GRAVE
 ch LATIN CAPITAL LETTER M WITH CIRCUMFLEX
 ch LATIN CAPITAL LETTER M WITH MACRON
 ch LATIN CAPITAL LETTER M WITH VERTICAL BAR

U+004E LATIN CAPITAL LETTER N
U+0143 LATIN CAPITAL LETTER N WITH ACUTE
 ch LATIN CAPITAL LETTER N WITH GRAVE
 ch LATIN CAPITAL LETTER N WITH CIRCUMFLEX
 ch LATIN CAPITAL LETTER N WITH MACRON
 ch LATIN CAPITAL LETTER N WITH VERTICAL BAR

U+1E72 LATIN CAPITAL LETTER U WITH DIAERESIS BELOW
 ch LATIN CAPITAL LETTER U WITH DIAERESIS BELOW WITH ACUTE
 ch LATIN CAPITAL LETTER U WITH DIAERESIS BELOW WITH GRAVE
 ch LATIN CAPITAL LETTER U WITH DIAERESIS BELOW WITH CIRCUMFLEX
 ch LATIN CAPITAL LETTER U WITH DIAERESIS BELOW WITH VERTICAL BAR

U+207F SUPERSCRIPT LATIN SMALL LETTER N
------------------------------------------------------------------

Thong2-ke3
==========
U7 kui2-e5 ch: 34
U7 kui2-e5 k: 13
Ch kap4 k long2-chong2 u7 kui2-e5: 47

Statistics
==========
Total number of ch's: 34
Total number of k's: 13
Total number of proposed characters: 47
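
The totals can be re-derived from the table. The sketch below is only a
cross-check of the arithmetic; the variable names are hypothetical, and
the per-letter counts are read off the lists above.

```python
# Number of "ch" entries per base letter, for one case (small or capital).
ch_per_case = {
    "a": 1, "e": 1, "i": 1, "o": 1, "u": 1,  # only the vertical-bar form is marked "ch"
    "m": 4, "n": 4,                           # grave, circumflex, macron, vertical bar
    "u with diaeresis below": 4,              # acute, grave, circumflex, vertical bar
}

# "k" entries: one new combining mark, plus six forms of o per case.
k_combining_marks = 1
k_per_case = 6

cases = 2  # small and capital letters
total_ch = cases * sum(ch_per_case.values())
total_k = k_combining_marks + cases * k_per_case

print(total_ch, total_k, total_ch + total_k)  # prints: 34 13 47
```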

-- 
hlo: TE3, Khai2-su7 | hak: TAI4, Khai3-si4
http://nanigani.caltech.edu


