Re: Mixed up priorities

From: Michael Everson (everson@indigo.ie)
Date: Sun Oct 24 1999 - 06:34:24 EDT


At 12:25 -0700 1999-10-23, Mark E. Davis wrote:

>Phrased in those terms, 'ch' in Slovak is a grapheme that is represented with
>2 code points; similarly, 'å' is a grapheme in Danish that is represented
>with either 1 or 2 code points, while 'ksha' is a grapheme in Hindi that is
>represented with 3 code points.

'Ksha' is a good example. In Hindi, it is ordered as a regular ligature of
the letter 'ka'. In Nepali, it is ordered as a separate letter at the end
of the alphabet.
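
The code point counts Mark quotes can be checked with a short sketch. It assumes Python 3 and its standard unicodedata module; the variable names are only for illustration, and the 'ksha' sequence is taken to be KA + VIRAMA + SSA.

```python
import unicodedata

def code_points(s):
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

# Slovak 'ch': two ordinary Latin letters.
slovak_ch = "ch"

# Danish 'å': one precomposed code point, or two after canonical decomposition.
a_ring_composed = "\u00E5"
a_ring_decomposed = unicodedata.normalize("NFD", a_ring_composed)

# Devanagari 'ksha', assumed here to be KA + VIRAMA + SSA.
ksha = "\u0915\u094D\u0937"

for label, text in [
    ("ch", slovak_ch),
    ("a-ring (NFC)", a_ring_composed),
    ("a-ring (NFD)", a_ring_decomposed),
    ("ksha", ksha),
]:
    print(label, len(text), code_points(text))
```

Whether 'ksha' then sorts as a ligature of 'ka' or as a separate letter is a matter for the ordering rules of each language, not for the encoded sequence, which is the same in both cases.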

I'm not sure about Mark's terminology 'grapheme' etc., but I was looking at
my Dictionary of !Xóõ, a Khoisan language. On page 13 it gives the alphabet
used in the dictionary. It is remarkably complex given the phonetic reality
of the language and the relative poverty of the Latin script. Here is the
alphabet, given in alphabetical order as found in the dictionary.

* [which I use for the bilabial click here]
*g
*x
g*x
*kx'
g*kx'
*q
*G
*qh
g*qh
*q'
*h

*n
'*n
*'
|
|g
|x
g|x
|kx'
g|kx'
|q
|G
|qh
g|qh
G|qh
|q'
|h

|n
'|n
|'
|
!g
!x
g!x
!kx'
g!kx'
!q
!G
!qh
g!qh
G!qh
!q'
!h

!n
'!n
!'
!
||g
||x
g||x
||kx'
g||kx'
||q
||G
||qh
g||qh
G||qh
||q'
||h
||ñ
||n
'||n
||'
‚g
‚x
g‚x
‚kx'
g‚kx'
‚q
‚G
‚qh
g‚qh
G‚qh
‚q'
‚h
‚ñ
‚n
'‚n
‚'

p
b
ph
p'kx'

t
d
tx
dtx
th
dth
t'
t'kx'
dt'kx'
ts
dz
tsh
dtsh
tshx
dtshx
ts'
ts'kx'
dts'kx'
k
g
k
kh
gkh
kx'
gkx'
k'
q
G
qh
Gqh
q'
f
s
x
h
l
m
'm
n
'n
a
e
i
o
u
'a
'e
'i
'o
'u

If <ch> were considered to be a single character in the UCS, it would imply
that all of these things here should be considered for addition (in
lower-case, title-case, and upper-case) on the same grounds.
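
Ordering by such multi-letter units does not require encoding them as single characters. The following is only a rough sketch in Python: it takes a small excerpt of the alphabet above, splits each word into letters by greedy longest match (an assumption; the dictionary's actual rules may well be more subtle), and sorts by position in the alphabet. The words are made-up strings, not real dictionary entries.

```python
# Excerpt of the alphabet above, in dictionary order (not the full list).
ALPHABET = ["t", "d", "tx", "dtx", "th", "dth", "t'", "ts", "dz",
            "k", "g", "kh", "a", "e", "i", "o", "u"]

RANK = {letter: i for i, letter in enumerate(ALPHABET)}
# Try longer letters first so that 'dtx' wins over 'd' followed by 'tx'.
BY_LENGTH = sorted(ALPHABET, key=len, reverse=True)

def letters(word):
    """Split word into alphabet letters using greedy longest match."""
    out = []
    i = 0
    while i < len(word):
        for letter in BY_LENGTH:
            if word.startswith(letter, i):
                out.append(letter)
                i += len(letter)
                break
        else:
            raise ValueError(f"no letter matches at position {i} in {word!r}")
    return out

def sort_key(word):
    """Sort key: the alphabet position of each letter in the word."""
    return [RANK[letter] for letter in letters(word)]

# Made-up strings for illustration only.
words = ["tha", "da", "txa", "ta", "dtxa"]
print(sorted(words))                # plain code point order
print(sorted(words, key=sort_key))  # order by the alphabet excerpt
```

Each "letter" here remains a sequence of ordinary code points; only the sort key treats it as a unit.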

--
Michael Everson * Everson Gunn Teoranta * http://www.indigo.ie/egt
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Guthán: +353 1 478 2597 ** Facsa: +353 1 478 2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire


