From: John Cowan (cowan@mercury.ccil.org)
Date: Sat Mar 15 2003 - 14:05:43 EST
David J. Perry scripsit:
> U+03AC and U+1F71 both have canonical decompositions to U+03B1 followed
> by U+0301. (There are other similar pairs in the Greek blocks.) If an
> application applies normalisation form C both decompose to the same
> string; will the resulting recomposed character be 03AC or 1F71? I
> suspect the former, but I'd like to know if this is correct and if so,
> how this is determined.
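The result is U+03AC. U+1F71 is what is called a singleton: its canonical
decomposition is a single character (U+03AC, which in turn decomposes to
U+03B1 U+0301). Singletons are excluded from recomposition, so
normalisation form C never produces them.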
Such characters are essentially clones that arrived in Unicode either
for roundtrippability or (in this case) because of a misunderstanding,
namely the belief that TONOS and OXIA were distinct accents.
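
The behaviour described above can be checked with a short sketch. It assumes
Python's standard unicodedata module; the helper names codepoints and report
are made up for illustration and are not part of any library.

```python
import unicodedata

TONOS = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
OXIA = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (the singleton)


def codepoints(s):
    """Render a string as space-separated U+XXXX code points (hypothetical helper)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def report(ch):
    """Collect the raw decomposition mapping plus the NFD and NFC forms of ch."""
    return {
        # Direct (one-step) mapping from the Unicode Character Database
        "mapping": unicodedata.decomposition(ch),
        # Full canonical decomposition
        "nfd": codepoints(unicodedata.normalize("NFD", ch)),
        # Decomposition followed by canonical recomposition
        "nfc": codepoints(unicodedata.normalize("NFC", ch)),
    }


for ch in (TONOS, OXIA):
    print(codepoints(ch), report(ch))
```

Both characters should report the same NFD and NFC forms; only the one-step
mapping differs, since U+1F71 maps to the single character U+03AC.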
-- 
John Cowan  http://www.ccil.org/~cowan  cowan@ccil.org
To say that Bilbo's breath was taken away is no description at all.  There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_