Re: Normalisation and Greek characters

From: John Cowan (cowan@mercury.ccil.org)
Date: Sat Mar 15 2003 - 14:05:43 EST

    David J. Perry scripsit:

    > U+03AC and U+1F71 both have canonical decompositions to U+03B1 followed
    > by U+0301. (There are other similar pairs in the Greek blocks.) If an
    > application applies normalisation form C both decompose to the same
    > string; will the resulting recomposed character be 03AC or 1F71? I
    > suspect the former, but I'd like to know if this is correct and if so,
    > how this is determined.

    U+1F71 is what is called a singleton: its canonical decomposition is a
    single character (U+03AC), which means that it is not used when
    recomposing. Normalisation form C therefore yields U+03AC in both cases.
    Such characters are essentially clones that arrived in Unicode either
    for roundtrippability or (in this case) because of a misunderstanding,
    namely the belief that TONOS and OXIA were distinct accents.
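
The behaviour described above can be checked with a short sketch, assuming Python's standard unicodedata module. The names ALPHA_TONOS, ALPHA_OXIA and describe are illustrative only and do not come from the Unicode Standard.

```python
import unicodedata

# Illustrative names for the two characters under discussion.
ALPHA_TONOS = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
ALPHA_OXIA = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (the singleton)


def describe(ch):
    """Return the one-step decomposition mapping plus the NFD and NFC forms."""
    def code_points(s):
        return [f"U+{ord(c):04X}" for c in s]

    return {
        # Raw decomposition field from the character database (one step only).
        "decomposition": unicodedata.decomposition(ch),
        # Full canonical decomposition.
        "nfd": code_points(unicodedata.normalize("NFD", ch)),
        # Decomposition followed by canonical recomposition.
        "nfc": code_points(unicodedata.normalize("NFC", ch)),
    }


for ch in (ALPHA_TONOS, ALPHA_OXIA):
    print(f"U+{ord(ch):04X}", describe(ch))
```

Both characters should report the same NFD and NFC forms, with only the one-step decomposition field differing: U+1F71 maps to the single character U+03AC, which is what makes it a singleton.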

    -- 
    John Cowan           http://www.ccil.org/~cowan              cowan@ccil.org
    To say that Bilbo's breath was taken away is no description at all.  There
    are no words left to express his staggerment, since Men changed the language
    that they learned of elves in the days when all the world was wonderful.
            --_The Hobbit_
    

