Re: Nicest UTF

Date: Wed Dec 08 2004

    Marcin asked:

    > The general trouble is that numeric character references can only
    > encode individual code points

    By design.

    > rather than graphemes (is this a correct
    > term for a non-combining code point with a sequence of combining code
    > points?).

    No. The correct term is "combining character sequence".

    See The Unicode Standard, Version 4.0 (TUS 4.0), p. 70, definition D17.

    The correct NCR representation of a combining character sequence
    is a sequence of NCRs, one per code point. Not too surprisingly.
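
To make that concrete, here is a minimal Python sketch. It is not taken from any specification; the helper name `to_ncrs` is made up for illustration, and the choice of hexadecimal NCRs is an assumption (decimal ones would do equally well). It encodes each code point of a combining character sequence as its own NCR.

```python
import unicodedata


def to_ncrs(text: str) -> str:
    """Encode every code point of `text` as a hexadecimal NCR.

    Hypothetical helper for illustration: one NCR is emitted per code
    point, so a combining character sequence becomes a sequence of NCRs.
    """
    return "".join(f"&#x{ord(ch):X};" for ch in text)


# A combining character sequence: the base letter "e" followed by
# U+0301 COMBINING ACUTE ACCENT.
sequence = "e\u0301"

# Canonical combining class per code point: 0 for the base character,
# non-zero for the combining mark.
classes = [unicodedata.combining(ch) for ch in sequence]
print(classes)            # [0, 230]

print(to_ncrs(sequence))  # &#x65;&#x301;
```

The output contains two NCRs for what a reader sees as one accented letter, which is exactly the "by design" behaviour: an NCR names a code point, not a larger unit.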


    > So if XML is supposed to be treated as a sequence of
    > graphemes, weird effects arise in the above boundary cases...

