Re: Nicest UTF

From: Kenneth Whistler
Date: Wed Dec 08 2004 - 19:23:11 CST

    Marcin asked:

    > The general trouble is that numeric character references can only
    > encode individual code points

    By design.

    > rather than graphemes (is this a correct
    > term for a non-combining code point with a sequence of combining code
    > points?).

    No. The correct term is "combining character sequence".

    TUS 4.0, p. 70, D17.

    The correct NCR representation of a combining character sequence
    is a sequence of NCRs, not too surprisingly.
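
    To illustrate the point, here is a minimal Python sketch that encodes
    each code point of a string as its own hexadecimal NCR. The helper name
    to_ncrs and the sample string are assumptions made for illustration;
    they do not come from the original message.

```python
def to_ncrs(text: str) -> str:
    """Encode every code point of `text` as a hexadecimal NCR.

    Hypothetical helper for illustration: one NCR per code point,
    so a combining character sequence becomes a sequence of NCRs.
    """
    return "".join(f"&#x{ord(ch):X};" for ch in text)


# A combining character sequence: the base character "e" followed by
# U+0301 COMBINING ACUTE ACCENT.
sequence = "e\u0301"

print(to_ncrs(sequence))  # &#x65;&#x301;
```

    The two code points yield two NCRs; there is no single NCR that
    stands for the whole sequence.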


    > So if XML is supposed to be treated as a sequence of
    > graphemes, weird effects arise in the above boundary cases...
