Re: NFC/NFKC Normalization Edge Case

From: Jeff Senn (
Date: Wed Sep 23 2009 - 08:24:00 CDT


    On Sep 22, 2009, at 5:44 PM, Kenneth Whistler wrote:
    >> All of these characters have combining class 0. Can they be
    >> canonically combined? Even though the 2nd characters are NOT
    >> "combining"?
    > There's the first mistake. Both of the 2nd characters in
    > these sequences *ARE* combining:

    Thanks guys. Clear now.
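The resolution can be checked with Python's standard unicodedata module. This is a minimal sketch: the two pairs below are illustrative choices (a Hangul L+V jamo pair and an Oriya vowel-sign pair), not necessarily the sequences from the original question. In both pairs each character has canonical combining class 0, so each is a "starter", yet NFC still composes the pair into one code point.

```python
import unicodedata

# Both characters in each pair are starters (ccc = 0), yet NFC
# composes the pair into a single precomposed code point.
examples = {
    "Hangul L + V jamo": ("\u1100\u1161", "\uac00"),
    "Oriya E + AA vowel signs": ("\u0b47\u0b3e", "\u0b4b"),
}

for name, (seq, composed) in examples.items():
    classes = [unicodedata.combining(ch) for ch in seq]
    nfc = unicodedata.normalize("NFC", seq)
    print(f"{name}: ccc={classes}, composes={nfc == composed}")
# Hangul L + V jamo: ccc=[0, 0], composes=True
# Oriya E + AA vowel signs: ccc=[0, 0], composes=True
```

In other words, "combining class 0" and "never combines with what precedes it" are different properties.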

    I was led astray by:

    "D1. A character S is a starter if it has a combining class of zero in
    the Unicode Character Database. Any other character is a non-starter."


    "In some implementations, people may be working with streaming
    interfaces that read and write small amounts at a time. In those
    implementations, the text back to the last starter needs to be
    buffered. Whenever a second starter would be added to that buffer, the
    buffer can be flushed."

    Both passages are technically correct (once you know the correct
    answer), but they led me to suspect I could flush a streamed buffer
    slightly more aggressively...

    This archive was generated by hypermail 2.1.5 : Wed Sep 23 2009 - 08:28:27 CDT