Re: Utility to report and repair broken surrogate pairs in UTF-16 text

From: Doug Ewell
Date: Fri Nov 05 2010 - 17:22:27 CST

    Markus Scherer wrote:

    >> Right, but as I said, those downstream tasks shouldn't be consumers
    >> of UTF-16 code units anyway. They should be consumers of Unicode
    >> code points, which by definition excludes loose surrogates.
    > Code points include surrogates. Maybe you mean "UTF-32 code units" or
    > "Unicode scalar values".

    You're right, I meant Unicode scalar values.
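
To make the terminology concrete, here is a minimal Python sketch of the distinction under discussion. The function names (is_code_point, is_surrogate, is_scalar_value, loose_surrogates) are hypothetical and chosen for illustration only; they are not from any utility mentioned in this thread. The ranges are the ones the Unicode Standard defines: code points run from U+0000 to U+10FFFF, surrogates occupy U+D800 to U+DFFF, and scalar values are the code points that are not surrogates.

```python
def is_code_point(n: int) -> bool:
    # Every integer in 0..0x10FFFF is a Unicode code point, surrogates included.
    return 0 <= n <= 0x10FFFF


def is_surrogate(n: int) -> bool:
    # High surrogates are 0xD800..0xDBFF, low surrogates are 0xDC00..0xDFFF.
    return 0xD800 <= n <= 0xDFFF


def is_scalar_value(n: int) -> bool:
    # A Unicode scalar value is any code point that is not a surrogate.
    return is_code_point(n) and not is_surrogate(n)


def loose_surrogates(units):
    """Return the indices of unpaired surrogates in a sequence of
    UTF-16 code units (given as integers)."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate is well formed only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate reached here was not preceded by a high one.
            bad.append(i)
        i += 1
    return bad
```

A consumer that accepts only scalar values would reject any input for which loose_surrogates returns a non-empty list, whereas a consumer of "code points" in the strict sense would have no grounds to.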

    I don't see the difference between allowing loose UTF-16 code units in
    what purports to be a character stream and allowing loose UTF-8 code

