Re: Utility to report and repair broken surrogate pairs in UTF-16 text

From: Doug Ewell (doug@ewellic.org)
Date: Fri Nov 05 2010 - 17:22:27 CST

In reply to: Markus Scherer: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"

Markus Scherer wrote:

>> Right, but as I said, those downstream tasks shouldn't be consumers
>> of UTF-16 code units anyway. They should be consumers of Unicode
>> code points, which by definition excludes loose surrogates.
>
> Code points include surrogates. Maybe you mean "UTF-32 code units" or
> "Unicode scalar values".

You're right, I meant Unicode scalar values.
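
To make the distinction concrete, here is a minimal sketch in Python. The function names are made up for illustration and do not come from any library; the ranges are the ones the Unicode Standard defines (code points run from U+0000 to U+10FFFF, surrogates occupy U+D800 to U+DFFF, and scalar values are code points that are not surrogates).

```python
def is_code_point(n: int) -> bool:
    """True if n lies in the Unicode codespace, U+0000..U+10FFFF."""
    return 0 <= n <= 0x10FFFF


def is_surrogate(n: int) -> bool:
    """True if n is a surrogate code point, U+D800..U+DFFF."""
    return 0xD800 <= n <= 0xDFFF


def is_scalar_value(n: int) -> bool:
    """True if n is a code point that is not a surrogate."""
    return is_code_point(n) and not is_surrogate(n)
```

Under these definitions U+D800 is a code point but not a scalar value, which is exactly the correction Markus made.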

I don't see the difference between allowing loose UTF-16 code units in
what purports to be a character stream and allowing loose UTF-8 code
units.
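
As a rough illustration of the parallel, the sketch below feeds one ill-formed sequence in each encoding form to Python's built-in strict decoders. The helper name and the sample byte strings are my own choices for the example, and other decoders may be configured to behave differently (for instance, by substituting U+FFFD instead of raising an error).

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data is well-formed in the given encoding.

    bytes.decode() uses the "strict" error handler by default, so any
    ill-formed sequence raises UnicodeDecodeError.
    """
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# UTF-16LE: 'A', an unpaired high surrogate (0xD83D), then 'B'.
loose_utf16 = b"\x41\x00\x3d\xd8\x42\x00"

# UTF-8: 'A', a continuation byte (0x80) with no lead byte, then 'B'.
loose_utf8 = b"A\x80B"

# Well-formed counterparts carrying U+1F600 between 'A' and 'B'.
good_utf16 = "A\U0001F600B".encode("utf-16-le")
good_utf8 = "A\U0001F600B".encode("utf-8")
```

With a strict decoder, both loose sequences are rejected for the same reason: neither one maps to a sequence of Unicode scalar values.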

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s


This archive was generated by hypermail 2.1.5 : Fri Nov 05 2010 - 17:26:29 CST