RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

From: Shawn Steele via Unicode <unicode_at_unicode.org>
Date: Tue, 16 May 2017 21:15:53 +0000

> Faster ok, provided this does not break other uses, notably for random access within strings…

Either way, this is a “recommendation”. I don’t see how a recommendation can guarantee not “breaking other uses.” If it’s internal, you can do what you will, so if you need the seeming 1:1 parity, you can do that internally. But if you’re depending on other APIs, libraries, data sources, or whatever, it would seem you couldn’t count on that. (And you probably shouldn’t, even if it were a requirement rather than a recommendation.)

I’m wary of attempting random access on a stream while that same stream is being manipulated (apparently decoded) at the same time.

The U+FFFD emitted by this decoding could also require a different number of bytes to re-encode, which might disrupt the presumed parity, depending on how the data access is being handled.
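
To make the byte-count point concrete, here is a minimal sketch in Python. It assumes Python's built-in UTF-8 codec with errors="replace" as the decoder; the helper name roundtrip and the sample bytes are just for illustration, and other decoders may emit a different number of U+FFFDs for longer ill-formed sequences.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode as UTF-8, substituting U+FFFD for ill-formed input, then re-encode."""
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


# 0xFF can never appear in well-formed UTF-8, so it is replaced by one U+FFFD.
original = b"a\xffb"
reencoded = roundtrip(original)

# U+FFFD itself is three bytes in UTF-8 (EF BF BD), so the lengths differ.
print(len(original), len(reencoded))  # prints "3 5"
```

Any byte offset computed against the original data after the bad byte is off by two in the re-encoded data, which is the kind of parity break I mean.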

-Shawn
Received on Tue May 16 2017 - 16:16:23 CDT
