RE: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8 from Doug Ewell via Unicode on 2017-05-17 (Unicode Mail List Archive)

From: Doug Ewell via Unicode <unicode_at_unicode.org>
Date: Wed, 17 May 2017 15:31:56 -0700

Richard Wordingham wrote:

> So it was still a legal way for a non-UTF-8-compliant process!

Anything is possible if you are non-compliant. You can encode U+263A
with 9,786 FF bytes followed by a terminating FE byte and call that
"UTF-8," if you are willing to be non-compliant enough.

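For anyone who wants to check the arithmetic in that example, here is a
minimal Python sketch. The names smiley, bogus and is_well_formed_utf8
are made up for illustration, and the U+FFFD count at the end is simply
what CPython's decoder produces with errors="replace", not a statement
about what any decoder must do.

```python
# U+263A WHITE SMILING FACE, the code point from the example above.
smiley = "\u263A"

# Well-formed UTF-8 encodes it in three bytes: E2 98 BA.
utf8 = smiley.encode("utf-8")

# The made-up "unary" form: one FF byte per unit of the scalar value
# (0x263A == 9786), followed by a terminating FE byte.
bogus = b"\xFF" * ord(smiley) + b"\xFE"


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# With errors="replace", CPython emits one U+FFFD per offending byte here.
replaced = bogus.decode("utf-8", errors="replace")

print(ord(smiley), utf8.hex(), len(bogus))
print(is_well_formed_utf8(utf8), is_well_formed_utf8(bogus))
print(len(replaced), set(replaced) == {"\uFFFD"})
```

The bytes FE and FF never occur in well-formed UTF-8, so a strict
decoder rejects the whole sequence no matter how it is labeled.
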
> Note for example that a compliant implementation of full upper-casing
> shall convert the canonically equivalent strings <U+1FB3 GREEK SMALL
> LETTER ALPHA WITH YPOGEGRAMMENI, U+0313 COMBINING COMMA ABOVE> and
> <U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI, U+0345 COMBINING GREEK
> YPOGEGRAMMENI> to the canonically inequivalent strings <U+0391 GREEK
> CAPITAL LETTER ALPHA, U+0399 GREEK CAPITAL LETTER IOTA, U+0313> and
> <U+1F08 GREEK CAPITAL LETTER ALPHA WITH PSILI, 0399 GREEK CAPITAL
> LETTER IOTA>. A compliant Unicode process may not assume that this is
> the right thing to do. (Or are some compliant Unicode processes
> required to incorrectly believe that they are doing something they
> mustn't do?)

I'm afraid I don't get the analogy.
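
For reference, the case-mapping claim quoted above can be tried out with
a short Python sketch. It assumes Python 3.3 or later, where str.upper()
applies the full case mappings, and canonically_equivalent is a
hypothetical helper that just compares NFD forms.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings by their NFD (canonical decomposition) forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The two lower-case strings from the quoted message.
lower_a = "\u1FB3\u0313"  # ALPHA WITH YPOGEGRAMMENI + COMBINING COMMA ABOVE
lower_b = "\u1F00\u0345"  # ALPHA WITH PSILI + COMBINING GREEK YPOGEGRAMMENI

# Full upper-casing as implemented by str.upper().
upper_a = lower_a.upper()
upper_b = lower_b.upper()

print([f"U+{ord(c):04X}" for c in upper_a])
print([f"U+{ord(c):04X}" for c in upper_b])
print(canonically_equivalent(lower_a, lower_b))
print(canonically_equivalent(upper_a, upper_b))
```

The inputs normalize to the same NFD sequence, while in the outputs the
combining comma ends up on the iota in one string and on the alpha in
the other.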

--
Doug Ewell | Thornton, CO, US | ewellic.org

Received on Wed May 17 2017 - 17:32:48 CDT
