Best practices for replacing UTF-8 overlongs

From: Karl Williamson <public_at_khwilliamson.com>
Date: Mon, 19 Dec 2016 16:04:06 -0700

It seems counterintuitive to me that the two-byte sequence C0 80 should
be replaced by 2 replacement characters under best practices, or that E0
80 80 should also be replaced by 2. Each sequence was legal in early
Unicode versions, and it seems best to treat each as a single sequence,
replaced by a single replacement character.

What are the advantages of replacing them with multiple characters?
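
To make the two policies concrete, here is a minimal Python sketch that
decodes the same bytes both ways. The function names and helpers are
hypothetical, invented for this illustration. The first decoder is one
reading of the "maximal subpart" practice, using the well-formed byte
ranges from the Unicode Standard's Table 3-7; the second emits a single
U+FFFD for a lead byte plus whatever continuation bytes follow it, which
is the behavior the message argues for. Note that in this sketch E0 80 80
comes out as three replacement characters under the first policy, because
80 cannot follow E0 in well-formed UTF-8; that count reflects the
sketch's reading of the rule.

```python
REPLACEMENT = "\ufffd"

def _expected_length(lead):
    """Length of a well-formed sequence starting with `lead`, or 0 if none can."""
    if 0xC2 <= lead <= 0xDF: return 2
    if 0xE0 <= lead <= 0xEF: return 3
    if 0xF0 <= lead <= 0xF4: return 4
    return 0  # C0, C1, F5..FF and stray continuation bytes 80..BF

def _second_byte_range(lead):
    """Allowed range for the byte right after `lead` (well-formed UTF-8 only)."""
    if lead == 0xE0: return 0xA0, 0xBF  # excludes 3-byte overlongs
    if lead == 0xED: return 0x80, 0x9F  # excludes surrogates
    if lead == 0xF0: return 0x90, 0xBF  # excludes 4-byte overlongs
    if lead == 0xF4: return 0x80, 0x8F  # excludes values above U+10FFFF
    return 0x80, 0xBF

def decode_maximal_subparts(data):
    """One U+FFFD per maximal subpart of an ill-formed sequence."""
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            out.append(chr(b)); i += 1; continue
        need = _expected_length(b)
        if need == 0:
            out.append(REPLACEMENT); i += 1; continue
        lo, hi = _second_byte_range(b)
        j = i + 1
        while j < i + need and j < n and lo <= data[j] <= hi:
            j += 1
            lo, hi = 0x80, 0xBF  # later continuation bytes use the full range
        out.append(data[i:j].decode("utf-8") if j == i + need else REPLACEMENT)
        i = j
    return "".join(out)

def decode_one_per_sequence(data):
    """One U+FFFD per lead byte plus the continuation bytes that follow it."""
    out, i, n = [], 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            out.append(chr(b)); i += 1; continue
        if 0xC0 <= b <= 0xDF:   need, cp, minimum = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF: need, cp, minimum = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7: need, cp, minimum = 4, b & 0x07, 0x10000
        else:
            out.append(REPLACEMENT); i += 1; continue
        j = i + 1
        while j < i + need and j < n and 0x80 <= data[j] <= 0xBF:
            cp = (cp << 6) | (data[j] & 0x3F)
            j += 1
        # Reject truncated, overlong, out-of-range and surrogate results.
        valid = (j == i + need and minimum <= cp <= 0x10FFFF
                 and not 0xD800 <= cp <= 0xDFFF)
        out.append(chr(cp) if valid else REPLACEMENT)
        i = j
    return "".join(out)

for raw in (b"\xc0\x80", b"\xe0\x80\x80"):
    print(raw.hex(), len(decode_maximal_subparts(raw)),
          len(decode_one_per_sequence(raw)))
```

The practical difference is in how much each decoder has to know: the
first never consumes a byte that could not belong to a well-formed
sequence, while the second has to recognize the older, longer lead-byte
patterns in order to group the bytes.
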
Received on Mon Dec 19 2016 - 17:04:35 CST
