Best practices for replacing UTF-8 overlongs

From: Karl Williamson
Date: Mon, 19 Dec 2016 16:04:06 -0700

It seems counterintuitive to me that, under best practices, the two-byte
sequence C0 80 should be replaced by 2 replacement characters, or that E0
80 80 should be replaced by 3. Each sequence was legal in early
Unicode versions, and it seems that it would be best to treat each as a
single sequence, replaced by a single replacement character.
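
To make the difference concrete, here is a minimal Python sketch, not a
definitive implementation. count_best_practice() relies on CPython's
built-in "replace" error handler, which follows the maximal-subpart
practice and yields 2 replacement characters for C0 80 and 3 for E0 80 80.
decode_whole_sequence() and structural_length() are hypothetical helpers
written only for this illustration; they assume the alternative means
"take the length implied by the lead byte, and if the continuation bytes
are all present, replace the whole sequence once".

```python
REPLACEMENT = "\ufffd"


def count_best_practice(data: bytes) -> int:
    """Count the U+FFFD characters produced by CPython's 'replace' handler."""
    return data.decode("utf-8", errors="replace").count(REPLACEMENT)


def structural_length(lead: int) -> int:
    """Sequence length implied by the lead byte's bit pattern alone.

    Returns 0 for bytes that cannot start a sequence (stray continuation
    bytes 80..BF and F8..FF). Hypothetical helper for this sketch.
    """
    if lead < 0x80:
        return 1
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0


def decode_whole_sequence(data: bytes) -> str:
    """Decode UTF-8, emitting ONE U+FFFD per structurally complete bad sequence.

    Assumption: a sequence is "structurally complete" when the lead byte is
    followed by the right number of continuation bytes (80..BF). Note that
    this also collapses encoded surrogates and values above U+10FFFF, not
    just overlongs.
    """
    out = []
    i = 0
    while i < len(data):
        n = structural_length(data[i])
        chunk = data[i:i + n]
        complete = (
            n >= 1
            and len(chunk) == n
            and all(0x80 <= b <= 0xBF for b in chunk[1:])
        )
        if complete:
            try:
                out.append(chunk.decode("utf-8"))
            except UnicodeDecodeError:
                out.append(REPLACEMENT)  # whole sequence replaced once
            i += n
        else:
            out.append(REPLACEMENT)  # stray or truncated: replace one byte
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for sample in (b"\xc0\x80", b"\xe0\x80\x80"):
        whole = decode_whole_sequence(sample).count(REPLACEMENT)
        print(sample.hex(" "), count_best_practice(sample), whole)
```

With the second function, each of the two sequences above becomes a
single replacement character, and well-formed input decodes unchanged.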

What are the advantages of replacing them with multiple characters?
Received on Mon Dec 19 2016 - 17:04:35 CST
