Re: full-width Latin missing from confusables data

From: Mark Davis ☕
Date: Mon, 14 Oct 2013 09:40:48 +0200

For the confusables, the presumption is that implementations have already
either normalized the input to NFKC or have rejected input that is not
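
To illustrate the presumption above, here is a minimal sketch using Python's
standard unicodedata module. The helper names fold_compat and is_nfkc are
made up for this example and are not part of any Unicode specification. It
shows that NFKC already folds U+FF4D to U+006D, which is why a separate
confusables entry would be redundant for input handled this way.

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Return the NFKC form of text (hypothetical helper name)."""
    return unicodedata.normalize("NFKC", text)


def is_nfkc(text: str) -> bool:
    """True if text is unchanged by NFKC, i.e. already normalized."""
    return unicodedata.normalize("NFKC", text) == text


fullwidth_m = "\uFF4D"  # FULLWIDTH LATIN SMALL LETTER M

# Option 1: normalize, after which the confusables lookup sees plain "m".
print(fold_compat(fullwidth_m) == "m")

# Option 2: reject anything that is not already in NFKC form.
print(is_nfkc(fullwidth_m))
```

An implementation would pick one of the two options before consulting the
confusables data; which one is appropriate depends on the application.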

More broadly, in gathering data the main emphasis is on characters that fit
the profile in,
including scripts like Cyrillic ( So while
we do add characters outside of that, there has been no concerted effort to
do so.

In particular, in your identifiers you should not allow scripts like
Buginese (
Lisu (
without recognizing that the confusable data will be sketchy for those.
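
As a rough sketch of that kind of restriction, the check below flags
identifiers containing Buginese or Lisu characters. Python's standard
library does not expose the Unicode Script property, so this relies on
character-name prefixes as a stand-in; that heuristic, the prefix list, and
the function name uses_sparse_script are all assumptions made for
illustration, not a recommended implementation.

```python
import unicodedata

# Hypothetical deny-list: name prefixes for the two scripts mentioned above.
# A real implementation would more likely consult the Script property.
SPARSE_SCRIPT_PREFIXES = ("BUGINESE ", "LISU ")


def uses_sparse_script(identifier: str) -> bool:
    """True if any character's Unicode name starts with a listed prefix."""
    for ch in identifier:
        # Unnamed code points get "" and therefore never match.
        name = unicodedata.name(ch, "")
        if name.startswith(SPARSE_SCRIPT_PREFIXES):
            return True
    return False


print(uses_sparse_script("a\u1A00"))  # contains BUGINESE LETTER KA
print(uses_sparse_script("hello"))    # plain Latin
```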

It would probably be worth clarifying this in the text of There is an
upcoming UTC meeting at the start of Nov., so if you want to suggest that
or any other improvements, you should use the

Mark
*— Il meglio è l’inimico del bene —*

On Sun, Oct 13, 2013 at 7:36 PM, Chris Weber wrote:

> While looking closer at the current confusables data, I've noticed that
> several of the fullwidth code points seem to be missing from the
> confusables data. For example, U+FF4D FULLWIDTH LATIN SMALL LETTER M
> does not exist as a confusable for U+006D LATIN SMALL LETTER M, as well
> as several others I've noticed.
> Was this intentional?
> Also, I'm not clear on the difference between the confusables.txt and
> confusablesSummary.txt - are these meant to provide the same data in
> different formats?
> --
> Best regards,
> Chris Weber - -
> PGP: F18B 2F5D ED81 B30C 58F8 3E49 3D21 FD57 F04B BCF7
Received on Mon Oct 14 2013 - 02:44:00 CDT
