Feedback on Confusables or Identifier Restriction Data
Please submit suggestions for confusables or identifier restrictions using the online
reporting form.
- Identifier Restrictions
- The identifier restrictions provide information about which characters should be restricted in general-purpose identifiers for security purposes. For example, historic scripts are candidates for such exclusion. Excluding these characters reduces confusion and avoids the need to consider all of their possible confusable characters.
- Confusables
- Characters are confusable when they can be confused at normal font sizes in common UI fonts. While the Unicode code charts can be consulted for a representative glyph, the glyphs in common UI fonts matter more. Some of this data can be gathered mechanically, but there are many cases where human judgment is needed. On the Mac, the Character Viewer can be used to see glyphs in different fonts for a given character; for Windows and Linux, see Selection from a Screen.
- The current data for UTS #39, Unicode Security Mechanisms focuses on “allowed” characters (in Restrictions). There is little data for other scripts, and the data is sparse for the Han script and for scripts of South and Southeast Asia, such as Devanagari. Suggestions for improvements for these and other scripts are welcome.
Data Formats
On the reporting form, you can specify characters either with hex codes or literal characters. Where you specify a string of multiple characters, please separate them by spaces. For example:
Example | Comment
〃 | U+3003 DITTO MARK
3003 | U+3003 DITTO MARK
a c | The string "ac"
0061 0063 | The same string, with hex codes.
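As an illustration of how such entries could be read, here is a minimal Python sketch. It is not the form's actual parser: the function names `parse_token`, `parse_string`, and `describe` are hypothetical, and the rule that a token of 4 to 6 hex digits is a code point while any other single character is a literal is an assumption made for this example.

```python
import re
import unicodedata

# Assumption for this sketch: a token of 4-6 hex digits is a code point;
# any other token must be exactly one literal character.
HEX_TOKEN = re.compile(r"^[0-9A-Fa-f]{4,6}$")


def parse_token(token: str) -> str:
    """Turn one space-separated token into a single character."""
    if HEX_TOKEN.match(token):
        code_point = int(token, 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"code point out of range: {token}")
        return chr(code_point)
    if len(token) == 1:
        return token
    raise ValueError(f"not a hex code or a single character: {token!r}")


def parse_string(entry: str) -> str:
    """Parse an entry such as 'a c' or '0061 0063' into a string."""
    return "".join(parse_token(token) for token in entry.split())


def describe(char: str) -> str:
    """Format a character in the style 'U+3003 DITTO MARK'."""
    return f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}"


if __name__ == "__main__":
    for entry in ["〃", "3003", "a c", "0061 0063"]:
        text = parse_string(entry)
        print(entry, "->", ", ".join(describe(c) for c in text))
```

Under this rule, a literal token that happens to look like four hex digits would be read as a code point, which is one reason to keep characters separated by spaces.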
For Restrictions, you can also specify a set of characters, using one of the following formats:
Example | Comment
a..c | All of the characters between a and c, in Unicode order.
0061..0063 | The same range, using hex codes.
[:blk=Greek:] | An entire block, or other UnicodeSet.
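The two range formats can be expanded mechanically. The following Python sketch shows one way to do so; the names `parse_endpoint` and `parse_range` are hypothetical, and treating 4 to 6 hex digits as a code point is an assumption of this example. UnicodeSet patterns such as `[:blk=Greek:]` are deliberately not handled here, since evaluating them requires Unicode property data that the Python standard library does not expose in that form.

```python
import re

# Assumption for this sketch: an endpoint of 4-6 hex digits is a code point;
# otherwise it must be exactly one literal character.
HEX_ENDPOINT = re.compile(r"^[0-9A-Fa-f]{4,6}$")


def parse_endpoint(text: str) -> int:
    """Return the code point named by one endpoint of a range."""
    text = text.strip()
    if HEX_ENDPOINT.match(text):
        return int(text, 16)
    if len(text) == 1:
        return ord(text)
    raise ValueError(f"not a hex code or a single character: {text!r}")


def parse_range(spec: str) -> list[str]:
    """Expand 'a..c' or '0061..0063' into characters in code point order."""
    start, separator, end = spec.partition("..")
    if not separator:
        raise ValueError(f"expected START..END, got {spec!r}")
    low, high = parse_endpoint(start), parse_endpoint(end)
    if low > high:
        raise ValueError(f"range start is after range end: {spec!r}")
    if high > 0x10FFFF:
        raise ValueError(f"code point out of range: {spec!r}")
    return [chr(code_point) for code_point in range(low, high + 1)]


if __name__ == "__main__":
    print(parse_range("a..c"))
    print(parse_range("0061..0063"))
```

Both calls in the usage section expand to the same three characters, matching the table's note that the two rows describe the same range.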