Security Issues FAQ
Q: I've heard claims that Unicode poses security issues. Is that right?
A: A common security issue is 'spoofing', the deliberate
misspelling of a domain or user name to trick unwary users into
interacting with a hostile site as if it were a trusted site.
Spoofing can be effective even when it is only approximate, e.g. using the digit
'1' instead of the letter 'l'. The Unicode Standard contains many
"confusables," that is, characters whose glyphs, due to historical
derivation or sheer coincidence, resemble each other more or less
closely. Certain security-sensitive applications or systems may be
vulnerable due to possible misinterpretation of these confusables by
their users. [AF] & [DE]
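As a rough illustration of why confusables matter, the following Python sketch compares a name with a look-alike in which one Latin letter is replaced by a Cyrillic one. The strings `trusted` and `spoofed` and the helper `describe` are hypothetical names chosen for this example; only the standard `unicodedata` module is used.

```python
import unicodedata

# Hypothetical example strings: the second replaces the Latin letter "a"
# with CYRILLIC SMALL LETTER A (U+0430), which many fonts draw identically.
trusted = "example"
spoofed = "ex\u0430mple"

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The strings may look alike on screen, but they are different sequences.
print("equal:", trusted == spoofed)
for (cp_t, name_t), (cp_s, name_s) in zip(describe(trusted), describe(spoofed)):
    marker = "" if cp_t == cp_s else "  <-- differs"
    print(f"{cp_t} {name_t:28} | {cp_s} {name_s}{marker}")
```

The comparison reports the strings as unequal, and the per-character listing shows which position holds the substituted character, even though a user reading the rendered text may see no difference.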
Q: Is this a problem that is unique to Unicode?
A: No. Many legacy character sets, including ISO/IEC
8859-1, also contain confusables (albeit usually fewer of them) and
carry the same risks when it comes to spoofing.
[AF] & [DE]
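To make this concrete, here is a small Python sketch showing that look-alike characters already exist within ISO/IEC 8859-1. The list `LOOKALIKE_PAIRS` and the helper `fits_in_latin1` are assumptions made for this example; which pairs actually look alike depends on the font in use.

```python
# Pairs that many fonts render similarly. This list is an assumption
# for illustration; actual resemblance depends on the font.
LOOKALIKE_PAIRS = [("1", "l"), ("0", "O")]

def fits_in_latin1(text):
    """Return True if every character can be encoded in ISO/IEC 8859-1."""
    try:
        text.encode("latin-1")  # Python's name for ISO/IEC 8859-1
        return True
    except UnicodeEncodeError:
        return False

for first, second in LOOKALIKE_PAIRS:
    # Both members of each pair are distinct characters, yet both are
    # available in the legacy character set.
    print(repr(first), repr(second),
          "both in ISO/IEC 8859-1:", fits_in_latin1(first + second),
          "| same character:", first == second)
```

Each pair consists of two different characters that are both encodable in the legacy character set, so a spoof such as the digit '1' for the letter 'l' needs nothing beyond it.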
Q: Why is it not simply possible to give all characters that use the same glyph a single code?
A: Unicode encodes characters, not glyphs. If characters were
unified strictly on the basis of appearance, many common text processing tasks
would become convoluted or impossible. For example, Latin B and
Greek Beta (Β) look the same in most fonts, but lowercase to two
different letters, Latin b and Greek beta (β), which look
quite different. [AF] & [DE]
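The case-mapping example above can be checked with a short Python sketch. The variable names are chosen for this illustration; the code relies only on the built-in `str.lower` method and the standard `unicodedata` module.

```python
import unicodedata

latin_capital_b = "\u0042"     # LATIN CAPITAL LETTER B
greek_capital_beta = "\u0392"  # GREEK CAPITAL LETTER BETA

# The capitals usually share a glyph, but each has its own lowercase mapping.
for ch in (latin_capital_b, greek_capital_beta):
    lower = ch.lower()
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}"
          f" -> U+{ord(lower):04X} {unicodedata.name(lower)}")
```

If the two capitals shared a single code, a lowercasing routine would have no way to tell which of the two results was intended.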
Q: Where can I find out more about security issues with Unicode and globalization software?
A: For general explanations of issues and recommended approaches, see UTR #36, Unicode Security Considerations.
For recommended mechanisms (and data for implementing them) for handling certain security issues, see UTS #39, Unicode Security Mechanisms.
Q: Where can I find out about security issues connected with Internationalized Domain Names (IDNs)?
A: See the Internationalized Domain Names (IDN) FAQ and UTS #46, Unicode IDNA Compatibility Processing.