
Security Issues FAQ

Q: I've heard claims that Unicode poses security issues. Is that right?

A: A common security issue is 'spoofing': the deliberate misspelling of a domain or user name to trick unaware users into interacting with a hostile site as if it were a trusted one. Spoofing can be effective even when the resemblance is only approximate, e.g. using the digit '1' instead of the letter 'l'. The Unicode Standard contains many "confusables," that is, characters whose glyphs, due to historical derivation or sheer coincidence, resemble each other more or less closely. Security-sensitive applications or systems may be vulnerable when their users misinterpret these confusables. [AF] & [DE]
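
As a rough illustration of why confusables are hard to spot by eye, the following Python sketch compares strings that look alike in many fonts but differ at the code point level. The sample strings and the helper names `describe` and `first_difference` are invented for this example; it is not a detection mechanism, only a way to make the difference visible.

```python
import unicodedata

# Pairs of strings that render alike in many fonts but differ in code points.
# These samples are illustrative assumptions, not data from the Unicode Standard.
PAIRS = [
    ("paypal", "paypa1"),      # Latin small letter l vs. digit one
    ("apple", "\u0430pple"),   # Latin small a vs. Cyrillic small a (U+0430)
]

def describe(text):
    """Return a (code point, character name) tuple for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

def first_difference(a, b):
    """Return the index of the first differing position, or None if equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other, or they are identical.
    return None if len(a) == len(b) else min(len(a), len(b))

for genuine, spoof in PAIRS:
    i = first_difference(genuine, spoof)
    print(f"{genuine!r} vs {spoof!r}: differ at index {i}")
    print("  genuine:", describe(genuine)[i])
    print("  spoof:  ", describe(spoof)[i])
```

The two strings in each pair compare as unequal to software even though a user may read them as the same name, which is exactly the gap a spoofer exploits.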

Q: Is this a problem that is unique to Unicode?

A: No, many legacy character sets, including ISO/IEC 8859-1, also contain confusables (albeit usually fewer of them) and carry the same risks when it comes to spoofing. [AF] & [DE]

Q: Why not simply give all characters that share the same glyph a single code?

A: Unicode encodes characters, not glyphs. If characters were unified strictly on the basis of appearance, many common text-processing tasks would become convoluted or impossible. For example, Latin B and Greek Beta (Β) look the same in most fonts, but they lowercase to two different letters, Latin b and Greek beta (β), which look very different. [AF] & [DE]
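
The case-mapping example can be checked with a short Python sketch. It uses only the standard `unicodedata` module; the variable names are chosen for this illustration.

```python
import unicodedata

latin_b = "B"            # U+0042 LATIN CAPITAL LETTER B
greek_beta = "\u0392"    # U+0392 GREEK CAPITAL LETTER BETA

# The capitals look alike in most fonts but are distinct characters.
print(latin_b == greek_beta)  # False

# Each capital lowercases to a different, visually distinct letter.
for ch in (latin_b, greek_beta):
    lower = ch.lower()
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          f"U+{ord(lower):04X} {unicodedata.name(lower)}")
```

If the two capitals shared one code, a lowercasing routine could not tell which lowercase letter to produce without extra context.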

Q: Where can I find out more about security issues with Unicode and globalization software?

A: For general explanations of issues and recommended approaches, see UTR #36, Unicode Security Considerations.
For recommended mechanisms (and data for implementing them) for handling certain security issues, see UTS #39, Unicode Security Mechanisms.

Q: Where can I find out about security issues connected with Internationalized Domain Names (IDNs)?

A: See the Internationalized Domain Names (IDN) FAQ and UTS #46, Unicode IDNA Compatibility Processing.