Unicode Frequently Asked Questions

Language Tagging

Q: Do I always need to tag text with the language?

No, in most cases it is not necessary. For a more complete discussion, see Section 5.10, Language Information in Plain Text in The Unicode Standard.

Q: What are language tag characters?

The Unicode Standard contains a set of invisible format control characters, also known as "tag characters". These tag characters can be used in sequences, introduced by U+E0001 LANGUAGE TAG and terminated by U+E007F CANCEL TAG, to spell out language tags that can be embedded into Unicode plain text. See Section 23.9, Tag Characters in The Unicode Standard for a complete explanation.
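As an illustration of how such a sequence is laid out, the following Python sketch relies on the fact that the tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E at a fixed offset of 0xE0000. The function names `to_tag_characters` and `read_language_tag` are hypothetical, chosen for this example only. The sketch may help when inspecting legacy data; it is not a recommendation to produce such tags.

```python
from typing import Optional

LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000         # tag characters mirror ASCII at this offset


def to_tag_characters(language: str) -> str:
    """Spell an ASCII language tag such as 'fr' or 'en-US' in tag characters."""
    if not all(0x20 <= ord(ch) <= 0x7E for ch in language):
        raise ValueError("language tag must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(ord(ch) + TAG_OFFSET) for ch in language)


def read_language_tag(text: str, start: int = 0) -> Optional[str]:
    """Decode the tag sequence beginning at text[start], or return None."""
    if text[start:start + 1] != LANGUAGE_TAG:
        return None
    decoded = []
    for ch in text[start + 1:]:
        # Stop at the first character outside U+E0020..U+E007E.
        if not 0xE0020 <= ord(ch) <= 0xE007E:
            break
        decoded.append(chr(ord(ch) - TAG_OFFSET))
    return "".join(decoded)
```

For example, `read_language_tag(to_tag_characters("en-US") + "hello")` yields `"en-US"`, while a string that does not begin with U+E0001 yields `None`.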

Q: Should I be using tag characters to spell out language tags?

No. Use of tag characters to spell out language tags for embedding in plain text is strongly discouraged, and U+E0001 LANGUAGE TAG is deprecated. The mechanism was encoded in the standard only for limited use by particular protocols that may need to provide language tagging for short strings without full-fledged markup. Everyone else who needs to tag text with its language should use standard markup mechanisms, such as those provided by HTML, XML, or other rich text formats. In database contexts, language should generally be indicated by appropriate data fields, rather than by embedded language tags or markup.
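When legacy plain text already contains embedded language tags, one possible cleanup is to move the language into a separate field and remove the tag characters from the text. The Python sketch below assumes a hypothetical `TaggedText` record and `split_language` helper. As a simplification, it records only the first language tag it finds and treats the whole string as being in that language. It removes only runs introduced by U+E0001, so tag characters used for other purposes are left untouched.

```python
import re
from dataclasses import dataclass
from typing import Optional

# U+E0001, then zero or more tag characters, then an optional U+E007F CANCEL TAG.
_TAG_RUN = re.compile("\U000E0001([\U000E0020-\U000E007E]*)\U000E007F?")


@dataclass
class TaggedText:
    text: str                # plain text with embedded language tags removed
    language: Optional[str]  # language kept in its own field, e.g. "fr"


def split_language(raw: str) -> TaggedText:
    """Separate an embedded language tag from the text it was attached to."""
    language = None
    match = _TAG_RUN.search(raw)
    if match and match.group(1):
        # Map each tag character back to its ASCII counterpart.
        language = "".join(chr(ord(ch) - 0xE0000) for ch in match.group(1))
    return TaggedText(text=_TAG_RUN.sub("", raw), language=language)
```

The resulting `language` value can then be stored in a dedicated database column or emitted as a markup attribute such as HTML `lang` or XML `xml:lang`.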