Language Tagging
Q: Do I always need to tag text with the language?
A: No, in most cases it is not necessary. For a more complete discussion, see Section 5.10, Language Information in Plain Text, in The Unicode Standard.
Q: What are language tag characters?
A: The Unicode Standard contains a set of invisible format control characters, also known as "tag characters". These tag characters (U+E0020..U+E007E) can be used in sequences, introduced by U+E0001 LANGUAGE TAG, to spell out language tags that can be embedded into Unicode plain text. A language tag sequence ends at the first character that is not a tag character; U+E007F CANCEL TAG does not terminate the sequence but cancels a tag value that is already in effect. See Section 23.9, Tag Characters, in The Unicode Standard for a complete explanation.
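Because each tag character in U+E0020..U+E007E mirrors the ASCII character 0xE0000 below it, a language tag sequence can be recognized and removed mechanically. The following is a minimal Python sketch of that idea, intended for cleaning legacy input rather than for producing tagged text. The function name split_language_tags is hypothetical, and the sketch simplifies matters by handling only language tags: tag characters that do not follow U+E0001 LANGUAGE TAG are left in place, since they may belong to other kinds of tag sequences.

```python
LANGUAGE_TAG = 0xE0001                  # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F                    # U+E007F CANCEL TAG
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E  # tag characters mirroring ASCII 0x20..0x7E
TAG_OFFSET = 0xE0000                    # tag code point minus ASCII code point


def split_language_tags(text):
    """Hypothetical helper: return (text_without_language_tags, tags_found)."""
    visible = []
    tags = []
    pending = None  # ASCII characters collected while inside a language tag

    def flush():
        nonlocal pending
        if pending:  # an empty sequence (for example, a cancel) yields no tag
            tags.append("".join(pending))
        pending = None

    for ch in text:
        cp = ord(ch)
        if cp == LANGUAGE_TAG:
            flush()          # a new tag identification character ends any open tag
            pending = []
        elif pending is not None and TAG_FIRST <= cp <= TAG_LAST:
            pending.append(chr(cp - TAG_OFFSET))
        elif pending is not None and cp == CANCEL_TAG:
            flush()          # LANGUAGE TAG followed by CANCEL TAG is consumed
        else:
            flush()          # any other character ends the open tag
            visible.append(ch)
    flush()
    return "".join(visible), tags


# Usage: "fr" spelled with tag characters, followed by ordinary text.
tagged = "\U000E0001\U000E0066\U000E0072bonjour"
clean, found = split_language_tags(tagged)
```

In this usage example, clean holds the ordinary text and found holds the language tags spelled out by the tag characters.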
Q: Should I be using tag characters to spell out language tags?
A: No. Use of tag characters to spell out language tags for embedding in plain text is strongly discouraged, and U+E0001 LANGUAGE TAG is deprecated. Language tags spelled with tag characters are retained in the standard only for limited use by particular protocols that may need to provide language tagging for short strings without the use of full-fledged markup mechanisms. Most other users who need to tag text with its language identity should use standard markup mechanisms, such as those provided by HTML, XML, or other rich text formats. In database contexts, language should generally be indicated by appropriate data fields, rather than by embedded language tags or markup.
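As an illustration of the recommended alternatives, the following Python sketch keeps the language in a separate data field and emits it as an xml:lang attribute when the text is serialized. The Phrase record, the phrase element name, and the to_xml function are hypothetical names chosen for this example; only the xml:lang attribute itself comes from the XML specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# The reserved XML namespace; ElementTree serializes it with the "xml" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class Phrase:
    """Hypothetical record: the language lives in its own field, not in the text."""
    text: str
    lang: str  # a language tag such as "fr" or "en-GB"


def to_xml(phrase):
    """Serialize a phrase, carrying its language as an xml:lang attribute."""
    element = ET.Element("phrase", {XML_LANG: phrase.lang})
    element.text = phrase.text
    return ET.tostring(element, encoding="unicode")


# Usage: the stored text stays plain; the language travels as metadata.
markup = to_xml(Phrase(text="bonjour", lang="fr"))
```

The same record could equally be rendered to HTML with a lang attribute; the point of the design is that the text itself never contains language tag characters.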