Language Tagging
Q: Do I always need to tag text with the language?
A: No, in most cases it is not necessary. For a more
complete discussion, see Section 5.10, "Language Information
in Plain Text," in The Unicode Standard, Version 6.0.
Q: What are language tag characters?
A: The Unicode Standard contains a set of invisible format
control characters, also known as "Plane 14 Language Tags", which can be
used to spell out language tags that can be embedded into plain text.
See Section 16.9, "Deprecated Tag Characters," in
The Unicode Standard, Version 6.0, for a complete explanation.
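To make the mechanism concrete, here is a minimal Python sketch of how such a tag is spelled out, and how software that receives plain text might detect or strip the characters. It relies on the layout of the Tags block: U+E0001 LANGUAGE TAG introduces a tag, U+E0020 through U+E007E mirror printable ASCII at an offset of 0xE0000, and U+E007F is CANCEL TAG. The function names are illustrative only and do not come from the standard or any library.

```python
import re

TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG

# One LANGUAGE TAG followed by a run of ASCII-mirroring tag characters.
_TAG_RUN = re.compile("\U000E0001([\U000E0020-\U000E007E]+)")
# LANGUAGE TAG, the ASCII-mirroring range, and U+E007F CANCEL TAG.
_ANY_TAG_CHAR = re.compile("[\U000E0001\U000E0020-\U000E007F]")


def encode_language_tag(tag: str) -> str:
    """Spell out a language tag such as "fr" in invisible tag characters."""
    if not all(0x20 <= ord(c) <= 0x7E for c in tag):
        raise ValueError("tag must consist of printable ASCII characters")
    return LANGUAGE_TAG + "".join(chr(TAG_OFFSET + ord(c)) for c in tag)


def find_language_tags(text: str) -> list[str]:
    """Return the embedded language tags in text, decoded back to ASCII."""
    return [
        "".join(chr(ord(c) - TAG_OFFSET) for c in match.group(1))
        for match in _TAG_RUN.finditer(text)
    ]


def strip_tag_chars(text: str) -> str:
    """Remove all language tag characters, leaving the visible text."""
    return _ANY_TAG_CHAR.sub("", text)


tagged = encode_language_tag("fr") + "Bonjour"
print(find_language_tags(tagged))   # the decoded tags
print(strip_tag_chars(tagged))      # the visible text only
```

Because the tag characters are invisible, the tagged string renders the same as the untagged one but compares and measures differently, which is one practical reason to strip them on input.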
Q: Should I be using those language tag characters?
A: No. Use of the language tag characters is strongly
discouraged. They are encoded in the standard only for limited
use by particular protocols which may need to provide language tagging
for short strings, without the use of full-fledged markup mechanisms.
Most other users who need to identify the language of text should
use standard markup mechanisms, such as those provided by HTML,
XML, or other rich text formats. In database contexts, language
should generally be indicated by appropriate data fields, rather than by
embedded language tags or markup.
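As a rough illustration of both recommendations, the following Python sketch keeps the language in its own data field and emits it as an `xml:lang` attribute only when the text is serialized to markup. The `LocalizedText` record and the `to_xml` helper are hypothetical names invented for this example; only the `xml.etree.ElementTree` calls are standard library API.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Qualified name of the xml:lang attribute in ElementTree's notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class LocalizedText:
    """Hypothetical record: language lives in a field, not in the text."""
    text: str
    lang: str  # a language tag such as "fr" or "en-GB"


def to_xml(item: LocalizedText, element_name: str = "p") -> str:
    """Serialize the record, expressing the language as markup."""
    element = ET.Element(element_name, {XML_LANG: item.lang})
    element.text = item.text
    return ET.tostring(element, encoding="unicode")


greeting = LocalizedText(text="Bonjour", lang="fr")
print(to_xml(greeting))
```

The stored text stays free of invisible characters, and the language can be queried, indexed, or changed without touching the string itself.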