Re: Klingons and their allies - Beyond 17 planes

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Oct 20 2003 - 09:10:46 CST


From: "John Cowan" <cowan@mercury.ccil.org>

> Jill Ramonsky scripsit:
>
> > So, if I have understood this correctly (which is by no means certain),
> > these tag characters were added to Unicode in the vague hope that some
> > people might one day start using them, or on the off-chance that someone
> > might one day need them.
>
> Not.
>
> They were added in order to ward off an abuse of UTF-8 by a certain
> committee that insisted it needed lightweight language tagging in
> a certain computer protocol. The tags were never a "script". Everyone
> on the UTC sincerely hopes, I believe, that they never get used at all.
> For 99.9% of all use cases, ordinary markup is the Right Thing for
> language tagging.

I also agree that "language" tags are not needed in Unicode. Otherwise
the text they surround would have to be treated specially for a specific
language, with distinct character properties, clustering, rendering and
so on, just to remain legible, and that would break the unification
model.

However, I think it is a good idea to have qualifying "script" tags in
areas that Unicode will not regulate: the PUAs. Such tags attach semantics
to PUA code points and can effectively close the gap in their correct
interpretation, notably when Unicode texts containing PUA characters from
various sources are merged into a single document: these PUA characters
can then be interpreted correctly and less ambiguously within their context.

This also means that, in this case, PUA characters would be effectively
usable and interchangeable between systems using distinct PUA conventions,
without needing extra planes. All that remains is to describe which
script tags can be used, how they should be coded, and whether a
registry (like the IANA charsets database) should preferably be used
when it contains charmaps and assignments for these PUAs.
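
To make the idea concrete, here is a rough sketch in Python of how merged
text could be read if such script tags existed. Everything specific in it
is an assumption made for illustration only: the introducer U+E0002 is
not assigned by Unicode, the convention names and the registry contents
are invented, and spelling the tag name with the tag characters
U+E0020..U+E007E simply mirrors how the existing language tags are coded.

```python
# Hypothetical "script tag" introducer: NOT assigned by Unicode, assumed here.
TAG_INTRODUCER = 0xE0002
# Tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E at this offset.
TAG_BASE = 0xE0000

# Invented registry: convention name -> {PUA code point -> meaning}.
REGISTRY = {
    "convention-a": {0xF8D0: "GLYPH ONE OF A"},
    "convention-b": {0xF8D0: "GLYPH ONE OF B"},
}


def is_pua(cp):
    """True for code points in the BMP PUA or in planes 15 and 16."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def make_tag(name):
    """Encode an ASCII convention name as introducer + tag characters."""
    return chr(TAG_INTRODUCER) + "".join(chr(TAG_BASE + ord(c)) for c in name)


def interpret(text):
    """Return (code point, convention, meaning) for every PUA character,
    resolved against the most recent script tag seen in the text."""
    current = None
    result = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if cp == TAG_INTRODUCER:
            # Collect the tag characters that spell the convention name.
            i += 1
            name = []
            while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
                name.append(chr(ord(text[i]) - TAG_BASE))
                i += 1
            current = "".join(name)
            continue
        if is_pua(cp):
            meaning = REGISTRY.get(current, {}).get(cp)
            result.append((cp, current, meaning))
        i += 1
    return result


# Two sources use the same PUA code point with different meanings.
merged = (
    make_tag("convention-a") + "\uf8d0"
    + make_tag("convention-b") + "\uf8d0"
)
```

Calling interpret(merged) resolves the two occurrences of U+F8D0 against
different conventions, while an untagged PUA character comes back with no
convention and no meaning, which is the ambiguity we have today.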


