RE: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

From: Peter Constable
Date: Fri, 1 Jun 2012 18:21:27 +0000

From: [] On Behalf Of William_J_G Overington

> Thinking about this after posting and thinking of the vast coding space
> that could be opened up for flag encoding by just adding U+E0002 into
> regular Unicode...

I'd suggest you _don't_ continue thinking in this vein. The entire approach of defining character sequences that, _independent of some particular application context_, represent non-text entities is not a good idea. If, say, one were to define an XML language for representing some type of information that involved flags (or whatever) and, as part of that, defined some set of sequences to be used as entity references, then that would be fine. But plane 14 tag characters would be neither necessary nor even recommended for defining such entities (there are well-established conventions for creating named entity references in XML). And defining those entities free of any such application context would be bad because the only remaining context for maintaining such entity references would be Unicode itself, and that's way out of scope.
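
To make the XML point concrete, here is a minimal sketch of named entity references defined inside an application's own document type. The element name, the entity names (flag-TR, flag-JP) and their replacement text are all invented for illustration; nothing here is defined by Unicode or by any existing XML vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical document type: the entity names and their replacement text
# belong to this made-up application, not to Unicode or ISO 10646.
DOC = """<!DOCTYPE report [
  <!ENTITY flag-TR "[flag of Turkey]">
  <!ENTITY flag-JP "[flag of Japan]">
]>
<report>Teams: &flag-TR; and &flag-JP;</report>"""

# The parser expands entities declared in the internal DTD subset, so the
# application decides what each reference stands for.
root = ET.fromstring(DOC)
print(root.text)
```

The meaning of each reference lives entirely in the document type, which is the application context that maintains it; no tag characters are involved.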

You might wonder, "But isn't that what Unicode did in encoding the regional indicator characters?" If so, the answer is, "No." Note that all Unicode did was to encode a set of characters; it did not define any sequences. The only requirement of Unicode was to provide a way to map Shift-JIS encoded text involving emoji to Unicode / 10646 in a way that could be round-tripped, and the regional indicator characters were the approach that all parties could agree upon--with one of the big concerns among at least some of those parties being _not_ to start defining character sequences to represent flags (or any other entities) within the Unicode or ISO 10646 standards.
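
As a rough illustration of "a set of characters, not sequences": the regional indicator symbols are 26 individual code points, U+1F1E6 through U+1F1FF, one per letter A to Z. The helper below (its name is made up for this sketch) only maps letters to those code points; treating a resulting pair as a particular flag is an interpretation made by whatever application displays it.

```python
import unicodedata

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def regional_indicators(letters: str) -> str:
    """Map ASCII letters A-Z to the corresponding regional indicator symbols.

    Hypothetical helper for illustration only; it performs a per-character
    mapping and attaches no meaning to the resulting string.
    """
    out = []
    for ch in letters.upper():
        if not ("A" <= ch <= "Z"):
            raise ValueError(f"not an ASCII letter: {ch!r}")
        out.append(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")))
    return "".join(out)


pair = regional_indicators("TR")
for c in pair:
    # Each symbol is an ordinary encoded character with its own name.
    print(f"U+{ord(c):04X}", unicodedata.name(c))
```

Each character has its own code point and name; the string returned for "TR" is simply two such characters in a row.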

Received on Fri Jun 01 2012 - 13:24:06 CDT
