Re: Tags and future new technologies (from RE: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign))

From: Philippe Verdy <>
Date: Fri, 1 Jun 2012 18:24:05 +0200

Note that I absolutely do not advocate the reuse of language tags for
something else. They are deprecated and should remain deprecated. They
were not intended to be visible symbols.

I much prefer a solution that generates **true** symbols that can be
combined and **optionally** (but safely) rendered as ligatures (by
design of the encoding itself) to display the true flags, instead of
showing the individual code symbols as separate glyphs (the default
rendering in the absence of recognized ligatures).

The ligature-based rendering can still be disabled, so that the
individual symbols are shown, by inserting a single ZWNJ format control
in the middle of the sequence, but this is intended for limited use.
These symbol sequences **should** be rendered as ligatures by default
whenever the ligature is recognized, i.e. when the sequence matches a
flag code that has been registered somewhere (in a separate registry,
which is not immediately necessary for encoding this subset).
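A toy model of the rendering decision described above, as a minimal Python sketch. Real ligature substitution would happen in a font (e.g. through glyph substitution rules), not in application code; here the glyph names "flag.XX" and "sym.X" and the contents of the registry are hypothetical, and a sequence is represented simply by its code characters (e.g. "US"):

```python
from __future__ import annotations

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def shape(code: str, registry: set[str]) -> list[str]:
    """Toy shaping decision for one flag-symbol sequence.

    `code` holds the code characters of the sequence (e.g. "US"),
    possibly with a ZWNJ inserted to opt out of the ligature.
    Glyph names are hypothetical.
    """
    if ZWNJ not in code and code in registry:
        # Registered code and no ZWNJ: substitute one flag glyph.
        return [f"flag.{code}"]
    # Unregistered code, or ligature disabled by ZWNJ: one visible
    # glyph per symbol; the ZWNJ itself gets no glyph.
    return [f"sym.{c}" for c in code if c != ZWNJ]


# Hypothetical stand-in for the separate registry of flag codes.
registry = {"US", "FR"}
print(shape("US", registry))               # ['flag.US']
print(shape("U" + ZWNJ + "S", registry))   # ['sym.U', 'sym.S']
print(shape("QQ", registry))               # ['sym.Q', 'sym.Q']
```

Unregistered sequences degrade to their visible symbols rather than to a missing-glyph box, which is what makes the ligature safely optional.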

This new small subset should be treated as a new, separate script,
which is definitely NOT Latin: it will not support most of the other
assumptions and features of the Latin script, and it must not be
treated at the same level as the surrounding Latin letters. Encoded
sequences are not breakable in the middle for word-breaking purposes.
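The no-break rule can be sketched in Python as a segmentation pass that keeps each INITIAL, MEDIAL*, FINAL run together. This is not an implementation of the full Unicode word-boundary rules, only of this one constraint. It assumes the three 48-code-point subsets laid out as in the compatibility table below, with a hypothetical block start `BASE = 0xF0000` standing in for "XXX00", and it assumes a ZWNJ used to disable the ligature stays inside the unit:

```python
from __future__ import annotations

BASE = 0xF0000   # hypothetical placeholder for the block start "XXX00"
ZWNJ = "\u200c"


def role(ch: str) -> str | None:
    """Positional role of a flag symbol, or None for any other character."""
    off = ord(ch) - BASE
    if 0x00 <= off < 0x30:
        return "initial"
    if 0x30 <= off < 0x60:
        return "medial"
    if 0x60 <= off < 0x90:
        return "final"
    return None


def segments(text: str) -> list[str]:
    """Split text into units, never breaking inside INITIAL MEDIAL* FINAL."""
    units: list[str] = []
    i = 0
    while i < len(text):
        if role(text[i]) != "initial":
            units.append(text[i])    # ordinary character: its own unit
            i += 1
            continue
        j = i + 1
        # Medial symbols, and a ZWNJ that only disables the ligature,
        # stay inside the same unit.
        while j < len(text) and (role(text[j]) == "medial" or text[j] == ZWNJ):
            j += 1
        if j < len(text) and role(text[j]) == "final":
            j += 1
        units.append(text[i:j])
        i = j
    return units


# INITIAL U (offset 0x15) followed by FINAL S (offset 0x60 + 0x13).
us = chr(BASE + 0x15) + chr(BASE + 0x73)
print(len(segments("a" + us + "b")))   # 3: "a", the whole flag, "b"
```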

In a limited plain-text environment, these codes could be rendered or
converted lossily by remapping the symbols to the Basic Latin block
and surrounding them with punctuation, as in [US], but this would be
only a last-chance fallback.

This last-chance fallback conversion may be specified with an NFKC
decomposition mapping, for example with this <font> compatibility mapping:

 XXX00 ; FLAG SYMBOL INITIAL HYPHEN ; ... ; So ; ... ; <font>005B 002D ;
 XXX01 ; FLAG SYMBOL INITIAL A ; ... ; So ; ... ; <font>005B 0041 ;
 XXX1A ; FLAG SYMBOL INITIAL Z ; ... ; So ; ... ; <font>005B 005A ;
 XXX20 ; FLAG SYMBOL INITIAL ZERO ; ... ; So ; ... ; <font>005B 0030 ;
 XXX29 ; FLAG SYMBOL INITIAL NINE ; ... ; So ; ... ; <font>005B 0039 ;
 XXX30 ; FLAG SYMBOL MEDIAL HYPHEN ; ... ; So ; ... ; <font>002D ;
 XXX31 ; FLAG SYMBOL MEDIAL A ; ... ; So ; ... ; <font>0041 ;
 XXX4A ; FLAG SYMBOL MEDIAL Z ; ... ; So ; ... ; <font>005A ;
 XXX50 ; FLAG SYMBOL MEDIAL ZERO ; ... ; So ; ... ; <font>0030 ;
 XXX59 ; FLAG SYMBOL MEDIAL NINE ; ... ; So ; ... ; <font>0039 ;
 XXX60 ; FLAG SYMBOL FINAL HYPHEN ; ... ; So ; ... ; <font>002D 005D ;
 XXX61 ; FLAG SYMBOL FINAL A ; ... ; So ; ... ; <font>0041 005D ;
 XXX7A ; FLAG SYMBOL FINAL Z ; ... ; So ; ... ; <font>005A 005D ;
 XXX80 ; FLAG SYMBOL FINAL ZERO ; ... ; So ; ... ; <font>0030 005D ;
 XXX89 ; FLAG SYMBOL FINAL NINE ; ... ; So ; ... ; <font>0039 005D ;

(This also gives a hint about how to collate these symbols, and about
the minimum size of the block to encode: three columns of 16 code
points for each of the three subsets. The hyphen, the 26 letters and
the 10 digits use 37 of the 48 positions in each subset, leaving 11
code points per subset reserved for additional punctuation-like
symbols that may be needed to implement namespaces in the registry of
flags.)
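The layout of the table above and its fallback mapping can be checked with a small Python sketch. Because these characters are not encoded, `unicodedata.normalize("NFKC", ...)` cannot be used; the sketch applies the listed decompositions directly instead. `BASE = 0xF0000` is a hypothetical placeholder for "XXX00", and the helper names `encode_flag` and `fallback` are illustrative only. The final-subset mapping appends "]" to every final symbol, following the pattern of the other final rows:

```python
from __future__ import annotations
import string

BASE = 0xF0000                   # hypothetical placeholder for "XXX00"
SUBSET = 0x30                    # 3 columns x 16 code points per subset
INITIAL, MEDIAL, FINAL = 0 * SUBSET, 1 * SUBSET, 2 * SUBSET
CODE_CHARS = "-" + string.ascii_uppercase + string.digits


def index(c: str) -> int:
    """Offset of a code character within a subset, following the table."""
    if c == "-":
        return 0x00
    if c in string.ascii_uppercase:
        return 0x01 + ord(c) - ord("A")   # A..Z -> 0x01..0x1A
    if c in string.digits:
        return 0x20 + ord(c) - ord("0")   # 0..9 -> 0x20..0x29
    raise ValueError(f"not a flag code character: {c!r}")


# Offsets missing here (0x1B..0x1F, 0x2A..0x2F) are the reserved ones.
CHAR_AT = {index(c): c for c in CODE_CHARS}


def encode_flag(code: str) -> str:
    """Encode e.g. "US" as one INITIAL, zero or more MEDIAL, one FINAL."""
    if len(code) < 2:
        raise ValueError("a sequence needs at least an initial and a final")
    subsets = [INITIAL] + [MEDIAL] * (len(code) - 2) + [FINAL]
    return "".join(chr(BASE + s + index(c)) for s, c in zip(subsets, code))


def fallback(text: str) -> str:
    """Lossy plain-text fallback: "[" before initials, "]" after finals."""
    out = []
    for ch in text:
        off = ord(ch) - BASE
        if not 0 <= off < 3 * SUBSET:
            out.append(ch)               # not a flag symbol: keep as is
            continue
        subset, idx = divmod(off, SUBSET)
        c = CHAR_AT[idx]                 # KeyError for reserved code points
        out.append(("[" + c, c, c + "]")[subset])
    return "".join(out)


print(hex(3 * SUBSET))                               # 0x90
print(fallback(encode_flag("US")))                   # [US]
print(fallback("x" + encode_flag("GB-SCT") + "y"))   # x[GB-SCT]y
```

With three 48-position subsets the whole set needs 0x90 (144) code points, which matches the minimum block size described in the parenthesis above.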

2012/6/1 William_J_G Overington <>:
> On Thursday 31 May 2012, Doug Ewell <> wrote:
>> William_J_G Overington <wjgo underscore 10009 at btinternet dot com> wrote:
>> > Further to that point of order, is there any rule that absolutely prevents the deprecated status of a character or collection of characters being removed?
>> UTC has not ever shown the slightest inclination to do so, if that answers your question.
> Thank you for replying.
> What I was wondering about was whether, if someone proposes U+E0002 for encoding for a future new technology, the fact that tags are currently deprecated would automatically stop that proposal from being accepted for encoding, perhaps because of some guarantee in the rules never to reverse a deprecation.
>> > I feel that, by hybridizing the suggestions of Doug and Philippe, an elegant solution using tags and an advanced format font could be designed.
> Thinking about this after posting, and about the vast coding space that could be opened up for flag encoding just by adding U+E0002 to regular Unicode, I began to think of proposing the addition of U+E0007 so as to open up another encoding space. Each item in that encoding space could be displayed as a sequence of tag glyphs using an ordinary font, as one glyph using glyph substitution technology with an advanced format font, or localized using a database technology, with the item used as a key to the database.
> I was thinking that the above would involve visible glyphs for the tag characters.
> I was thinking of the possibilities, then I noticed something.
> In a later post Philippe Verdy wrote as follows.
>> .... (or in Plane 14, but that plane is not intended for visible symbols).
> Ah!
> There is a font that has visible glyphs for the tag characters, together with a visible glyph for a Private Use Area tag-style character at U+FFFF2 available as a free download from the following forum post.
> William Overington
> 1 June 2012
Received on Fri Jun 01 2012 - 11:29:59 CDT
