Re: Flag tags (was: Re: Unicode 6.2 to Support the Turkish Lira Sign) from Philippe Verdy on 2012-05-31 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Fri, 1 Jun 2012 00:14:29 +0200

Here he probably meant that if we need to encode many flags, each flag
code may be arbitrarily long. A solution based on combining characters
will not work correctly, and it will be better to use leading and
trailing markers, or to use a codification that allows knowing where a
flag starts and where it finishes.

There are two solutions:

(1) use specific punctuation-like characters acting like brackets
(those brackets can be given also a visual glyph by themselves), and
encode the intermediate flag code using usual characters. This would
allow viable fallback representations of flags, even if they show the
codes (as letters will be enclosed, for readability, the set should be
restricted and probably only uppercase, so that letters can be reduced
easily within the enclosing sym

(2) restrict the subset of characters that are usable in flag
identification codes to a useful and productive subset of ASCII, then
reencode them as enclosed letters marking the start and end of the
code, as well as possible medial codes. This eases the production of
fonts for a reasonable representation of these codes within a visual
band looking like a flag, as well as allows those sequences to be
easily converted into ligatures for showing the actual flags
(including with their colors if needed).
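
As a rough illustration of solution (1), here is a minimal Python
sketch. It assumes two bracket characters that do not exist in
Unicode; the private-use code points U+E000 and U+E001 are only
hypothetical stand-ins, and the names FLAG_OPEN, FLAG_CLOSE,
encode_flag, find_flags and fallback are invented for this example.
The restriction to uppercase ASCII letters is also an assumption.

```python
import re

# Hypothetical stand-ins: no such bracket characters are encoded.
# Private-use code points are used here purely for illustration.
FLAG_OPEN = "\uE000"
FLAG_CLOSE = "\uE001"

# Assumed restricted alphabet for flag codes: uppercase ASCII only.
FLAG_RE = re.compile(FLAG_OPEN + r"([A-Z]+)" + FLAG_CLOSE)


def encode_flag(code: str) -> str:
    """Wrap a flag code of any length between the two brackets."""
    if not re.fullmatch(r"[A-Z]+", code):
        raise ValueError("flag codes are limited to A-Z in this sketch")
    return FLAG_OPEN + code + FLAG_CLOSE


def find_flags(text: str) -> list[str]:
    """Return every bracketed flag code found in text, in order."""
    return [m.group(1) for m in FLAG_RE.finditer(text)]


def fallback(text: str) -> str:
    """Plain fallback rendering: show each code in visible brackets."""
    return FLAG_RE.sub(lambda m: "[" + m.group(1) + "]", text)


sample = encode_flag("US") + encode_flag("ES") + encode_flag("GBSCO")
print(find_flags(sample))  # ['US', 'ES', 'GBSCO']
print(fallback(sample))    # [US][ES][GBSCO]
```

Because each code carries its own start and end marker, adjacent
flags cannot be confused whatever their length, and a renderer
without flag glyphs still has something readable to show.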

Your solution based on the ZWSP *separator* does not solve anything: it
does not clearly indicate that this is representing a flag, and will
not allow automated recognition and production of ligatures.

2012/5/31 Andrew West <andrewcwest_at_gmail.com>:
> On 31 May 2012 00:24, Mark Davis ☕ <mark_at_macchiato.com> wrote:
>>
>> There is definitely a problem.
>
> Is it really such a problem? Why can't implementations simply use
> ZWSP to demarcate the 2-character units in a sequence of more than two
> regional indicator symbols (and maybe always emit 2-character codes
> wrapped between ZWSP on either side to be safe), so for example
> US<ZWSP>ES<ZWSP>GE would be parsed as the regional indicator symbols
> for USA, SPAIN and Georgia, whereas U<ZWSP>SE<ZWSP>SG<ZWSP>E would be
> parsed as the regional indicator symbols for U (invalid), Sweden,
> Singapore and E (invalid). Algorithms such as line-breaking would not
> break between two regional indicator symbols, but only at a ZWSP.
>
> And if implementations wanted to support two- and three-letter
> regional codes, they might parse
> <ZWSP>GB<ZWSP>CYM<ZWSP>ENG<ZWSP>NIR<ZWSP>SCO<ZWSP> as the codes for
> United Kingdom, Wales, England, Northern Ireland, and Scotland, and
> represent them visually with the appropriate flag icons.
>
> Andrew
>
>
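
For comparison, the ZWSP-delimited parsing described in the quoted
message could be sketched roughly as follows. This is only one
reading of that proposal: the function names are invented here, and
"valid" means nothing more than "a run of two or three regional
indicator symbols"; no check is made that the code is actually
assigned to a region.

```python
ZWSP = "\u200B"    # ZERO WIDTH SPACE
RIS_A = 0x1F1E6    # REGIONAL INDICATOR SYMBOL LETTER A
RIS_Z = 0x1F1FF    # REGIONAL INDICATOR SYMBOL LETTER Z


def to_ris(code: str) -> str:
    """Map uppercase ASCII letters to regional indicator symbols."""
    return "".join(chr(RIS_A + ord(c) - ord("A")) for c in code)


def from_ris(symbols: str) -> str:
    """Map regional indicator symbols back to uppercase ASCII."""
    return "".join(chr(ord(s) - RIS_A + ord("A")) for s in symbols)


def parse_units(text: str) -> list[tuple[str, bool]]:
    """Split text at ZWSP and report each unit with a validity flag."""
    units = []
    for chunk in text.split(ZWSP):
        if not chunk:
            continue  # leading, trailing or doubled ZWSP
        if not all(RIS_A <= ord(ch) <= RIS_Z for ch in chunk):
            units.append((chunk, False))  # not a pure indicator run
            continue
        letters = from_ris(chunk)
        units.append((letters, len(letters) in (2, 3)))
    return units


good = ZWSP.join(to_ris(c) for c in ["US", "ES", "GE"])
bad = ZWSP.join(to_ris(c) for c in ["U", "SE", "SG", "E"])
print(parse_units(good))  # [('US', True), ('ES', True), ('GE', True)]
print(parse_units(bad))
# [('U', False), ('SE', True), ('SG', True), ('E', False)]
```

Note that the result depends entirely on where the ZWSP characters
fall, since nothing in the sequence itself marks a unit as a flag.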
Received on Thu May 31 2012 - 17:16:46 CDT
