Re: [WhatsApp Support] Your Request: Windows Phone Client 2.10.523(ticket #7044796) from Philippe Verdy on 2013-08-05 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 5 Aug 2013 15:21:53 +0200

I don't know. I have not closely followed the ongoing work on Japanese emoji.
But I know that the set contains a few flags on Japanese systems (I'm not sure
whether they were or will be encoded in the UCS, or whether regional indicator
characters will substitute for the proposed emoji if they are not encoded).

The way I perceive the regional indicators (in Unicode 6.0), they are
absolutely not used and will never be used at all as long as there are no
complements such as the minimal brackets I suggest to fix them. The 26
letter-like characters are basically broken in their identity: you can't
safely align multiple flags or delimit them with break iterators, the way you
can break words, paragraphs, syllables (in some languages this is difficult
as it is contextual too, but not impossible, and in many languages you can
find syllable breaks without having to parse backward over an indefinite
length) or lines.

It is evident that a single flag is normally an unbreakable cluster, and
nothing in the current encoding allows defining cluster boundaries when you
put multiple flags side by side (of course you could separate flags with
additionally encoded spaces or punctuation, but I don't see why we should
have to do it).
When not using graphic flags, do we really (1) write and read
"FRGFGPMQREYTPFWF" (sic!), or (2) "[FR][GF][GP][MQ][RE][YT][PF][WF]" or at
least "FR;GF;GP;MQ;RE;YT;PF;WF" ? Of course the text makes only sense and
will wrap on lines cleanly if we use the simple second solution which uses
explicit separators between sequences of letters that are all the same
type. The first solution makes no sense, as we have to "guess" that these
letters will compose by pairs, and we will have to count them from the
beginning. If we have about 30 country codes or more aligned (for example the
flags of countries in the European Union, or NATO, or in the Council of
Europe, or participants in an international sport event) it will not work.
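
To illustrate the counting problem described above, here is a minimal Python
sketch. It assumes only the Unicode 6.0 regional indicator range
(U+1F1E6..U+1F1FF); the helper names (to_regional_indicators, pair_flags) are
made up for this illustration and come from no library. It pairs the symbols
from the start of each run, which is the only information a renderer has, and
shows that losing a single leading symbol shifts every later pair.

```python
# Regional indicator symbols A..Z occupy U+1F1E6..U+1F1FF.
RI_FIRST = 0x1F1E6
RI_LAST = 0x1F1FF


def is_regional_indicator(ch):
    return RI_FIRST <= ord(ch) <= RI_LAST


def to_regional_indicators(code):
    """Map ASCII letters A-Z (only) to regional indicator symbols."""
    return "".join(chr(RI_FIRST + ord(c) - ord("A")) for c in code.upper())


def to_ascii(ri_text):
    """Map regional indicator symbols back to ASCII letters for display."""
    return "".join(chr(ord("A") + ord(c) - RI_FIRST) for c in ri_text)


def pair_flags(text):
    """Pair regional indicators two by two, counting from each run's start."""
    flags = []
    i = 0
    while i < len(text):
        if not is_regional_indicator(text[i]):
            i += 1
            continue
        # Find the end of the current run of regional indicators.
        j = i
        while j < len(text) and is_regional_indicator(text[j]):
            j += 1
        run = text[i:j]
        # Pairing depends entirely on the distance from the run start.
        for k in range(0, len(run), 2):
            flags.append(to_ascii(run[k:k + 2]))
        i = j
    return flags


full = to_regional_indicators("FRGFGPMQREYTPFWF")
print(pair_flags(full))      # ['FR', 'GF', 'GP', 'MQ', 'RE', 'YT', 'PF', 'WF']
# Drop the first symbol (e.g. a truncated or mid-run start): all pairs shift.
print(pair_flags(full[1:]))  # ['RG', 'FG', 'PM', 'QR', 'EY', 'TP', 'FW', 'F']
```

The second call shows why a break iterator cannot start in the middle of a
run: it has to scan back to the start of the run to know the parity.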

Note also that for sport events we need more than just country flags (how
would you differentiate England vs. Scotland for example in international
Rugby competitions, or an international team using the Olympic white flag)?
You need more codes than just ISO 3166-1, and sequences will be longer than
2 letters.

My opinion is that it's not the job of Unicode to define which codes will
be used (ISO 3166-1 or anything else), just like it's not its job to define
orthographies; even the ISO 3166 standard may be amended later to include
more codes than just two letters (or to accept other letters than just
basic Latin). We DO need delimiters encoded in the UCS, for use with
regional indicators only, to create full clusters that enable their correct
substitution by icons, without inserting any other separate clusters,
exactly the same way we can align graphic icons.

As long as this is not possible, documents will still use some
upper-layer rich-text format with its parsed syntax and embedding rules to
reference external images by location (URL) or by name/identifier (URN or
code), both of which require a decoder and lexical analyser (for
separating embedded elements), a syntactic parser to differentiate them
by type, and an external resolver to retrieve an associated graphic to
insert in the same stream as the one used on output by the plain-text
renderer. The regional indicators were supposed to eliminate this extra
syntax and these components, but it does not work.

In fact it would have been fully enough to encode *only* the two REGIONAL
INDICATOR START/END brackets (used around existing ASCII letters, digits
and punctuation, except whitespace and paired punctuation) to allow
renderers to perform special substitution of each fully bracketed sequence
by a graphic icon or glyph. Immediately, we would have coded Scotland with
<REGIONAL INDICATOR START ; BASIC LATIN "G", "B"; "-", "S", "C" ; REGIONAL
INDICATOR END>.
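
A rough Python sketch of how a renderer could use such brackets follows. The
START/END delimiters are only a proposal in this message and are NOT encoded
in Unicode, so the sketch borrows two private-use code points (U+E000 and
U+E001) as hypothetical stand-ins; the function names are also invented for
the illustration. For simplicity the pattern only excludes whitespace and the
delimiters themselves, not paired punctuation.

```python
import re

# Hypothetical stand-ins for the proposed (non-existent) delimiters,
# taken from the Private Use Area purely for illustration.
RI_START = "\uE000"
RI_END = "\uE001"

# One bracketed code: START, then one or more characters that are neither
# whitespace nor a delimiter, then END.
BRACKETED = re.compile(
    RI_START + r"([^\s" + RI_START + RI_END + r"]+)" + RI_END
)


def bracket(code):
    """Wrap a plain ASCII code in the hypothetical delimiters."""
    return RI_START + code + RI_END


def extract_codes(text):
    """Return every fully bracketed code found in the text."""
    return BRACKETED.findall(text)


def substitute(text, render):
    """Replace each bracketed code by whatever render(code) returns."""
    return BRACKETED.sub(lambda m: render(m.group(1)), text)


text = bracket("FR") + bracket("GB-SC") + bracket("-x-olympic")
print(extract_codes(text))                       # ['FR', 'GB-SC', '-x-olympic']
print(substitute(text, lambda c: "<" + c + ">"))  # <FR><GB-SC><-x-olympic>
# Starting in the middle of the first cluster still finds the next ones,
# because every cluster carries its own boundaries.
print(extract_codes(text[2:]))                   # ['GB-SC', '-x-olympic']
```

Unlike pairing by position, each cluster here is self-delimiting, so codes of
any length can sit side by side and a scanner can resynchronise at the next
START without counting from the beginning.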

For regions not encoded within ISO 3166-1 it would be enough to start the
embedded code with "-" followed by some prefix, just like with CSS private
extensions, or to insert a URL directly (starting with "http:" or
"https:" or other URL schemes for local attachments in envelope formats
like MIME) or a URN (starting with "urn:" or "uuid:"). When a URN or URL
is used, it does not necessarily designate the location of the glyph or icon
data or its format: the remote location accessed by the resolver will
report the appropriate format according to the formats supported by the
client (in HTTP we have "Accept:" headers and MIME resource types for that
purpose, independently of the URL used).
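
As a small illustration of that last point, a resolver could state the
formats it supports in the "Accept:" header and let the server choose, so the
URL itself says nothing about the image format. This is only a sketch using
Python's standard urllib: the URL is a made-up placeholder, the function name
is invented, and the request is built but never sent.

```python
from urllib.request import Request


def build_icon_request(url, accepted=("image/svg+xml", "image/png;q=0.8")):
    """Build (but do not send) a request listing the formats the client accepts."""
    return Request(url, headers={"Accept": ", ".join(accepted)})


# Placeholder URL for illustration only; it names the flag, not a file format.
req = build_icon_request("https://example.org/flags/gb-sc")
print(req.full_url)              # https://example.org/flags/gb-sc
print(req.get_header("Accept"))  # image/svg+xml, image/png;q=0.8
```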

2013/8/5 Christopher Fynn <chris.fynn_at_gmail.com>

> Since the original Japanese Emoji contained some country flags - are
> these now being represented by Unicode REGIONAL INDICATOR characters?
> Is there any working implementation where pairs of these characters
> are displayed as flags or some other country indicator?
>
Received on Mon Aug 05 2013 - 08:26:44 CDT
