Re: Emoji: emoticons vs. literacy

From: verdy_p (verdy_p@wanadoo.fr)
Date: Wed Jan 14 2009 - 03:15:36 CST

    "Doug Ewell" wrote:
    > Michael D'Errico wrote:
    >
    > > The thing I like is that it only requires 26 code point assignments,
    > > yet has the ability to represent the equivalent XML: CA
    > > in plain text. The code points themselves carry with them the
    > > "flag-ness", so this information is available even to a plain-text
    > > process. If two code points were not enough to specify every country
    > > or area, as was suggested for CYM, then three or more code points
    > > could be used to accommodate them (with no additional assignments).
    >
    > It looks too much like the UTF-16 surrogate model, complete with invalid
    > sequences, which is an acceptable model for a multi-code-unit character
    > encoding but quite cumbersome for defining individual characters.
    > What's more, allowing sequences of "three or more code points" would
    > catapult this model well beyond UTF-16 in terms of complexity.

    Not really: to me, what he proposes looks as if these few code points encoded a new script, in which you write words in a specific language/vocabulary and orthography to designate the name of a flag. If these were surrogates, there would be 26 codes for the first character and 26 codes for the second one, and these codes would have to be matched contextually.

    To me, it is then not much different from allocating just a pair of surrounding punctuation signs (similar to parentheses) to enclose a flag name, except that the bracket approach would not require contextual analysis for correct representation (or fallback). In that case, no flag is explicitly encoded, there are no defined semantics for the characters making up the flag identifier, and there is no need to define any registry of flags in Unicode itself (the list of defined flag identifiers can be part of another project run by vexillologists); the only thing that is encoded is that there is a flag within the enclosed sequence. It is also quite similar to using rich-text syntaxes like HTML or XML (which can also surround such names), so conversion to rich-text formats can be facilitated.

    The names themselves then need no special encoding and can reuse the ASCII set. There is no restriction on the character set used in flag identifiers, except that it should preferably match the syntax for identifiers.

    So the effective encoding would be like "[CA]", where only the square brackets are specially encoded characters that give the intended semantics to the "CA" substring. If the flag cannot be rendered in a given font or renderer, the renderer could still fall back to other brackets or symbols. The representative glyphs for such brackets could be similar to those for Egyptian hieroglyph cartouches: for example, the two halves of a flying flag, each within a surrounding dotted square (which will disappear when the halves are connected and filled with the actual flag in renderers supporting the named flags).
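    To make the bracket idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: no code points have been assigned, so two Private Use Area characters (U+E000 and U+E001) stand in for the opening and closing flag brackets, and the names FLAG_OPEN, FLAG_CLOSE, flag_identifiers and render_fallback are hypothetical.

```python
import re

# Hypothetical stand-ins: the proposal assigns no code points, so two
# Private Use Area characters play the opening and closing flag brackets.
FLAG_OPEN = "\uE000"
FLAG_CLOSE = "\uE001"

# A flag sequence is an opening bracket, a non-empty payload that contains
# neither bracket, and a closing bracket.
FLAG_RE = re.compile(
    FLAG_OPEN + "([^" + FLAG_OPEN + FLAG_CLOSE + "]+)" + FLAG_CLOSE
)


def flag_identifiers(text):
    """Return the payloads of all well-formed flag sequences in text."""
    return FLAG_RE.findall(text)


def render_fallback(text, opening="[", closing="]"):
    """Plain-text fallback: show each payload between ordinary brackets."""
    return FLAG_RE.sub(lambda m: opening + m.group(1) + closing, text)


sample = "Flag: " + FLAG_OPEN + "CA" + FLAG_CLOSE + "!"
print(flag_identifiers(sample))
print(render_fallback(sample))
```

    Note that the process finding the flag never interprets "CA" itself: it only needs to know the two bracket characters, which is the point of the scheme.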

    Under this simple scheme (only two characters needed and allocated), you are not restricted to the too-limited ISO 3166-1 code set: you can use any vexillological standard that may exist (e.g. the flag names used to index flags on the "Flags of The World" website and its mirrors), or any descriptive format (which could use a more complex syntax recognized by the flag renderer, such as embedding an SVG document or SVG document fragment directly within the character pair).

    Then, if you view this sequence in a plain-text-only renderer, you will just see the two special flag brackets, surrounding either the flag identifier (which a flag-capable renderer will convert using its own local database of flag graphics), its SVG definition document (itself between <svg>...</svg> or <svg use="#id"/>, because it will have only one root element, if the flag-enabled renderer supports SVG), or the URL of an external SVG definition. A web browser (or a modern word processor that supports HTML) could support this rendering without difficulty and could ship a built-in (and extensible) library of common flag identifiers, for ISO 3166-1 at least.
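    As a rough sketch of how a flag-capable renderer might tell these three kinds of payload apart, consider the Python below. The rules are my assumptions for illustration only: a payload starting with "<svg" is taken as inline SVG, one that parses as an http/https URL is taken as an external definition, and anything else is looked up as an identifier. FLAG_LIBRARY, its file paths, classify_payload and resolve_flag are hypothetical names, not part of any existing renderer.

```python
from urllib.parse import urlparse

# Hypothetical local library mapping flag identifiers to graphics files.
FLAG_LIBRARY = {"CA": "flags/ca.svg", "FR": "flags/fr.svg"}


def classify_payload(payload):
    """Classify the text found between the two flag brackets."""
    s = payload.strip()
    if s.startswith("<svg"):
        return "svg"          # inline SVG document or fragment
    parsed = urlparse(s)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"          # external SVG definition
    return "identifier"       # name to look up in a flag library


def resolve_flag(payload, library=FLAG_LIBRARY):
    """Return (kind, value); value is None for an unknown identifier."""
    s = payload.strip()
    kind = classify_payload(s)
    if kind == "identifier":
        return kind, library.get(s)
    return kind, s


print(resolve_flag("CA"))
print(resolve_flag("http://example.org/ca.svg"))
```

    An unknown identifier resolves to None here, in which case the renderer would simply show the identifier between the fallback brackets.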

    My opinion is that Unicode should not attempt to encode any regional flag until there have been extensive discussions with the many vexillologists around the world who have defined their own collections or done significant work on standardizing their libraries. In this way Unicode would also avoid all legal issues related to the use of, or reference to, restricted flags.


