Re: On the possibility of encoding some localizable sentences in plane 7 (from Re: On the possibility of encoding webdings in Unicode)

From: Philippe Verdy
Date: Sat Jan 29 2011 - 15:29:54 CST

  • Next message: Mark Davis ☕: "Re: On the possibility of encoding some localizable sentences in plane 7 (from Re: On the possibility of encoding webdings in Unicode)"

    Could we stop discussing this off-topic subject? Localizable sentences
    definitely do not solve any problem that requires character encoding. They
    would just create a new problem, injecting a solution that even users would
    not perceive as meaningful, and would cause more communication problems
    than simply writing the sentence in some language (using an automatic
    translator, if necessary, when the sentence is not explicitly translated
    in localization resources).
    William Overington attempts to convince us that some sentences merit
    specific encoding, but fails to demonstrate the existence of any specific
    iconography for them, when in fact the presentation would vary widely
    across languages, using the natural orthographies, scripts, and
    terminology (including abbreviations) of each language in a way that
    people would certainly recognize much more easily.
    Yes, there's no demonstrated use for this encoding. Before any such
    proposal is added, I would prefer to find encodings for standardized
    road-sign symbols (many of them international), as there's a demonstrated
    wide use for them and they are also useful for printing in books. Then we
    would add railway indicators and national flags (even some historic ones
    whose presence is widely demonstrated in books). We would also first cover
    the various iconography used for security and standards compliance of
    devices, because it is needed in the documentation and labelling of
    products, and the various symbols that appear on device keyboards
    (including the multimedia control keys, or other functional icons now
    found on many PCs for switching Wi-Fi/Bluetooth on/off, controlling the
    display backlight or the volume level, muting the sound, entering sleep
    mode, disabling the touchpad, or switching displays).

    Let's focus on existing symbols and iconography that have demonstrated
    use, not necessarily international (the iconography may be specific to
    some country or region), but at least supported by open standards or used
    in public communication, i.e. excluding personal/corporate logos as long
    as they remain constrained by their identity or by usage restrictions such
    as "non-commercial", "private", or "subject to prior agreement" (or to
    prior testing and approval), if they were not designed and advertised for
    use by the general public.

    The Emoji set has demonstrated its use and the urgent need for
    interoperability across device vendors and service operators, in Japan for
    example, and elsewhere for many of them for a long time (such as
    emoticons), because they are part of normal communication between people
    in their private or public messaging (and they are not translatable by
    themselves, as their identity is the symbols/icons themselves, which are
    difficult to replace with anything else even if they have varying shapes
    or colors).

    There's a wide iconography used worldwide that has demonstrated use across
    vendors in various application domains, and that merits encoding long
    before we ever attempt to encode localizable sentences; it is supported by
    lots of people and organizations in their communication to convey meaning.
    Various industries are concerned and have more merit than William's
    proposal, even if these domains are not familiar to many people or require
    special training before the symbols can be recognized and used properly:
    architecture, biotechnology, medicine, electronics, the aerospace
    industry, agriculture... Once they are encoded, of course it will be
    possible to use them in other domains, but the initial domain in which
    they were used will remain, and they will keep their intended meaning.
    There will be other creative uses of these symbols; you'll find people
    making animations with them.

    But what is important is not what you MAY do with these symbols, but what
    has actually been done with them.

    For example, I still think that layout control characters would be more
    urgent to encode for complex scripts, because the layout already has
    demonstrated use and specific meaning in existing texts, even if the
    current position of the UTC is to wait and see (and, before that, to try
    upper-layer protocols: as soon as we see that interoperability is a
    problem, more controls will be added, deprecating several collections of
    upper-layer protocols). The character encoding model can be adapted to
    support these layouts, though this will certainly require work in the
    standard.

    We should not be limited by the current limitations of digital font
    technologies (OpenType, AAT/Graphite...), because even those technologies
    have evolved and integrated more complex features that were initially not
    possible (that's why we now have compatibility characters in the standard
    for things like narrow/wide variants, when those distinctions were not
    possible with older digital font technologies). That's also why we now
    have things like joiners. At some point in the future, we'll find semantic
    grouping characters (even if they are invisible in rendered texts, they
    could be used to create a workable layout, as they are in fact similar to
    existing punctuation like parentheses). Future digital font technologies
    will be able to create more complex layouts (just as they have been
    extended to support complex reordering and positioning within grapheme
    clusters).
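    The narrow/wide compatibility variants and joiners mentioned above can be
    observed directly from a Python interpreter; this is a minimal sketch
    using only the standard library:

```python
import unicodedata

# Narrow/wide compatibility variants: U+FF21 FULLWIDTH LATIN CAPITAL
# LETTER A is a distinct encoded character, but folds to plain 'A'
# under NFKC (compatibility) normalization.
wide_a = "\uFF21"
print(unicodedata.name(wide_a))                      # FULLWIDTH LATIN CAPITAL LETTER A
print(unicodedata.normalize("NFKC", wide_a) == "A")  # True

# Joiners: U+200D ZERO WIDTH JOINER glues emoji into a single rendered
# glyph (a "family") while the underlying text stays five code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(family))                                   # 5
print(unicodedata.name("\u200D"))                    # ZERO WIDTH JOINER
```

    Whether a renderer shows the joined sequence as one glyph depends on its
    font technology, which is exactly the point made above.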

    The argument is much more convincing here, because there's already a
    demonstrated use of, and need for, layout distinctions, there's still no
    interoperable solution, and the texts needing those layouts already exist.
    Input methods will be developed to create these layouts and normalize them
    into a processable/searchable encoding, starting with various PUA
    experiments and XML-like models, up to the point where a standard model
    emerges for the processing (this will be the case for Egyptian
    hieroglyphs, or for SignWriting, whose encoding is for now extremely
    partial and clearly insufficient to create meaningful plain text). Font
    rendering technologies and text renderers will be adapted to support the
    necessary layouts, and will accelerate convergence towards an
    interoperable model that will be encodable, because this newer model will
    also have demonstrated its usability and better
    compatibility/transparency within more software than the prior
    upper-layer protocols.

    Unicode/ISO/IEC 10646 standardization is not about encoding all the
    semantics conveyed in texts. It's about encoding just *enough* to permit
    semantic distinctions in encoded texts, using small bricks, which are
    abstract characters that don't necessarily have a specific distinct
    glyph. New characters are added only on the rationale that they are
    required for semantic distinctions agreed upon by a supporting and open
    community (preferably composed of several generations, and not led by a
    single person or organization) that has already produced many texts and
    documents for which it wants better interoperability across various
    processing systems or software.
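    The "small bricks" idea above is visible in combining characters: an
    accented letter can be built from a base letter plus a combining mark,
    with normalization mapping between the two equivalent encodings. A
    standard-library sketch:

```python
import unicodedata

# Two equivalent encodings of the same text: a base letter 'e'
# followed by U+0301 COMBINING ACUTE ACCENT, and the precomposed
# single code point U+00E9.  NFC composes the bricks.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))   # 2 1
print(composed == "\u00E9")             # True
```

    The combining mark has no useful glyph on its own; its job is to make a
    semantic distinction when attached to a base character.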

    Not all glyphs will be encoded (we are in the same situation here as
    between phonetics and phonology in the aural domain: we don't encode
    personal accents with distinct orthographies, because communication
    interoperability does not lie in those accents), simply because they
    don't convey any widely demonstrated meaningful distinctions needed for
    interoperability. And the goal is not interoperability between automated
    systems, but between the people writing/reading these texts (most often,
    interoperability between automated systems is solved by technical
    solutions, standards, and enhanced technologies: technologies will
    continue to evolve to adapt to what people really use, or to fill the
    usability gaps that are still not covered, after various experiments).

    William's discussions do not demonstrate any actual use of any
    experimental solution accepted by a community. There's no community
    behind him and no need for better interoperability (just new difficulties
    added without effective benefits); they are going nowhere. If he wants to
    create something, he can do it just as others have done with emoji: start
    by using PUAs, implement something, and try to convince a community to
    use his solution in a productive way.
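    The PUA route suggested above is easy to prototype. A minimal sketch,
    assigning hypothetical experimental symbols (the names here are invented
    for illustration) to Private Use Area code points:

```python
import unicodedata

# Basic Multilingual Plane Private Use Area: U+E000..U+F8FF.
# Map experimental symbol names onto PUA code points; any meaning
# attached to them exists only by private agreement.
PUA_START = 0xE000
experimental_symbols = ["MY-ROAD-SIGN-1", "MY-ROAD-SIGN-2"]
mapping = {name: chr(PUA_START + i)
           for i, name in enumerate(experimental_symbols)}

for name, ch in mapping.items():
    # Private-use characters carry general category 'Co' and have no
    # standard character name -- conforming software passes them
    # through without interpreting them.
    print(f"{name} -> U+{ord(ch):04X} category={unicodedata.category(ch)}")
```

    A font mapping glyphs to those code points, plus an input method, is all
    a community needs to experiment before any formal encoding proposal.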

    This archive was generated by hypermail 2.1.5 : Sat Jan 29 2011 - 15:33:36 CST