Re: On the possibility of encoding some localizable sentences in plane 7 (from Re: On the possibility of encoding webdings in Unicode)

From: Philippe Verdy
Date: Sat Jan 29 2011 - 15:29:54 CST

  • Next message: Mark Davis ☕: "Re: On the possibility of encoding some localizable sentences in plane 7 (from Re: On the possibility of encoding webdings in Unicode)"

    Could we stop discussing this off-topic subject? Localizable sentences
    definitely do not solve any problem that requires character encoding. They
    would just create a new problem, injecting a solution that even users would
    not perceive as meaningful, and would cause more communication problems
    than simply writing the sentence in some language (using an automatic
    translator, if necessary, when the sentence is not explicitly translated
    in localization resources).
    William Overington attempts to convince us that some sentences merit
    specific encoding, but fails to demonstrate the existence of any specific
    iconography for them, when in fact the presentation would vary widely
    across languages, using the natural orthographies, scripts, and
    terminology (including abbreviations) of each language in a way that
    people would certainly recognize much more easily.
    Yes, there's no demonstrated use for this encoding. Before any such
    proposal is added, I would prefer to find encodings for standardized
    road-sign symbols (many of them international), as there's a demonstrated
    wide use for them and they are also useful for printing in books. Then we
    would add railway indicators and national flags (even some historic ones
    whose presence is widely demonstrated in books). We would also first cover
    the various iconography used for security and standards compliance of
    devices, because it is needed in the documentation and labelling of
    products, and the various symbols that appear on device keyboards
    (including the multimedia control keys, or other functional icons now
    found on many PCs for switching Wi-Fi/Bluetooth on/off, controlling the
    display backlight or the volume level, muting the sound, entering sleep
    mode, disabling the touchpad, or switching displays).

    Let's focus on existing symbols and iconography that have demonstrated
    use, not necessarily international (the iconography may be specific to
    some country or region), but at least supported by open standards or used
    in public communication, i.e. excluding personal/corporate logos as long
    as they remain constrained by their identity or by usage restrictions such
    as "non-commercial", "private", or "subject to prior agreement" (or to
    prior testing and approval), if they were not designed and advertised for
    use by the general public.

    The Emoji set has demonstrated its use and the urgent need for
    interoperability across device vendors and service operators, in Japan for
    example, and elsewhere for many of them for a long time (such as
    emoticons), because they are part of normal communication between people
    in their private or public messaging (and they are not translatable by
    themselves, as their identity is the symbols/icons themselves, which are
    difficult to replace with anything else even if they have varying shapes
    or colors).

    There's a wide iconography used worldwide that has demonstrated use across
    vendors in various application domains, and that merits encoding long
    before we ever attempt to encode localizable sentences; it is supported by
    lots of people and organizations in their communication to convey meaning.
    Various industries are concerned and have more merit than William's
    proposal, even if these domains are not familiar to many people or require
    special training before the symbols can be recognized and used properly:
    architecture, biotechnology, medicine, electronics, the aerospace
    industry, agriculture... Once they are encoded, of course it will be
    possible to use them in other domains, but the initial domain in which
    they were used will remain, and they will keep their intended meaning.
    There will be other creative uses of these symbols; you'll find people
    making animations with them.

    But what is important is not what you MAY do with these symbols, but what
    has actually been done with them.

    For example, I still think that layout control characters would be more
    urgent to encode for complex scripts, because the layout already has
    demonstrated use and specific meaning in existing texts, even if the
    current position of the UTC is to wait and see (and, before that, to try
    upper-layer protocols: as soon as we see that interoperability is a
    problem, more controls will be added, deprecating several collections of
    upper-layer protocols). The character encoding model can be adapted to
    support these layouts, though this will certainly require work in the
    standard.

    We should not be limited by the current limitations of digital font
    technologies (OpenType, AAT/Graphite...), because even those technologies
    have evolved and integrated more complex features that were initially not
    possible (that's why we now have compatibility characters in the standard
    for things like narrow/wide variants, when those distinctions were not
    possible with older digital font technologies). That's also why we now
    have things like joiners. At some point in the future, we'll find semantic
    grouping characters (even if they are invisible in rendered texts, they
    could be used to create a workable layout, as they are in fact similar to
    existing punctuation like parentheses). Future digital font technologies
    will be able to create more complex layouts (just as they have been
    extended to support complex reordering and positioning within grapheme
    clusters).
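    The narrow/wide compatibility variants and joiners mentioned above can be
    observed directly from a Python interpreter; this is a minimal sketch
    using only the standard library:

```python
import unicodedata

# Narrow/wide compatibility variants: U+FF21 FULLWIDTH LATIN CAPITAL
# LETTER A is a distinct encoded character, but folds to plain 'A'
# under NFKC (compatibility) normalization.
wide_a = "\uFF21"
print(unicodedata.name(wide_a))                      # FULLWIDTH LATIN CAPITAL LETTER A
print(unicodedata.normalize("NFKC", wide_a) == "A")  # True

# Joiners: U+200D ZERO WIDTH JOINER glues emoji into a single rendered
# glyph (a "family") while the underlying text stays five code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(len(family))                                   # 5
print(unicodedata.name("\u200D"))                    # ZERO WIDTH JOINER
```

    Whether a renderer shows the joined sequence as one glyph depends on its
    font technology, which is exactly the point made above.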

    The argument is much more convincing here, because there's already a
    demonstrated use of, and need for, layout distinctions, there's still no
    interoperable solution, and the texts needing those layouts already exist.
    Input methods will be developed to create these layouts and normalize them
    into a processable/searchable encoding, starting with various PUA
    experiments and XML-like models, up to the point where a standard model
    emerges for the processing (this will be the case for Egyptian
    hieroglyphs, or for SignWriting, whose encoding is for now extremely
    partial and clearly insufficient to create meaningful plain text). Font
    rendering technologies and text renderers will be adapted to support the
    necessary layouts, and will accelerate convergence towards an
    interoperable model that will be encodable, because this newer model will
    also have demonstrated its usability and better
    compatibility/transparency within more software than the prior
    upper-layer protocols.

    Unicode/ISO/IEC 10646 standardization is not about encoding all the
    semantics conveyed in texts. It's about encoding just *enough* to permit
    semantic distinctions in encoded texts, using small bricks, which are
    abstract characters that don't necessarily have a specific distinct
    glyph. New characters are added only on the rationale that they are
    required for semantic distinctions agreed upon by a supporting and open
    community (preferably composed of several generations, and not led by a
    single person or organization) that has already produced many texts and
    documents for which it wants better interoperability across various
    processing systems or software.
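    The "small bricks" idea above is visible in combining characters: an
    accented letter can be built from a base letter plus a combining mark,
    with normalization mapping between the two equivalent encodings. A
    standard-library sketch:

```python
import unicodedata

# Two equivalent encodings of the same text: a base letter 'e'
# followed by U+0301 COMBINING ACUTE ACCENT, and the precomposed
# single code point U+00E9.  NFC composes the bricks.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))   # 2 1
print(composed == "\u00E9")             # True
```

    The combining mark has no useful glyph on its own; its job is to make a
    semantic distinction when attached to a base character.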

    Not all glyphs will be encoded (we are in the same situation here as
    between phonetics and phonology in the aural domain: we don't encode
    personal accents with distinct orthographies, because communication
    interoperability does not lie in those accents), simply because they
    don't convey any widely demonstrated meaningful distinctions needed for
    interoperability. And the goal is not interoperability between automated
    systems, but between the people writing/reading these texts (most often,
    interoperability between automated systems is solved by technical
    solutions, standards, and enhanced technologies: technologies will
    continue to evolve to adapt to what people really use, or to fill the
    usability gaps that are still not covered, after various experiments).

    William's discussions do not demonstrate any actual use of any
    experimental solution accepted by a community. There's no community
    behind him and no need for better interoperability (just new difficulties
    added without effective benefits); they are going nowhere. If he wants to
    create something, he can do it just as others have done with emoji: start
    by using PUAs, implement something, and try to convince a community to
    use his solution in a productive way.
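    The PUA route suggested above is easy to prototype. A minimal sketch,
    assigning hypothetical experimental symbols (the names here are invented
    for illustration) to Private Use Area code points:

```python
import unicodedata

# Basic Multilingual Plane Private Use Area: U+E000..U+F8FF.
# Map experimental symbol names onto PUA code points; any meaning
# attached to them exists only by private agreement.
PUA_START = 0xE000
experimental_symbols = ["MY-ROAD-SIGN-1", "MY-ROAD-SIGN-2"]
mapping = {name: chr(PUA_START + i)
           for i, name in enumerate(experimental_symbols)}

for name, ch in mapping.items():
    # Private-use characters carry general category 'Co' and have no
    # standard character name -- conforming software passes them
    # through without interpreting them.
    print(f"{name} -> U+{ord(ch):04X} category={unicodedata.category(ch)}")
```

    A font mapping glyphs to those code points, plus an input method, is all
    a community needs to experiment before any formal encoding proposal.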

    This archive was generated by hypermail 2.1.5 : Sat Jan 29 2011 - 15:33:36 CST