RE: Emoji: emoticons vs. literacy

From: Peter Constable (petercon@microsoft.com)
Date: Fri Jan 02 2009 - 00:29:16 CST


From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of James Kass

"Based on the responses (or mostly lack of responses)
from key people on the mailing list I think that this
proposal is a done deal..."

By no means is it a done deal. So far, UTC has only discussed the *possibility* of accepting a proposal to encode a collection of symbols used in Japanese telecom protocols, and has expressed openness to that possibility. But no concrete proposal has actually been presented yet, so UTC has not made any final decision on the matter at all.

"Which tells me that
the real mover behind this is probably Google, since
they are the ones sucking up content from everywhere,
and need to be able to store it in Unicode."

Google certainly has been a main proponent, though they are not the only one. The reality is that, even though a vendor may use private-use Shift-JIS encodings in its protocols, that data can leak into other contexts, and other vendors then have to start interoperating with it. That's exactly why Microsoft was a main proponent of encoding a different set of symbols used in other Japanese protocols: the ARIB symbols that are in Amendment 6 of ISO/IEC 10646.

> As we've discussed here previously, the telephone companies
> have apparently already resolved *their* interoperability
> issues by mapping from their own user defined mutually
> incompatible Shift-JIS encodings into Unicode's PUA consistently.

There's an inherent contradiction here that isn't sustainable in the long run: public interchange using private-use encodings. Either public interchange is not assumed to be possible, or the private-use area is no longer really private. If public interchange *is* happening in text protocols, then the de facto reality is that there are (abstract) characters* involved, and those are potential candidates for encoding in the Universal Character Set.

*Keep in mind the definition of abstract character: "A unit of information used for the organization, control, or representation of textual data." What is not made explicit is the meaning of "textual data", but I think there's a fair chance that most UTC members would accept, ostensively, that the data in a text protocol is an instance of "textual data".
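To make the contradiction concrete, here is a minimal Python sketch. The PUA ranges are the ones defined by the Unicode Standard; the two vendor tables (VENDOR_A, VENDOR_B) and the interpret helper are hypothetical, invented purely for illustration, and do not reflect any real carrier's mapping. Each hypothetical vendor has privately assigned a different symbol to the same code point, U+E000.

```python
import unicodedata

# Private Use Area ranges defined by the Unicode Standard: the BMP PUA plus
# Supplementary Private Use Area-A (plane 15) and Area-B (plane 16).
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def is_private_use(cp: int) -> bool:
    """Return True if the code point lies in one of the Private Use Areas."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


# HYPOTHETICAL vendor tables, for illustration only (not real carrier data):
# each vendor privately assigns its own meaning to the same PUA code point.
VENDOR_A = {0xE000: "SUN"}
VENDOR_B = {0xE000: "TELEPHONE"}


def interpret(cp: int, private_table: dict) -> str:
    """Describe a code point, consulting a private table only for PUA code points."""
    if is_private_use(cp):
        # The standard assigns no name here; any meaning comes solely from
        # a private agreement between sender and receiver.
        return private_table.get(cp, "<unknown private-use character>")
    # Outside the PUA, an assigned character's name is fixed by the standard.
    return unicodedata.name(chr(cp), "<unnamed>")


if __name__ == "__main__":
    print(interpret(0xE000, VENDOR_A))  # SUN
    print(interpret(0xE000, VENDOR_B))  # TELEPHONE
    print(interpret(0x0041, VENDOR_A))  # LATIN CAPITAL LETTER A
    # The standard itself gives U+E000 no name, only the category "Co".
    print(unicodedata.name(chr(0xE000), "<no standard name>"))  # <no standard name>
    print(unicodedata.category(chr(0xE000)))  # Co
```

Both vendors' text is perfectly valid Unicode, yet a receiver cannot tell which symbol U+E000 denotes without an out-of-band agreement, which is precisely what open public interchange cannot assume.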

Peter


