From: André Szabolcs Szelp (email@example.com)
Date: Mon Jan 05 2009 - 15:13:29 CST
2009/1/3 Asmus Freytag <firstname.lastname@example.org>:
> On 1/3/2009 2:06 AM, Ruszlan Gaszanov wrote:
>> Peter Constable wrote:
>>> I don't mean just communicated between different vendors' processes, but
>>> also interpreted and processed by different vendors' processes, in contexts
>>> where no private agreement can be assumed. If text content is getting
>>> generated in (say) DoCoMo text protocols, spreading into other content via
>>> other protocols and then that content is getting interpreted by processes
>>> produced by Google or Apple or whomever, then the sense in UTC (I think I
>>> can say) is going to be that that is *public* interchange, hence presenting
>>> a case for being representable in the UCS.
>> The fact that now not 3 but 5 vendors are using those PUA conventions does
>> not necessarily make it "public interchange". As I see it, the usage is
>> still restricted to the limited number of specific vendors.
> What's the magic number at which things become "public" in your take? 6
> vendors? 60 vendors? 600 vendors? 6000 vendors?
> In that context, it's worth remembering that the two emoticons (sic) that
> have been encoded in Unicode forever at WHITE/BLACK SMILING FACE exist
> because of a single vendor's character set: IBM's code page 437 (and its
> Telephone text messages are not a closed system, because telecoms typically
> provide means to connect to incoming and outgoing email at the minimum. You
> can expect these codes to leak onto the web in due course, if this is not
> happening already. Whatever the mechanism for that leakage, what Peter is
> rightly objecting to is a world where text in open interchange needlessly
> contains units that are un-interpretable.
> It doesn't matter whether one or two vendors are causing this - as long as
> their system isn't *closed*, it's not true private interchange.
As if _any_ private, PUA usage would be closed in the internet age!
Scholars who define obscure mediaeval characters in the PUA to
interchange manuscript transcriptions, and who publish them on the
web, are not running a "closed system" either. Klingon, or for that
matter any invented script, even one used only by a handful of fans
(e.g. tech-savvy teenage boys inventing their sci-fi or fantasy cypher
script, to parallel James' Japanese schoolgirls example ;-) ) operating
an online forum (bulletin board), will be "leaked", potentially
indexed, &c., &c.
"Private" does not mean "secret".
"Private" in the context of the PUA means something that lacks either
the resources OR the validity (based on existing principles) to be
encoded as proper Unicode entities, but which a well-defined set of
participants (which can be 6000 scholars or 6 mobile operators) still
wants to transmit in a plain-text-like protocol.
I had thought this plainly obvious before this ‒ pretty weird ‒
discussion kicked off on this list.
I was an avid ‒ and blind ‒ follower of the Unicode Spirit. Well, I'm
still convinced of Unicode 5.1, but I am disappointed by the
"flexibility" (i.e. nonexistence) of the principles of those who
govern it. For me "standards" were always about *stability*. Stability
is about to be lost.
Honestly, where will you draw the limits? Will we be able to encode
in Unicode every single image I can come up with, if I define its
identity with a PUA codepoint in a document and publish a website
using that codepoint (putting a description next to it in Latin script
describing the context of the image -- transmitted as PUA codepoint)?
It has leaked onto the net, after all, and might be indexed by Google,
and in a hundred years the exact semantics might not be determinable
from its link to the image "Mr. XY posing in front of the Eiffel
Tower". Will any private character invention that is published on the
internet be encodable and encoded?
And now to an (IMHO) very important point:
Actually, even in the domain of emoji, how do you define character
identity? How do you know that "Chick" is a different character
entity from "Hatching Chick"? How do you know they are not mere *glyph
variants* of the character FLEDGELING? Having been assigned different
private JIS codes in the operators' private standards does not make
them different characters, as we've seen with preexisting standards
for Arabic (having a codepoint for every positional variant) or the
previously cited Chinese national standard using the PUA for
precomposed Tibetan glyphs! The same goes for "Red Heart", "Purple
Heart", &c., &c. How do you know they are not mere
presentational/glyph variants of the character HEART (already encoded:
U+2661, U+2665, U+2764), assigned different codepoints in a standard
not aware of the character-glyph model? (There have been plenty of
standards which did not make this distinction and encoded glyphs
rather than characters; how do you know the present emoji private
encoding is not one of them?)
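
For anyone who wants to check the heart codepoints cited above, here
is a minimal sketch using Python's standard unicodedata module. The
helper names describe and is_private_use are made up for this
illustration, and U+E000 is just an arbitrary PUA codepoint picked for
contrast, not one known to be used by any operator.

```python
import unicodedata

# The three heart codepoints cited in the message above.
HEARTS = [0x2661, 0x2665, 0x2764]


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME (category)'; unnamed codepoints get a placeholder."""
    ch = chr(cp)
    # unicodedata.name() raises ValueError for unnamed characters unless a
    # default is supplied; PUA codepoints have no standard name.
    name = unicodedata.name(ch, "<no name>")
    return f"U+{cp:04X} {name} ({unicodedata.category(ch)})"


def is_private_use(cp: int) -> bool:
    """True if the codepoint has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


if __name__ == "__main__":
    for cp in HEARTS:
        print(describe(cp))
    # Arbitrary PUA codepoint (an assumption for illustration only).
    print(describe(0xE000), "private use:", is_private_use(0xE000))
```

The three hearts each carry a standard name and a symbol category,
while the PUA codepoint carries neither: whatever it means has to come
from an agreement outside the standard.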