Re: Tag characters and in-line graphics (from Tag characters) from Asmus Freytag (t) on 2015-06-05 (Unicode Mail List Archive)

From: Asmus Freytag (t) <asmus-inc_at_ix.netcom.com>
Date: Fri, 05 Jun 2015 03:46:10 -0700

On 6/4/2015 17:03 , "Chris" wrote:
> This whole discussion is about the fact that it would be technically
> possible to have private character sets and private agreements that
> your OS downloads without the user being aware of it.

The sticky issues are not the questions of how to make available fonts
or images for use by the OS.

Instead, they concern the fact that any such model violates some
pretty basic guarantees of plain text that the entire net infrastructure
relies on.

There are very obvious security issues. They start with tracking; every
time you access a custom code point, that fact potentially results in a
trackable interaction. This problem affects even the "sticker" solution
that people are hoping for for emoji. (On my system, no external
resources are displayed when I first open any message, and there is a
reason for that).

Beyond tracking, and beyond stickers (that is, pictures that look like
pictures) a generalized custom character set would allow "text" that is
no longer really stable. You would be able to deliver identical e-mails
to people that display differently, because when you serve the custom
fonts, you would be able to customize what you deliver under the same
custom character set designator.

While this would be a wonderful way to circumvent censorship (other than
the "man in the middle" version), you would likewise seriously undermine
the ability to filter unwanted or undesirable texts, because the custom
character set engine might recognize when a request comes from a filter
and not the end user. (Just the other day, I came across a hacked
website that responded differently to search engines than to live users,
making the hack effective for one and invisible to the other. Custom
character sets would seem to just add to the hackers' arsenal here).
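
To make the concern concrete, here is a minimal sketch of how one
designator could yield different displays for the same message. All
names and glyph tables are hypothetical illustrations, not any real
protocol or API; the only assumption is that the party serving the
custom character set can tell a filter apart from an end user.

```python
# Hypothetical glyph tables served under ONE custom character set designator.
# Keys are code points; values stand in for what gets displayed.
GLYPHS_FOR_FILTERS = {0xE000: "hello"}
GLYPHS_FOR_READERS = {0xE000: "buy now"}


def serve_glyphs(requester_kind):
    """Return the glyph table a (hypothetical) server hands to this requester."""
    if requester_kind == "filter":
        return GLYPHS_FOR_FILTERS
    return GLYPHS_FOR_READERS


def render(text, glyphs):
    """Display text, substituting served glyphs for custom code points."""
    return "".join(glyphs.get(ord(ch), ch) for ch in text)


message = "\uE000"  # the stored message is identical for every recipient

seen_by_filter = render(message, serve_glyphs("filter"))
seen_by_reader = render(message, serve_glyphs("reader"))

# Same code points, different visible content.
print(seen_by_filter != seen_by_reader)
```

The stored text never changes, so nothing that inspects only the code
points can tell what a given recipient will actually see.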

Finally, custom character sets sound like a great idea when thinking of
an extension of an existing character set. But that's not where the
issues are. The issues come in when you use the same technology to
provide aliases for existing code points or for other custom characters.

Aliasing undermines the ability to do search (or any other
content-focused processing, from sorting to spell-check).
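
A small sketch of the effect, assuming a hypothetical private agreement
under which the private-use code point U+E000 is displayed exactly like
"a" (the alias table and helper name are made up for illustration):

```python
# Hypothetical private agreement: U+E000 is displayed exactly like "a".
ALIAS_TABLE = {"\uE000": "a"}


def resolve_aliases(text, table):
    """Replace each aliased code point with the character it stands for."""
    return "".join(table.get(ch, ch) for ch in text)


message = "c\uE000t"  # looks like "cat" to a reader with the custom set

# A plain search sees an opaque code point and misses the word.
found_raw = "cat" in message

# Search only works if the software also has the private alias table.
found_resolved = "cat" in resolve_aliases(message, ALIAS_TABLE)

# Sorting by code point is affected too: the aliased string lands last.
ordered = sorted(["cb", message, "ca"])

print(found_raw, found_resolved, ordered[-1] == message)
```

Every search, sort, or spell-check component would need access to the
same private table, which is exactly the dependency plain text avoids.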

At that point, the circle closes.

When Unicode was created, the alternative then was ISO 2022, which was a
standard that addressed the issue of how to switch among (albeit
pre-defined) character sets to achieve, in principle, coverage equal to
the union of these character sets.

Unicode was created to address two main deficiencies of that situation.
Unification addressed the aliasing issue, so that code points were no
longer "opaque" but could be interpreted by software (other than
display), which was the second big drawback of the patchwork of
character sets. A processing model for opaque code points is possible to
define, but it isn't very practical, and in the late eighties people had
had enough and were glad to be quit of it.

Seen from this perspective, the discussion about custom character sets
presents itself as a giant step backward, undermining the very advances
that underlie the rapid acceptance and spread of Unicode.

A./
Received on Fri Jun 05 2015 - 05:47:12 CDT
