Re: Dealing with Unencodeable Characters

From: Ken Whistler <kenwhistler_at_att.net>
Date: Thu, 6 Oct 2016 11:30:52 -0700

On 10/6/2016 7:54 AM, Charlotte Buff wrote:
> If theoretically I wanted to convert an old Shift JIS document
> containing emoji to Unicode, how should I ideally handle Shibuya 109?

The general answer to that is to convert to U+FFFD (REPLACEMENT
CHARACTER), unless you are doing something specific and know what you
are doing, in which case you can use a PUA (Private Use Area) code
point, insert an image, or do whatever else you need to do.
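
As a rough illustration of that policy, here is a minimal Python sketch
of a table-driven converter: mapped codes are converted normally, and
anything unmapped becomes U+FFFD unless the caller supplies its own
private-use mapping. The names convert, LEGACY_TO_UNICODE, and
PRIVATE_AGREEMENT are hypothetical, and 0xF000 is a placeholder value,
not the actual Shift JIS code of any carrier emoji.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER

# Hypothetical, tiny stand-in for a real legacy-to-Unicode mapping table.
LEGACY_TO_UNICODE = {
    0x41: "A",
    0x82A0: "\u3042",  # HIRAGANA LETTER A
}

# Hypothetical private agreement: only meaningful to parties that share it.
PRIVATE_AGREEMENT = {
    0xF000: "\uE000",  # placeholder legacy code -> first BMP PUA code point
}


def convert(codes, table, pua_fallback=None):
    """Convert a sequence of legacy character codes to a Unicode string.

    Codes found in `table` are mapped normally. Codes missing from
    `table` use `pua_fallback` when one is given and contains the code;
    otherwise they become U+FFFD.
    """
    out = []
    for code in codes:
        if code in table:
            out.append(table[code])
        elif pua_fallback is not None and code in pua_fallback:
            out.append(pua_fallback[code])
        else:
            out.append(REPLACEMENT)
    return "".join(out)


document = [0x41, 0x82A0, 0xF000]

# Default policy: the unencodeable character is replaced by U+FFFD.
generic = convert(document, LEGACY_TO_UNICODE)

# Opt-in policy: the caller knows what it is doing and uses the PUA.
private = convert(document, LEGACY_TO_UNICODE, PRIVATE_AGREEMENT)
```

The default path loses the identity of the original character but
yields text that any Unicode process can interchange; the PUA path
preserves it only for recipients who know the same private mapping.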

This is not a character *standardization* issue that requires the UTC to
come up with a generic interchange solution for every pre-Unicode
character encoding of everything that ever was, whether it be some
oddball Shift JIS extensions that were omitted in the consensus on
encoding the Japanese Carrier Emoji:

http://www.unicode.org/reports/tr51/tr51-7.html#Japanese_Carrier

or other odds and ends from bizarre, dead-end, disused character
encodings from a previous generation.

By the way, the biggest ongoing problem we deal with here is the
continuing urge to proliferate font-encoded hacks for particular
languages and writing systems. The text interchange problems that such
schemes keep causing far outweigh issues like what to do with a
Shibuya 109 emoji, imo.

--Ken
Received on Thu Oct 06 2016 - 13:31:21 CDT