Re: Is the binaryness/textness of a data format a property? from Eli Zaretskii via Unicode on 2020-03-21 (Unicode Mail List Archive)

From: Eli Zaretskii via Unicode <unicode_at_unicode.org>
Date: Sat, 21 Mar 2020 22:26:24 +0200

> From: "Doug Ewell" <doug_at_ewellic.org>
> Cc: <unicode_at_unicode.org>
> Date: Sat, 21 Mar 2020 13:33:18 -0600
>
> > Emacs uses some of that for supporting charsets that cannot be mapped
> > into Unicode. GB18030 is one example of such charsets. The internal
> > representation of characters in Emacs is UTF-8, so it uses 5-byte
> > UTF-8-like sequences to represent such characters.
>
> When 137,468 private-use characters aren't enough?

Why is that relevant to the issue at hand?

> I thought the whole premise of GB18030 was that it was Unicode mapped into a GB2312 framework. What characters exist in GB18030 that don't exist in Unicode, and have they been proposed for Unicode yet

I don't remember offhand, but the last time I looked at GB18030, many
of them were not in Unicode.

> and why was none of the PUA space considered appropriate for that in the meantime?

Because many fonts already use them? I don't really know why it was
decided to use codepoints above 0x1FFFFF; it's just how Emacs has
worked for quite some time. You asked for examples of usage, and
I provided one.
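A minimal sketch of what a "5-byte UTF-8-like sequence" looks like, assuming the original extended UTF-8 bit layout from RFC 2279 (lead byte 111110xx followed by four 10xxxxxx continuation bytes, covering 0x200000 through 0x3FFFFFF). Emacs's own internal details, such as how it stores raw 8-bit bytes, are not modelled here, and the names `encode_5byte` and `decode_5byte` are hypothetical.

```python
def encode_5byte(cp: int) -> bytes:
    """Encode cp (0x200000..0x3FFFFFF) using the obsolete 5-byte UTF-8 form."""
    if not 0x200000 <= cp <= 0x3FFFFFF:
        raise ValueError(f"{cp:#x} is outside the 5-byte range")
    return bytes([
        0xF8 | (cp >> 24),           # lead byte 111110xx: top 2 payload bits
        0x80 | ((cp >> 18) & 0x3F),  # continuation bytes 10xxxxxx,
        0x80 | ((cp >> 12) & 0x3F),  # each carrying 6 payload bits
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def decode_5byte(seq: bytes) -> int:
    """Decode a 5-byte sequence produced by encode_5byte back to an integer."""
    if len(seq) != 5 or (seq[0] & 0xFC) != 0xF8:
        raise ValueError("not a 5-byte lead sequence")
    cp = seq[0] & 0x03               # 2 payload bits from the lead byte
    for b in seq[1:]:
        if (b & 0xC0) != 0x80:
            raise ValueError("bad continuation byte")
        cp = (cp << 6) | (b & 0x3F)  # append 6 bits per continuation byte
    if cp < 0x200000:
        raise ValueError("overlong encoding")
    return cp


if __name__ == "__main__":
    for cp in (0x200000, 0x3FFF7F):
        seq = encode_5byte(cp)
        print(f"{cp:#x} -> {seq.hex(' ')}")
    # 0x200000 -> f8 88 80 80 80
    # 0x3fff7f -> f8 8f bf bd bf
```

A strict UTF-8 decoder (Python's `bytes.decode("utf-8")`, for example) rejects 0xF8 as an invalid start byte, so sequences like these can only live in an internal representation, not in interchanged UTF-8 text.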
Received on Sat Mar 21 2020 - 15:26:47 CDT