Re: Encoding/Use of potential unpaired UTF-16 surrogate pair specifiers from Doug Ewell on 2016-01-30 (Unicode Mail List Archive)

From: Doug Ewell <doug_at_ewellic.org>
Date: Sat, 30 Jan 2016 14:46:39 -0700

Chris Jacobs wrote:

>>> UTF16 has no way to define a code point that is D800-DFFF; this is
>>> an issue if I want to apply some sort of encryption algorithm and
>>> still have the result treated as text for transmission and encoding
>>> to other string systems.
>
> This is not an issue at all. You don't have to restrict the input to
> text to be able to generate an output that can be treated as text.

I gathered that J wanted to generate arbitrary output that could be
interpreted as UTF-16 code units. I admit to being less than 100% sure
of this.

Certainly there is no shortage of algorithms to map arbitrary byte input
to text output, usually limited to some subset of ASCII. One interesting
approach for the Unicode era was Markus Scherer's "Base16k" concept, at
https://sites.google.com/site/markusicu/unicode/base16k .
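
To illustrate the general idea of mapping arbitrary bytes to text, here is a
minimal Python sketch. It is only loosely in the spirit of the 14-bits-per-
character approach and is not Scherer's actual format: the base code point
0x5000, the "length:" prefix, and the names encode() and decode() are all
assumptions made for this example. Every output character falls in
U+5000..U+8FFF, so no surrogate code units (D800-DFFF) can appear in the
result, whatever the input bytes are.

```python
# Hypothetical sketch: pack arbitrary bytes into 14-bit groups and map each
# group to one BMP code point well away from the surrogate range D800-DFFF.
BASE = 0x5000      # assumed start of a 16384-code-point block (CJK ideographs)
BITS = 14          # payload bits carried by each output character


def encode(data: bytes) -> str:
    """Return text of the form '<byte length>:<payload characters>'."""
    acc = 0        # bit accumulator
    nbits = 0      # number of valid bits currently held in acc
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= BITS:
            nbits -= BITS
            out.append(chr(BASE + ((acc >> nbits) & 0x3FFF)))
        acc &= (1 << nbits) - 1   # keep only the unconsumed low bits
    if nbits:
        # Left-align the leftover bits in a final, zero-padded group.
        out.append(chr(BASE + ((acc << (BITS - nbits)) & 0x3FFF)))
    return f"{len(data)}:" + "".join(out)


def decode(text: str) -> bytes:
    """Invert encode(); the length prefix says where the padding starts."""
    length_str, _, payload = text.partition(":")
    length = int(length_str)
    acc = 0
    nbits = 0
    out = bytearray()
    for ch in payload:
        acc = (acc << BITS) | (ord(ch) - BASE)
        nbits += BITS
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    return bytes(out[:length])   # drop bytes that came from padding bits
```

For example, the two bytes D8 00, which would be an unpaired surrogate if
taken directly as a UTF-16 code unit, go through encode() and come back
unchanged from decode(), and the intermediate string is ordinary text.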

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

Received on Sat Jan 30 2016 - 15:47:39 CST