Re: Encoding/Use of potential unpaired UTF-16 surrogate pair specifiers from Chris Jacobs on 2016-01-31 (Unicode Mail List Archive)

From: Chris Jacobs <chris.jacobs_at_xs4all.nl>
Date: Sun, 31 Jan 2016 19:07:57 +0100

J Decker wrote on 2016-01-31 18:56:
> On Sun, Jan 31, 2016 at 8:31 AM, Chris Jacobs <chris.jacobs_at_xs4all.nl>
> wrote:
>>
>>
>> J Decker wrote on 2016-01-31 03:28:
>>>
>>> I've reconsidered, and for ease of implementation I think I'll just mask
>>> every UTF-16 character (not code point) with a 10-bit value. This will
>>> result in no character changing from BMP space to a surrogate pair or
>>> vice versa.
>>>
>>> Thanks for the feedback.
>>
>>
>> So you are still trying to handle the unarmored output as plaintext.
>> Do you realize that if a string in the output is replaced by a
>> canonically equivalent one, this may mess things up, because the
>> originals are not canonically equivalent?
>>
> I see... things like those mentioned here:
> http://websec.github.io/unicode-security-guide/character-transformations/

Yes, especially the part about normalization.
This would not only spoil the normalized string but also, since the
string can change length, cause your ever-changing XOR values to go out
of sync for everything after that point.
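A minimal Python sketch of both points, assuming the masking XORs each UTF-16 code unit with a key drawn from a hypothetical repeating key list (the actual key schedule was not described in this thread; `to_units`, `from_units`, `mask` and `keys` are illustrative names). The 10-bit mask never crosses the BMP/surrogate boundary, but once the masked text is NFC-normalized as if it were ordinary plaintext, "e" + U+0301 collapses into one code unit, and every later unit is unmasked with the wrong key.

```python
import unicodedata

def to_units(text: str) -> list[int]:
    """Split a str into its UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def from_units(units: list[int]) -> str:
    """Reassemble UTF-16 code units; strict decoding fails on unpaired surrogates."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")

def mask(text: str, keys: list[int]) -> str:
    """XOR every UTF-16 code unit with a 10-bit key (hypothetical repeating key list).

    Only bits 0-9 change, while bits 10-15 decide whether a unit is a high
    surrogate (0b110110), a low surrogate (0b110111) or an ordinary BMP unit,
    so pairs stay pairs and BMP units stay BMP units. XOR is its own inverse,
    so the same call also unmasks.
    """
    return from_units([u ^ (keys[i % len(keys)] & 0x3FF) for i, u in enumerate(to_units(text))])

keys = [0x004, 0x363, 0x011]  # hypothetical key values, chosen for the demo

# 1. Round trip, including a supplementary character stored as a surrogate pair.
plain = "a\U0001F600bc"
masked = mask(plain, keys)
assert len(to_units(masked)) == len(to_units(plain))
assert mask(masked, keys) == plain

# 2. Normalizing the masked text: "abc" masks to "e\u0301r",
#    which NFC composes to "\u00e9r" (one code unit shorter).
masked = mask("abc", keys)
print(ascii(masked))                  # 'e\u0301r'
normalized = unicodedata.normalize("NFC", masked)
print(ascii(normalized))              # '\xe9r'
print(ascii(mask(normalized, keys)))  # '\xed\u0311'  (not 'abc')
```

Note that the final "r" survived normalization untouched, yet it still unmasks to garbage (U+0311) because it now sits at position 1 instead of 2 and is XORed with the wrong key.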
Received on Sun Jan 31 2016 - 12:08:59 CST
