Re: Why Work at Encoding Level? from Mark Davis ☕️ on 2015-10-20 (Unicode Mail List Archive)

From: Mark Davis ☕️ <mark_at_macchiato.com>
Date: Tue, 20 Oct 2015 18:23:17 -0700

> there is never any excuse for software to create unpaired surrogates, or
> any other sort of invalid code unit sequences

First off, it depends on when one is encountered. Unpaired surrogates are
ill-formed in UTF-16, but they are permitted in a Unicode 16-bit string.
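
As a rough illustration of that distinction, the sketch below treats a
Unicode 16-bit string as a plain list of code units and checks whether it
also happens to be well-formed UTF-16. The function name
is_well_formed_utf16 is hypothetical, and Python is used only for
illustration.

```python
def is_well_formed_utf16(units):
    """Return True if every surrogate code unit in `units` is properly paired.

    `units` is any sequence of integers in the range 0x0000..0xFFFF,
    i.e. a Unicode 16-bit string that may or may not be valid UTF-16.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True
```

Any list of 16-bit values is an acceptable input here; only some of those
lists pass the check.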

But more fundamentally, there may be no "excuse" for such software, but it
exists anyway. Pretending it doesn't makes for unhappy customers. For
example, you don't want to throw an exception when an unpaired surrogate
is encountered if that could cause an app to fail.

So the point is to handle the situation as gracefully, consistently, and
safely as possible. And 'safely' is key. Pretending that an unpaired
surrogate doesn't exist is logically equivalent to deleting it, which can
cause security problems (see UTR #36).
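
One way to sketch that kind of handling, assuming that substituting U+FFFD
REPLACEMENT CHARACTER is acceptable to the application, is shown below. The
function name repair_unpaired_surrogates is hypothetical. It neither throws
nor deletes: each unpaired surrogate is replaced one-for-one, so text on
either side of it is never joined together.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def repair_unpaired_surrogates(units):
    """Return a copy of `units` with each unpaired surrogate replaced by U+FFFD.

    `units` is a sequence of 16-bit code units. Well-formed surrogate pairs
    are copied through unchanged, so the result is always valid UTF-16.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Proper pair: keep both code units.
            out.extend((u, units[i + 1]))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: substitute rather than delete or raise.
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(u)
            i += 1
    return out
```

For instance, if a filter has already approved the units for "<s", an
unpaired surrogate, and "cript>", deleting the surrogate afterwards would
yield "<script>", whereas replacement keeps the two halves apart.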

Mark

On Mon, Oct 19, 2015 at 10:07 AM, Doug Ewell <doug_at_ewellic.org> wrote:

> This discussion was originally about how to handle unpaired surrogates,
> as if that were a normal use case.
>
> Regardless of what encoding model is used to handle characters under the
> hood, and regardless of how the Delete key should work with actual
> characters or clusters, there is never any excuse for software to create
> unpaired surrogates, or any other sort of invalid code unit sequences.
> That is like having an image editor that deletes every 128th byte from a
> JPEG file, and then worrying about how to display the file.
>
> --
> Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
Received on Tue Oct 20 2015 - 20:24:43 CDT
