Re: Deleting Lone Surrogates from Asmus Freytag (t) on 2015-10-04 (Unicode Mail List Archive)

From: Asmus Freytag (t) <asmus-inc_at_ix.netcom.com>
Date: Sun, 4 Oct 2015 12:30:23 -0700

On 10/4/2015 6:02 AM, Richard Wordingham wrote:

In the absence of a specific tailoring, is the combination of a lone
surrogate and a combining mark a user-perceived character?  Does a lone
surrogate constitute a user-perceived character?

In an editing interface, a lone surrogate should be a user-perceived character, as otherwise you won't be able to manually delete it. Markus suggests that it be treated like an unassigned code point.
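To make the deletion point concrete, here is a minimal sketch of a backspace over a buffer of UTF-16 code units. The function names (`is_high`, `is_low`, `backspace`) and the list-of-integers buffer are assumptions made for illustration, not taken from any particular editor, and combining marks are deliberately ignored here.

```python
def is_high(unit):
    """True for a high (leading) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True for a low (trailing) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def backspace(units):
    """Return a copy of `units` with one deletable unit removed from the end.

    A well-formed surrogate pair is removed as a whole. Anything else,
    including a lone surrogate, is removed as a single code unit, so a
    lone surrogate stays manually deletable.
    """
    if not units:
        return []
    if len(units) >= 2 and is_low(units[-1]) and is_high(units[-2]):
        return units[:-2]
    return units[:-1]
```

For example, `backspace([0x61, 0xD83D, 0xDE00])` drops the whole pair, while `backspace([0x61, 0xD83D])` drops only the stray high surrogate; both leave `[0x61]`.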

Now, if you follow an unassigned code point with a combining mark, what should you get?

For scripts where combining marks are productive, it seems counter-productive (pardon the pun) to go and limit this process, only to have to update your software every year as a new version of Unicode comes out.

(Astute readers will notice that combining marks don't necessarily have scripts, nor do unassigned code points, so I'm talking about those marks that are used productively with certain scripts and particularly those that can be applied widely out of context for technical purposes.)

So, if you allow a generalized algorithm that gloms these marks onto any base, even unassigned code points, then it would be natural to have this happen to lone surrogates as well, meaning that the surrogate cannot be fixed in isolation. That's tough. There are plenty of interfaces where you can't change a base character in isolation.
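A rough sketch of such a generalized algorithm, assuming a much-simplified rule (any code point whose General_Category starts with "M" attaches to whatever precedes it) rather than the actual segmentation rules of any standard or library. The names `is_mark`, `clusters`, and `delete_last_cluster` are hypothetical.

```python
import unicodedata


def is_mark(ch):
    """True if the code point has a General_Category of Mn, Mc, or Me."""
    return unicodedata.category(ch).startswith("M")


def clusters(text):
    """Split text into base-plus-marks units, with no check on the base.

    The base may be an assigned character, an unassigned code point,
    or a lone surrogate; marks attach to it all the same. A mark at the
    very start of the text, with nothing before it, forms its own unit.
    """
    out = []
    for ch in text:
        if out and is_mark(ch):
            out[-1] += ch  # attach the mark to the preceding unit
        else:
            out.append(ch)  # start a new unit
    return out


def delete_last_cluster(text):
    """Remove the final unit, base and marks together."""
    return "".join(clusters(text)[:-1])
```

Under this rule, a lone surrogate followed by U+0301 (`"\ud800\u0301"`) comes out as a single unit, exactly as U+0378 (unassigned at the time of writing) followed by U+0301 would. `delete_last_cluster` then removes the surrogate and the mark together, which is the "cannot be fixed in isolation" consequence described above.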

If your software doesn't let you enter a sequence without creating a lone surrogate followed by a combining mark, that's a bug...

A./
Received on Sun Oct 04 2015 - 14:32:01 CDT
