On 10/4/2015 6:02 AM, Richard Wordingham wrote:
In the absence of a specific tailoring, is the combination of a lone
surrogate and a combining mark a user-perceived character? Does a lone
surrogate constitute a user-perceived character?
In an editing interface, a lone surrogate
should be a user-perceived character; otherwise you won't be
able to delete it manually. Markus suggests that it be treated
like an unassigned code point.
Now, if you follow an unassigned code point with a combining mark,
what should you get?
For scripts where combining marks are productive, it seems
counter-productive (pardon the pun) to go and limit this process,
only to have to update your software every year as a new version
of Unicode comes out.
(Astute readers will notice that combining marks don't necessarily
have scripts, nor do unassigned code points, so I'm talking about
those marks that are used productively with certain scripts, and
particularly those that can be applied widely out of context for
technical purposes.)
So, if you allow a generalized algorithm that gloms these marks
onto any base, even unassigned code points, then it would be
natural to have this happen to lone surrogates as well, meaning
that the surrogate cannot be fixed in isolation. That's tough.
There are plenty of interfaces where you can't change a base
character in isolation.
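To make the consequence concrete, here is a minimal Python sketch of such a generalized rule. It is a hypothetical, deliberately simplified rule and not the UAX #29 algorithm: any nonspacing (Mn) or enclosing (Me) mark attaches to whatever code point precedes it, including unassigned code points (Cn) and lone surrogates (Cs). The function names glom_clusters and delete_last_cluster are made up for illustration.

```python
import unicodedata

# Hypothetical simplified rule, NOT the full UAX #29 algorithm:
# nonspacing (Mn) and enclosing (Me) marks attach to whatever precedes them.
ATTACHING_CATEGORIES = {"Mn", "Me"}


def glom_clusters(text: str) -> list[str]:
    """Split text into clusters where Mn/Me marks attach to any preceding
    code point, including unassigned ones (Cn) and lone surrogates (Cs)."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ATTACHING_CATEGORIES:
            clusters[-1] += ch  # glom the mark onto the previous base
        else:
            clusters.append(ch)  # any other code point starts a new cluster
    return clusters


def delete_last_cluster(text: str) -> str:
    """Mimic a backspace that removes one whole cluster at the end."""
    clusters = glom_clusters(text)
    return "".join(clusters[:-1])
```

Under this rule, glom_clusters("a\ud800\u0301") yields ["a", "\ud800\u0301"], so a cluster-wise backspace removes the lone surrogate and the acute accent together; the surrogate cannot be deleted on its own. The same happens after an unassigned code point such as U+0378.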
If your software has a bug that doesn't let you enter a sequence without
creating a lone surrogate followed by a combining mark, that bug is
what needs fixing...
A./
Received on Sun Oct 04 2015 - 14:32:01 CDT