Re: What to backup after corruption of code units?

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Wed, 28 Aug 2013 17:17:36 -0700

On 8/28/2013 3:29 PM, Xue Fuqiao wrote:
> I see. Thanks for all your replies!
>
> BTW I have a further question:
>
> On Wed, Aug 28, 2013 at 1:44 PM, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>> - in UTF-8, you'll need to look backward between 1 to 3 positions before
>> your start position to find the leading 8-bit code unit (>= 0xC0).
> Why should this be >=0xC0?
>
Because all trailing bytes have the bit pattern 10xxxxxx, which is less
than 11000000 (0xC0) for any value of x.
(The bits marked x can take any bit combination, while the first two
bits are constant.)
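
To check the bit-pattern argument, here is a minimal Python sketch. The
function name is_trailing_byte is made up for illustration and is not
from any library.

```python
def is_trailing_byte(b: int) -> bool:
    # Keep only the top two bits and compare them with the 10 pattern.
    return (b & 0xC0) == 0x80


# Collect every byte value that matches 10xxxxxx.
trailing = [b for b in range(256) if is_trailing_byte(b)]

# The smallest is 10000000 (0x80) and the largest is 10111111 (0xBF),
# so every trailing byte is below 11000000 (0xC0).
lowest, highest = min(trailing), max(trailing)
```

Whatever the six x bits are, the value stays within 0x80..0xBF, which is
why a byte >= 0xC0 can never be a trailing byte.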

So, if you see a byte >= 0xC0, you know that you are on a leading byte.

(Single bytes, those < 0x80, don't need any backup: if your pointer
points to one of them, you are at a character boundary anyway.)
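
Putting the three cases together, the backup step could be sketched in
Python roughly as follows. This is only an illustration: the name
backup_to_boundary is hypothetical, and the code assumes the buffer is
well-formed UTF-8 and that pos is a valid index into it.

```python
def backup_to_boundary(data: bytes, pos: int) -> int:
    """Return the index of the first byte of the character containing data[pos].

    Assumes data is well-formed UTF-8 and 0 <= pos < len(data).
    """
    steps = 0
    # Trailing bytes look like 10xxxxxx; step back over at most three of them.
    while pos > 0 and steps < 3 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
        steps += 1
    # data[pos] is now either a single byte (< 0x80) or a leading byte (>= 0xC0).
    return pos


# "a", EURO SIGN (3 bytes: E2 82 AC), "b"
text = "a\u20acb".encode("utf-8")
starts = [backup_to_boundary(text, i) for i in range(len(text))]
```

On corrupted input the three-step cap just stops the loop; a real
implementation would have to decide how to report that case.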

A./
Received on Wed Aug 28 2013 - 19:19:05 CDT