RE: What to backup after corruption of code units?

From: Doug Ewell <doug_at_ewellic.org>
Date: Wed, 28 Aug 2013 18:19:36 -0600

Actually 0xC2, according to the rules of UTF-8.
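
To spell that out: 0xC0 and 0xC1 have the right bit pattern for a two-byte lead, but they could only begin overlong encodings of U+0000..U+007F, which well-formed UTF-8 does not permit. A minimal Python sketch of the check follows. The helper names is_utf8_lead_byte and decodes are made up for illustration, and the upper bound 0xF4 comes from the U+10FFFF limit rather than from anything said earlier in this thread.

```python
def is_utf8_lead_byte(b: int) -> bool:
    """True if b can begin a well-formed multi-byte UTF-8 sequence.

    Hypothetical helper for illustration only.
    """
    # 0xC0 and 0xC1 are excluded: they could only start overlong
    # two-byte forms of U+0000..U+007F.
    # 0xF5..0xFF are excluded: they would encode values above U+10FFFF.
    return 0xC2 <= b <= 0xF4


def decodes(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xC2 0x80 is U+0080, the first code point that needs two bytes.
print(is_utf8_lead_byte(0xC2), decodes(b"\xc2\x80"))
# 0xC0 0x80 would be an overlong form of U+0000.
print(is_utf8_lead_byte(0xC0), decodes(b"\xc0\x80"))
```

Note that the second byte of a sequence is further restricted after some lead bytes (0xE0, 0xED, 0xF0, 0xF4), which this sketch does not check.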

--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell
-----Original Message-----
From: "Ian Clifton" <ian.clifton_at_chem.ox.ac.uk>
Sent: 8/28/2013 17:34
To: "Unicode discussion" <unicode_at_unicode.org>
Subject: Re: What to backup after corruption of code units?
On 28/08/13 23:29, Xue Fuqiao wrote:
> I see.  Thanks for all your replies!
>
> BTW I have a further question:
>
> On Wed, Aug 28, 2013 at 1:44 PM, Philippe Verdy<verdy_p_at_wanadoo.fr>  wrote:
>> - in UTF-8, you'll need to look backward between 1 to 3 positions before
>> your start position to find the leading 8-bit code unit (>= 0xC0).
> Why should this be >=0xC0?
>
Because a well‐formed UTF-8 header byte must start with at least two 1 
bits; numerically, the smallest such byte is 16#C0#.
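
As a rough illustration of the backward scan Philippe describes, here is a small Python sketch. It assumes the buffer is otherwise intact UTF-8 and only uses the bit patterns: continuation bytes look like 10xxxxxx, so anything else ends the scan. The names is_continuation and code_point_start are hypothetical, and nothing here validates the byte it lands on.

```python
def is_continuation(b: int) -> bool:
    # Continuation bytes have the top two bits 10 (0x80..0xBF).
    return (b & 0xC0) == 0x80


def code_point_start(buf: bytes, pos: int) -> int:
    """Back up from pos to the index where the enclosing sequence begins.

    Looks back at most 3 positions, since a UTF-8 sequence is at most
    4 bytes long. Hypothetical helper for illustration only.
    """
    start = pos
    steps = 0
    while start > 0 and steps < 3 and is_continuation(buf[start]):
        start -= 1
        steps += 1
    return start


buf = "a\u20acb".encode("utf-8")   # b'a\xe2\x82\xacb'
print(code_point_start(buf, 3))    # index of the 0xE2 lead byte
```

If the scan stops on a byte that is still a continuation byte, or on a byte outside the valid lead range, the data around pos is damaged and the caller has to decide how to resynchronise.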
-- 
Ian ◎
Received on Wed Aug 28 2013 - 19:21:22 CDT
