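Actually 0xC2, according to the rules of UTF-8: 0xC0 and 0xC1 could only begin overlong encodings of U+0000..U+007F, so they never occur in well-formed UTF-8.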
--
Doug Ewell | Thornton, CO, USA
http://ewellic.org | @DougEwell

-----Original Message-----
From: "Ian Clifton" <ian.clifton_at_chem.ox.ac.uk>
Sent: 8/28/2013 17:34
To: "Unicode discussion" <unicode_at_unicode.org>
Subject: Re: What to backup after corruption of code units?

On 28/08/13 23:29, Xue Fuqiao wrote:
> I see. Thanks for all your replies!
>
> BTW I have a further question:
>
> On Wed, Aug 28, 2013 at 1:44 PM, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>> - in UTF-8, you'll need to look backward between 1 to 3 positions before
>> your start position to find the leading 8-bit code unit (>= 0xC0).
> Why should this be >=0xC0?

Because a well‐formed UTF-8 header byte must start with at least two 1 bits; numerically, the smallest such byte is 16#C0#.

--
Ian

Received on Wed Aug 28 2013 - 19:21:22 CDT
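The backward resynchronization discussed in this thread can be illustrated with a short sketch. The Python below is an editorial illustration, not code from any of the posters, and the helper names `is_continuation`, `is_lead_byte` and `char_start` are hypothetical. It treats only 0xC2..0xF4 as valid lead bytes, per the correction above, and scans back over at most three continuation bytes, as described in the quoted message. It assumes the input is intended to be UTF-8 and does not fully validate the sequence (for example, it does not check that the lead byte's length matches the number of trailing bytes).

```python
# Editorial sketch (hypothetical helpers): classify UTF-8 byte values and
# resynchronize backward to the start of the character containing an offset.

def is_continuation(b: int) -> bool:
    """True for trailing bytes 0x80..0xBF (bit pattern 10xxxxxx)."""
    return 0x80 <= b <= 0xBF

def is_lead_byte(b: int) -> bool:
    """True for bytes that may start a multi-byte sequence in well-formed UTF-8.

    0xC0 and 0xC1 have the 11xxxxxx pattern but could only begin overlong
    encodings of U+0000..U+007F; 0xF5..0xFF either begin sequences for values
    above U+10FFFF or are never used. The valid range is therefore 0xC2..0xF4.
    """
    return 0xC2 <= b <= 0xF4

def char_start(data: bytes, pos: int) -> int:
    """Return the offset of the first byte of the character containing pos.

    Steps back over at most 3 continuation bytes, since a well-formed
    sequence is at most 4 bytes long. Raises ValueError if no valid start
    byte (ASCII or lead byte) is found there.
    """
    if not 0 <= pos < len(data):
        raise IndexError("pos out of range")
    start = pos
    # Walk backward while we are on a continuation byte, up to 3 steps.
    while start > 0 and pos - start < 3 and is_continuation(data[start]):
        start -= 1
    b = data[start]
    if b < 0x80 or is_lead_byte(b):
        return start
    raise ValueError(f"no valid start byte at or before offset {pos}")
```

For example, `char_start(b"a\xc3\xa9", 2)` returns 1, the offset of the 0xC3 lead byte of "é", while `char_start(b"\xc0\x80", 1)` raises ValueError because 0xC0 never starts a well-formed sequence.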