Re: What to backup after corruption of code units?

From: Steffen <sdaoden_at_gmail.com>
Date: Wed, 28 Aug 2013 11:35:00 +0200

Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
 [.]
 |- in UTF-8, you'll need to look backward between 1 and 3 positions before
 |your start position to find the leading 8-bit code unit (>= 0xC0).
 |
 |In both cases you have to check the value found. If you don't find it, in
 |the limited range of positions, the input is not valid UTF-8 or UTF-16 and
 |you have to handle an encoding error exception in the input stream.
 |
 |The Unicode standard does not specify how you'll handle this error situation
 |or from where you'll be able to resync the stream, or even if you should
 |resync from some further position; this is application-dependent. If the

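A minimal Python sketch of the backward scan described in the quoted text, assuming the input is a `bytes` buffer and `pos` is an index into it. The names `find_code_point_start` and `_seq_len` are hypothetical. Beyond finding a non-continuation byte within 3 positions, it checks the value found, as the quote asks: the lead byte must announce a sequence long enough to reach `pos`. It uses the tighter lead-byte ranges from RFC 3629 (0xC2 to 0xF4) rather than plain >= 0xC0, because 0xC0, 0xC1 and 0xF5 to 0xFF never occur in well-formed UTF-8.

```python
def _seq_len(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte, or 0 if it cannot lead."""
    if lead < 0x80:
        return 1  # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte, or a byte that never appears in UTF-8


def find_code_point_start(buf: bytes, pos: int) -> int:
    """Return the index of the lead byte of the sequence containing buf[pos].

    Sketch only: it does not validate the bytes after the lead byte
    (e.g. overlong forms after 0xE0 or surrogates after 0xED).
    Raises ValueError if no suitable lead byte lies within 3 positions.
    """
    if not 0 <= pos < len(buf):
        raise IndexError("pos out of range")
    i = pos
    # Walk back over continuation bytes (10xxxxxx), at most 3 of them.
    while i >= 0 and pos - i <= 3:
        if buf[i] & 0xC0 != 0x80:  # found a non-continuation byte
            # The lead byte must announce a sequence that covers pos.
            if _seq_len(buf[i]) > pos - i:
                return i
            break
        i -= 1
    raise ValueError(f"ill-formed UTF-8 around offset {pos}")
```

For example, with `buf = "aé€😀".encode("utf-8")` every offset maps to the start of its code point (0, 1, 1, 3, 3, 3, 6, 6, 6, 6), while a stray continuation byte such as `b"a\x80"` at offset 1 raises `ValueError`, which is where the application-dependent error handling begins.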
«Unicode Security Considerations» [1] gives guidance on how ill-formed
byte sequences should or could be handled (in «3.6.1 Illegal Input
Byte Sequences»). That section is written with conversion in mind, but
its advice should apply to any code that processes such input.
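As an illustration only, Python's built-in UTF-8 decoder exposes two of the usual policies: reject the input (`errors="strict"`, which raises `UnicodeDecodeError`) or substitute U+FFFD REPLACEMENT CHARACTER (`errors="replace"`). The wrapper `to_text` below is a hypothetical helper. It deliberately leaves out `errors="ignore"`, which silently deletes the bad bytes; as I read it, TR36 cautions against deletion because it can join fragments that earlier checks saw as separate.

```python
def to_text(raw: bytes, policy: str) -> str:
    """Decode UTF-8 under a chosen error policy (hypothetical helper)."""
    if policy == "reject":
        # Stop at the first ill-formed sequence; raises UnicodeDecodeError.
        return raw.decode("utf-8", errors="strict")
    if policy == "replace":
        # Substitute U+FFFD for ill-formed input and keep going.
        return raw.decode("utf-8", errors="replace")
    # "ignore" (deletion) is intentionally not offered here.
    raise ValueError(f"unknown policy {policy!r}")


data = b"ab\x80cd"  # 0x80 is a continuation byte with no lead byte
try:
    to_text(data, "reject")
except UnicodeDecodeError as exc:
    print("rejected at byte", exc.start)  # rejected at byte 2
print(ascii(to_text(data, "replace")))  # 'ab\ufffdcd'
```

Which policy fits, and where to resync afterwards, is still the application's decision, as the quoted text says.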

  [1] <http://www.unicode.org/reports/tr36/>

--steffen
Received on Wed Aug 28 2013 - 04:38:14 CDT