> On 16 May 2017, at 15:21, Richard Wordingham via Unicode <unicode_at_unicode.org> wrote:
> 
> On Tue, 16 May 2017 14:44:44 +0200
> Hans Åberg via Unicode <unicode_at_unicode.org> wrote:
> 
>>> On 15 May 2017, at 12:21, Henri Sivonen via Unicode
>>> <unicode_at_unicode.org> wrote:  
>> ...
>>> I think Unicode should not adopt the proposed change.  
>> 
>> It would be useful, for use with filesystems, to have Unicode
>> codepoint markers that indicate how UTF-8, including non-valid
>> sequences, is translated into UTF-32 in a way that the original octet
>> sequence can be restored.
> 
> Escape sequences for the inappropriate bytes is the natural technique.
> Your problem is smoothly transitioning so that the escape character is
> always escaped when it means itself. Strictly, it can't be done.
> 
> Of course, some sequences of escaped characters should be prohibited.
> Checking could be fiddly.
One could write the invalid bytes as \xnn escape codes, using \& as in Haskell to terminate an escape where the next character would otherwise be read as part of it, and translate '\' into "\\". The result is then a C-style escaped string, not plain text.
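Below is a minimal Python sketch of this scheme. It assumes variable-length hex escapes as in Haskell, where "\x41B" would read as one code point, so \& is emitted only when a hex digit directly follows a \xnn escape. The function names escape_bytes and unescape_bytes are hypothetical. Invalid bytes are located with Python's surrogateescape error handler (PEP 383), which maps each undecodable byte b to the lone surrogate U+DC00 + b.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def escape_bytes(raw: bytes) -> str:
    """Hypothetical helper: turn arbitrary bytes into an escaped text string."""
    # surrogateescape maps each undecodable byte b (0x80..0xFF) to chr(0xDC00 + b);
    # valid UTF-8 never decodes to a lone surrogate, so these mark exactly the bad bytes.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    after_hex_escape = False
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append(f"\\x{cp - 0xDC00:02x}")  # invalid byte -> \xnn
            after_hex_escape = True
            continue
        if after_hex_escape and ch in HEX_DIGITS:
            out.append("\\&")  # empty separator, as in Haskell
        after_hex_escape = False
        out.append("\\\\" if ch == "\\" else ch)  # '\' always escaped as "\\"
    return "".join(out)


def unescape_bytes(text: str) -> bytes:
    """Hypothetical inverse of escape_bytes: restore the original octets."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")  # ordinary character -> its UTF-8 bytes
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt == "\\":
            out.append(0x5C)  # "\\" -> a single backslash byte
            i += 2
        elif nxt == "&":
            i += 2  # "\&" produces nothing
        elif nxt == "x":
            j = i + 2
            while j < len(text) and text[j] in HEX_DIGITS:
                j += 1  # hex escapes are greedy, hence the need for "\&"
            if j == i + 2:
                raise ValueError(f"\\x without hex digits at index {i}")
            value = int(text[i + 2:j], 16)
            if value > 0xFF:
                raise ValueError(f"escape \\x{text[i + 2:j]} is not a single byte")
            out.append(value)
            i = j
        else:
            raise ValueError(f"unknown escape at index {i}")
    return bytes(out)


if __name__ == "__main__":
    sample = b"a\xffb\\"
    escaped = escape_bytes(sample)
    print(escaped)
    assert unescape_bytes(escaped) == sample
```

Running it prints a\xff\&b\\ : the byte 0xFF becomes \xff, the separator keeps "b" from being read as a third hex digit, and the backslash is doubled. As Richard notes, some escaped sequences should be prohibited: this decoder also accepts \xc3\xa9 for the same two bytes as "é", so a strict version would reject escapes that together form valid UTF-8.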
Received on Thu May 18 2017 - 03:30:59 CDT