RE: ASCII control codes in sequences of multibyte character sets

From: Dreiheller, Albrecht <>
Date: Thu, 5 Sep 2013 10:16:54 +0000

From: Steffen Daode Nurpmeso, Saturday, August 31, 2013 4:37 PM

> Likewise, the byte values used to encode <period>, <slash>,
> <newline> and <carriage-return> shall not occur as part of any
> other character in any locale.

In this context, it may be useful to know that some Chinese multi-byte
encodings contain characters whose trail byte is 0x5C, the same byte value
as the ASCII backslash "\".
This can cause problems in C-like string literals, where \ acts as an escape character.
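
To make the problem concrete, here is a minimal Python sketch. The helper
count_escape_bytes is hypothetical and only stands in for a byte-oriented
lexer that treats every 0x5C byte as the start of an escape sequence; using
the Python codec name "cp950" for Win CP 950 is likewise an assumption of
this sketch.

```python
# Illustration only: count_escape_bytes is a hypothetical stand-in for a
# byte-oriented lexer that treats every 0x5C byte as an escape introducer.
def count_escape_bytes(raw: bytes) -> int:
    return raw.count(b"\x5c")


text = "\u4e48"              # U+4E48, a single CJK character, no backslash
raw = text.encode("cp950")   # assumed codec name for Win CP 950 (Big5)

# Scanning bytes finds the trail byte and mistakes it for a backslash.
naive_hits = count_escape_bytes(raw)

# Decoding first and scanning characters finds no backslash at all.
real_hits = raw.decode("cp950").count("\\")

print(raw.hex(" "), naive_hits, real_hits)
```

The byte-level scan reports a backslash that is not present in the text,
whereas the scan over decoded characters does not.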


In Big5 (Win CP 950), Traditional Chinese:
U+03B1 maps to A3 5C
U+4E48 maps to A4 5C
U+4FDF maps to AB 5C

In GBK (Win CP 936), Simplified Chinese:
U+2010 maps to A9 5C
U+2558 maps to A8 5C
U+4E57 maps to 81 5C
