Re: ASCII control codes in sequences of multibyte character sets

From: Steffen <>
Date: Sat, 31 Aug 2013 16:36:35 +0200

Thank you all very much for your kind answers!
My goodness, I should have referenced the thread on the POSIX
mailing list myself, yet I guess it is the mark of an expert to
know about evil character sets without such hints…

Reading your messages, it seems safe to request that a POSIX
wording (Base Definitions, 6.2 Character Encoding; [1]) be
clarified, from

  Likewise, the byte values used to encode <period> and <slash>
  shall not occur as part of any other character in any locale.

to

  Likewise, the byte values used to encode <period>, <slash>,
  <newline> and <carriage-return> shall not occur as part of any
  other character in any locale.

  [1] <>

Of course the ISO C and POSIX facilities are insufficient to deal
with text portably. (But this theoretical change would turn many
decades-old POSIX programs which test characters against '\n' and
'\r' into functioning software again. By definition, that is.)
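
To illustrate what such a guarantee protects against, here is a
minimal Python sketch. It is not taken from the POSIX text. It
assumes UTF-16LE purely as an example of an encoding in which the
byte 0x0A occurs inside another character; UTF-16 is not usable as
a POSIX locale charset, so this only demonstrates the failure
mode. The helper name count_newline_bytes is hypothetical.

```python
def count_newline_bytes(data: bytes) -> int:
    """Count bytes equal to 0x0A, the way a byte-oriented program
    that compares each byte against '\\n' would."""
    return data.count(b"\n")


# U+010A LATIN CAPITAL LETTER C WITH DOT ABOVE; the text has no newline.
text = "\u010a"

utf8 = text.encode("utf-8")        # bytes C4 8A
utf16 = text.encode("utf-16-le")   # bytes 0A 01

print(count_newline_bytes(utf8))   # no byte of the sequence is 0x0A
print(count_newline_bytes(utf16))  # the low byte of U+010A is 0x0A
```

In the UTF-8 case the byte test finds nothing, whereas in the
UTF-16LE case it reports a newline that is really one half of a
different character.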

P.S.: Wow! I now have an email account near the wild Rocky
Mountains! I reckon that's a good place to live. Yay!


attached mail follows:

Hello character plus experts,
I'm wondering whether there are any multibyte character sets known
which use the numerical values of ASCII control characters that
are vital to Unix/POSIX (plus) as part of multibyte sequences?
In particular U+000A and U+000D?
Thank you very much in advance (and don't forget to have a nice
weekend, will ya?)
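
One rough way to probe a given encoding for this property is to
encode every code point and look for the control byte values
inside multibyte sequences. The following is only a sketch under
assumptions: it relies on the codecs that ship with Python, checks
the Basic Multilingual Plane by default, and the function name
control_bytes_in_multibyte is hypothetical. It says nothing about
charsets that Python has no codec for.

```python
def control_bytes_in_multibyte(encoding, controls=(0x0A, 0x0D), limit=0x10000):
    """Return the code points below `limit` whose multibyte encoding
    contains one of the byte values in `controls`."""
    hits = []
    for cp in range(limit):
        if cp in controls:
            continue  # the control character itself is allowed to use its byte
        try:
            encoded = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # not representable in this encoding (e.g. surrogates)
        if len(encoded) > 1 and any(b in controls for b in encoded):
            hits.append(cp)
    return hits


utf8_hits = control_bytes_in_multibyte("utf-8")
utf16_hits = control_bytes_in_multibyte("utf-16-le")
```

For UTF-8 the result is empty, because every byte of a multibyte
UTF-8 sequence has its high bit set. For UTF-16LE the result
includes, among others, U+010A, whose low byte is 0x0A.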

Received on Sat Aug 31 2013 - 09:39:56 CDT
