Erik van der Poel wrote:
> Frank da Cruz wrote:
> > The irony is, when using ISO 2022 character-set designation and invocation,
> > you have to handle the escape sequences first to know if you're in UTF-8.
> > Therefore, this pushes the burden onto the end-user to preconfigure their
> > emulator for UTF-8 if that is what is being used, when ideally this should
> > happen automatically and transparently.
> I may be misunderstanding the above, but ISO 2022 says:
> ESC 2/5 F shall mean that the other coding system uses
> ESC 2/5 4/0 to return;
> ESC 2/5 2/15 F shall mean that the other coding system
> does not use ESC 2/5 4/0 to return (it may have an alternative
> means to return or none at all).
> Registration number 196 is for UTF-8 without implementation level, and
> its escape sequence is ESC 2/5 4/7. I believe that ISO 2022 was designed
> that way so that a decoder that does not know UTF-8 (or any other coding
> system invoked by ESC 2/5 F) could simply "skip" the octets in that
> encoding until it gets to the octets ESC 2/5 4/0.
> This means that it does not need to decode UTF-8 just to find the escape
> sequence ESC 2/5 4/0. UTF-8 does not do anything special with characters
> below U+0080 anyway (they're just single-byte ASCII), so it works, no?
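The "skip until ESC 2/5 4/0" behaviour described in the quoted text can be sketched as a small byte scanner. Everything here is assumed for illustration: the function name `split_docs` and its segment labels are hypothetical, only the 8-bit byte stream is considered, and the final byte F of any other coding system is recorded but never interpreted. The column/row notation maps to bytes as 2/5 = 0x25 ('%'), 2/15 = 0x2F ('/'), 4/0 = 0x40 ('@') and 4/7 = 0x47 ('G').

```python
ESC, PCT, SLASH, AT = 0x1B, 0x25, 0x2F, 0x40  # ESC, 2/5 '%', 2/15 '/', 4/0 '@'

def split_docs(data: bytes) -> list[tuple[str, bytes]]:
    """Hypothetical sketch: split a stream into ISO 2022 segments and segments
    in another coding system announced by ESC 2/5 F. The other system's octets
    are passed through undecoded."""
    segments: list[tuple[str, bytes]] = []
    mode = "iso2022"  # or "other:<F>" while inside another coding system
    start = i = 0
    n = len(data)
    while i + 2 < n:
        if data[i] == ESC and data[i + 1] == PCT:
            if mode == "iso2022" and data[i + 2] == SLASH:
                if i + 3 < n:
                    # ESC 2/5 2/15 F: no standard return, so the rest of
                    # the stream belongs to the other coding system.
                    segments.append((mode, data[start:i]))
                    segments.append((f"other:{chr(data[i + 3])}", data[i + 4:]))
                    return [s for s in segments if s[1]]
            elif mode == "iso2022" and data[i + 2] != AT:
                # ESC 2/5 F: switch to another coding system (F = 4/7 is UTF-8).
                segments.append((mode, data[start:i]))
                mode = f"other:{chr(data[i + 2])}"
                i += 3
                start = i
                continue
            elif mode != "iso2022" and data[i + 2] == AT:
                # ESC 2/5 4/0: return to ISO 2022. For UTF-8 this can be found
                # without decoding, because every byte of a multibyte UTF-8
                # sequence is >= 0x80 and so can never be mistaken for ESC.
                segments.append((mode, data[start:i]))
                mode = "iso2022"
                i += 3
                start = i
                continue
        i += 1
    segments.append((mode, data[start:]))
    return [s for s in segments if s[1]]
```

For example, `split_docs(b"abc\x1b%G" + "é€".encode("utf-8") + b"\x1b%@xyz")` yields an `iso2022` segment `b"abc"`, an `other:G` segment holding the raw UTF-8 octets, and a final `iso2022` segment `b"xyz"`, all without the scanner knowing anything about UTF-8.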
Yes, but I was thinking more about the ISO 2022 invocation features than the
designation ones: LS2, LS3, LS1R, LS2R, LS3R, SS2, and SS3 are C1 controls.
The situation *could* arise where these would be used prior to announcing
(or switching to) UTF-8. In this case, the end-user would have to configure
the software in advance to know whether the incoming byte stream is UTF-8.
Not a big deal; just an illustration of what happens when we can't use the
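The ambiguity with C1 invocation controls can be illustrated with SS2, assuming its 8-bit representation as the single octet 08/14 (0x8E). The helper `count_ss2` below is hypothetical; it only shows that the same octets mean different things depending on an encoding assumption the decoder must already have made before any announcement arrives.

```python
SS2 = 0x8E  # 08/14: SS2 as a single 8-bit C1 code

def count_ss2(data: bytes, stream_is_utf8: bool) -> int:
    """Hypothetical helper: count SS2 controls under a given encoding assumption."""
    if not stream_is_utf8:
        # 8-bit ISO 2022: every 0x8E octet is SS2.
        return data.count(SS2)
    # UTF-8: the C1 control U+008E is the two octets C2 8E; a lone 0x8E is
    # just a continuation byte belonging to some other character.
    return data.decode("utf-8").count("\x8e")

# U+010E (LATIN CAPITAL LETTER D WITH CARON) encodes in UTF-8 as C4 8E.
d_caron = "\u010e".encode("utf-8")
print(count_ss2(d_caron, stream_is_utf8=False))  # 1: the 0x8E octet read as SS2
print(count_ss2(d_caron, stream_is_utf8=True))   # 0: one ordinary letter
```

Nothing in the octets themselves settles which reading is right, which is why the user would have to configure the software in advance.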
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT