Re: C1 controls and terminals (was: Re: Euro character in ISO)

From: Erik van der Poel (erik@netscape.com)
Date: Thu Jul 13 2000 - 13:45:58 EDT

Next message: Asmus Freytag: "Re: Miscellaneous comments/questions."
Previous message: addison@inter-locale.com: "Re: Subset of Unicode to represent Japanese Kanji?"
Maybe in reply to: Frank da Cruz: "Re: C1 controls and terminals (was: Re: Euro character in ISO)"
Next in thread: Frank da Cruz: "Re: C1 controls and terminals (was: Re: Euro character in ISO)"
Reply: Frank da Cruz: "Re: C1 controls and terminals (was: Re: Euro character in ISO)"

Frank da Cruz wrote:
>
> Doug Ewell wrote:
> >
> > That last paragraph echoes what Frank said about "reversing the layers,"
> > performing the UTF-8 conversion first and then looking for escape
> > sequences. True UTF-8 support, in terminal emulators and in other
> > software as well, really should depend on UTF-8 conversion being
> > performed first.
>
> The irony is, when using ISO 2022 character-set designation and invocation,
> you have to handle the escape sequences first to know if you're in UTF-8.
> Therefore, this pushes the burden onto the end-user to preconfigure their
> emulator for UTF-8 if that is what is being used, when ideally this should
> happen automatically and transparently.

I may be misunderstanding the above, but ISO 2022 says:

  ESC 2/5 F shall mean that the other coding system uses
  ESC 2/5 4/0 to return;

  ESC 2/5 2/15 F shall mean that the other coding system
  does not use ESC 2/5 4/0 to return (it may have an alternative
  means to return or none at all).

Registration number 196 is for UTF-8 without implementation level, and
its escape sequence is ESC 2/5 4/7. I believe that ISO 2022 was designed
that way so that a decoder that does not know UTF-8 (or any other coding
system invoked by ESC 2/5 F) could simply "skip" the octets in that
encoding until it gets to the octets ESC 2/5 4/0.

This means that it does not need to decode UTF-8 just to find the escape
sequence ESC 2/5 4/0. UTF-8 does not do anything special with characters
below U+0080 anyway (they're just single-byte ASCII), so it works, no?
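
A minimal sketch of that "skip without decoding" idea, in Python. The function name split_segments and the segment labels are made up for illustration, and the sketch ignores every other ISO 2022 escape sequence. It relies only on the fact that UTF-8 multi-byte sequences use octets of 0x80 and above, so the three octets of ESC 2/5 4/0 can never appear inside an encoded character:

```python
ESC = 0x1B
ENTER_UTF8 = bytes([ESC, 0x25, 0x47])   # ESC 2/5 4/7 (ESC % G), registration 196
RETURN_2022 = bytes([ESC, 0x25, 0x40])  # ESC 2/5 4/0 (ESC % @)

def split_segments(stream):
    """Split a byte stream into (kind, payload) pairs by raw octet search.

    kind is "iso2022" or "utf8" (hypothetical labels). The UTF-8 payload is
    never decoded; it is only skipped over until the return sequence.
    """
    segments = []
    pos = 0
    in_utf8 = False
    while pos < len(stream):
        # Look for whichever sequence would end the current segment.
        marker = RETURN_2022 if in_utf8 else ENTER_UTF8
        hit = stream.find(marker, pos)
        end = len(stream) if hit == -1 else hit
        if end > pos:
            segments.append(("utf8" if in_utf8 else "iso2022", stream[pos:end]))
        if hit == -1:
            break
        pos = hit + len(marker)
        in_utf8 = not in_utf8
    return segments

data = b"abc" + ENTER_UTF8 + "\u00e9\u20ac".encode("utf-8") + RETURN_2022 + b"xyz"
segments = split_segments(data)
```

A decoder that knows nothing about UTF-8 could drop the "utf8" payloads and carry on with the rest.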

Of course, if you wanted to include any C1 controls inside the UTF-8
segment, they would have to be encoded in UTF-8, but ESC 2/5 4/0 is
entirely in the ASCII range (less than 128), so those octets are encoded
as is.
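
To illustrate the difference (a small Python check, using CSI, U+009B, as an arbitrary example of a C1 control):

```python
# A C1 control inside a UTF-8 segment takes two octets, both >= 0x80.
csi_octets = "\u009b".encode("utf-8")

# The return sequence ESC 2/5 4/0 is pure ASCII, so UTF-8 leaves it unchanged.
return_octets = "\x1b%@".encode("utf-8")

print(csi_octets.hex(), return_octets.hex())
```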

Erik


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT