email@example.com (Timothy Partridge) wrote:
> Do IBM DBCS strings assume starting in single byte mode?
> And would the presence of certain bytes in UTF-16 trigger a switch from
> double to single byte mode?
Yes and yes. There are a number of Asian EBCDIC codepages that follow this
structure. These are essentially two codepages in one, the selection of the
current one being achieved by means of SI and SO. For example, CP 930 is
the Japanese codepage containing katakana, and this is composed of the SBCS
CP 290 and DBCS CP 300; similarly, the English-Japanese codepage is 939,
containing CP 37 as the SBCS part, but still using CP 300 for the DBCS part.
The data are assumed to be SBCS at the start, so if one starts with 939,
the bytes are read as CP 37 until SO (0x0E) is encountered, after which
they are read as CP 300; SI (0x0F) switches back to SBCS mode. I can't
remember the exact rules, but I'm pretty certain that the string is
supposed to return to SBCS mode at the end, something like ISO-2022-JP.
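To make the mode switching concrete, here is a rough sketch in Python of how a reader might split such a byte string into single-byte and double-byte runs. It only illustrates the SO/SI state machine described above and is not a real converter. The function name split_runs is made up for this example, the DBCS bytes in the usage note are arbitrary placeholders rather than real CP 300 code points, and it assumes SO and SI never occur inside a double-byte character.

```python
from __future__ import annotations

SO = 0x0E  # shift out: switch to double-byte (DBCS) mode
SI = 0x0F  # shift in: switch back to single-byte (SBCS) mode


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split mixed SBCS/DBCS data into ("sbcs" | "dbcs", bytes) runs.

    Hypothetical helper: the data is assumed to start in SBCS mode, and
    SO/SI are assumed never to appear inside a double-byte character.
    """
    runs: list[tuple[str, bytes]] = []
    mode = "sbcs"
    current = bytearray()

    def flush() -> None:
        # A DBCS run must consist of whole two-byte characters.
        if mode == "dbcs" and len(current) % 2:
            raise ValueError("odd number of bytes in a DBCS run")
        if current:
            runs.append((mode, bytes(current)))
        current.clear()

    for byte in data:
        if mode == "sbcs" and byte == SO:
            flush()
            mode = "dbcs"
        elif mode == "dbcs" and byte == SI:
            flush()
            mode = "sbcs"
        else:
            current.append(byte)
    flush()
    return runs
```

For instance, split_runs("AB".encode("cp037") + b"\x0e\x45\x6a\x0f" + "C".encode("cp037")) gives an SBCS run, a DBCS run and another SBCS run. The SBCS runs can then be decoded with Python's cp037 codec (following the CP 37 description above), while the DBCS runs would need a CP 300 table, which the standard library does not ship.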
Conversely, if you're converting Unicode data containing English, it'll go
to the SBCS CP 37, but any Kanji should trigger a switch, putting SO in the
output and switching to CP 300.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT