Re: DBCS and Unicode 3.1

From: Jungshik Shin (jshin@mailaps.org)
Date: Wed Feb 19 2003 - 23:29:21 EST

    On Tue, 18 Feb 2003, Markus Scherer wrote:

    > Jungshik Shin wrote:
    > > On Mon, 17 Feb 2003, Markus Scherer wrote:
    > >>Other examples: There are EUC-JP (1/2/3 bytes per character) and
    > >>EUC-CN (1/2/4 BpC) which are quite "old" (much older than GB 18030).
    > >
    > > Markus's fingers made a mistake here :-). It's EUC-TW (not EUC-CN)
    > > that encodes CNS 11643 plane 2(1) thru plane 7 using SS2.

    > MBCS. By the way, the encoding scheme for EUC-TW has space for 16 CNS
    > planes, and some vendor implementations use higher planes than 7.
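
    For readers unfamiliar with the layout mentioned above, here is a
    minimal sketch of the EUC-TW byte structure. It assumes the usual
    EUC conventions: SS2 is 0x8E, the byte after SS2 is 0xA0 + plane,
    and rows/columns run 1..94. The function name and its arguments are
    made up for illustration and are not taken from any library.

```python
SS2 = 0x8E  # single-shift 2, introduces the four-byte form


def euc_tw_encode(plane: int, row: int, col: int) -> bytes:
    """Build the EUC-TW bytes for a CNS 11643 position.

    plane is 1..16 (the scheme has room for 16 planes);
    row and col are 1..94 positions within that plane.
    """
    if not 1 <= plane <= 16:
        raise ValueError("EUC-TW has room for planes 1..16 only")
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and col must be in 1..94")
    b1, b2 = 0xA0 + row, 0xA0 + col
    if plane == 1:
        # Plane 1 normally uses the plain two-byte form
        # (it can also be written in the four-byte form with SS2).
        return bytes([b1, b2])
    # Planes 2 and up: SS2, plane byte, then the two position bytes.
    return bytes([SS2, 0xA0 + plane, b1, b2])


print(euc_tw_encode(2, 1, 1).hex())
```

    With this layout the plane byte runs from 0xA1 to 0xB0, which is
    where the room for 16 planes comes from.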

      Yup. BTW, EUC-KR also uses more than 2 bytes. Eight-byte sequences
    can be used to represent the 8,822 precomposed modern Korean syllables
    that are not representable with 2 bytes in EUC-KR (ref.
    KS X 1001:1998/KS C 5601-1987 annex 2). So the full set
    of 11,172 precomposed syllables in Unicode can be round-tripped
    between Unicode and EUC-KR. This is used by the most popular
    web mail service in Korea (well, they should switch to UTF-8
    instead of lengthening the life of EUC-KR this way) and is implemented
    in Mozilla/Netscape and in a variant of xterm for Korean (hanterm).
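
      A rough sketch of how such an eight-byte sequence can be put
    together, assuming the usual layout: the Hangul filler (0xA4D4)
    followed by three two-byte compatibility jamo from row 4 of
    KS X 1001 (initial, medial, final), with the filler standing in
    for a missing final. The function and table names are mine, for
    illustration only, and the byte tables are my own transcription of
    that row, so check them against the standard before relying on them.

```python
# Second bytes (after the 0xA4 lead byte) of the compatibility jamo in
# KS X 1001 row 4, indexed by the Unicode syllable decomposition indices.
FILLER = 0xD4  # Hangul filler, 0xA4D4
CHOSEONG = [0xA1, 0xA2, 0xA4, 0xA7, 0xA8, 0xA9, 0xB1, 0xB2, 0xB3, 0xB5,
            0xB6, 0xB7, 0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 0xBE]
JUNGSEONG = [0xBF + i for i in range(21)]
JONGSEONG = [FILLER,  # index 0: no final consonant
             0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7, 0xA9, 0xAA,
             0xAB, 0xAC, 0xAD, 0xAE, 0xAF, 0xB0, 0xB1, 0xB2, 0xB4,
             0xB5, 0xB6, 0xB7, 0xB8, 0xBA, 0xBB, 0xBC, 0xBD, 0xBE]


def makeup_sequence(syllable: str) -> bytes:
    """Return the eight-byte EUC-KR sequence for one precomposed syllable."""
    index = ord(syllable) - 0xAC00
    if len(syllable) != 1 or not 0 <= index < 11172:
        raise ValueError("expected one syllable in U+AC00..U+D7A3")
    # Standard arithmetic decomposition: 19 initials x 21 medials x 28 finals.
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return bytes([0xA4, FILLER,
                  0xA4, CHOSEONG[lead],
                  0xA4, JUNGSEONG[vowel],
                  0xA4, JONGSEONG[tail]])


# KS X 1001 itself has 2,350 precomposed syllables, hence 8,822 left over.
assert 19 * 21 * 28 - 2350 == 8822

# U+B620 is one of the syllables with no two-byte EUC-KR code.
print(makeup_sequence("\ub620").hex())
```

      Decoding goes the other way: on seeing 0xA4D4, read the next
    three jamo and compose them back into a single syllable.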

      Jungshik


