Re: How can I input any Unicode character if I know its hexadecimal code?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 15 2003 - 11:45:15 EST

    From: "Michael (michka) Kaplan" <michka@trigeminal.com>
    > If you install the "Chinese (Traditional) - Unicode" IME as an input
    > method, then any program that is prepared to accept Unicode input will
    > handle the input of this interesting IME that is expecting UTF-16 code
    > units. Although obviously intended for CJK, it can be used for any
    > UTF-16 code point.
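
To make the "UTF-16 code units" point concrete, here is a minimal sketch of the arithmetic that turns a hexadecimal code point into the code unit(s) one would have to supply. The function name is hypothetical, and nothing here confirms that a given IME accepts a surrogate pair typed as two separate units; it only shows the mapping.

```python
def utf16_code_units(hex_code):
    """Return the UTF-16 code units (as ints) for a code point given in hex."""
    cp = int(hex_code, 16)
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if cp < 0x10000:
        # BMP code point: a single code unit, identical to the code point.
        return [cp]
    # Supplementary code point: split the 20-bit offset into a surrogate pair.
    offset = cp - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


if __name__ == "__main__":
    for code in ("4E2D", "1D11E"):
        print(code, [format(u, "04X") for u in utf16_code_units(code)])
```

For a BMP character the hexadecimal code and the code unit coincide, which is why a code-unit-based input method is enough for most characters.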

    In Windows, there are two sets of APIs: the legacy ANSI or OEM Win32 APIs
    that support the native multibyte character set of the native platform, and
    the more recent Unicode APIs introduced in NT kernels, and partly supported
    by Windows 95/98/98SE/ME.

    I was told that Chinese applications run with an "ANSI" code page
    similar to GBK at least, or GB18030 (on recent versions of Windows, after
    2000), with MBCS support in both cases, and that Unicode is only
    supported through the UTF-16 APIs, with a built-in conversion table to
    transcode between GB* and UTF-16.
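
The transcoding between GB* and UTF-16 described above can be illustrated with a short sketch. It uses Python's own `gbk` and `gb18030` codecs as a stand-in for the conversion table built into Windows (an assumption for illustration; the two tables are not guaranteed to agree on every edge case), and the helper names are hypothetical.

```python
def gb_to_utf16le(data, codec="gb18030"):
    """Decode GB-encoded bytes and re-encode them as UTF-16 (little-endian)."""
    return data.decode(codec).encode("utf-16-le")


def utf16le_to_gb(data, codec="gb18030"):
    """Decode UTF-16LE bytes and re-encode them in the given GB codec."""
    return data.decode("utf-16-le").encode(codec)


if __name__ == "__main__":
    gb_bytes = "中".encode("gbk")           # two bytes in the MBCS code page
    utf16 = gb_to_utf16le(gb_bytes)         # one 16-bit code unit
    print(gb_bytes.hex(), utf16.hex())
    # GB18030 also covers supplementary characters, using four bytes each.
    print("\U00020000".encode("gb18030").hex())
```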

    For East-Asian systems, the ANSI and OEM codepages are identical (unlike
    European systems where there's a distinction between the OEM codepage used
    in console apps, and the ANSI codepage used in GUI apps).
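
A small sketch of what that distinction means at the byte level. The code page pairing is an assumption chosen for illustration (cp1252 as ANSI and cp850 as OEM, a common Western European setup, and cp936 for both on a Simplified Chinese system); the message itself names no specific code pages, and the helper name is hypothetical.

```python
def encode_both(text, ansi="cp1252", oem="cp850"):
    """Return the bytes for `text` in the ANSI and in the OEM code page."""
    return text.encode(ansi), text.encode(oem)


if __name__ == "__main__":
    # Western European assumption: the same letter gets different bytes.
    ansi_bytes, oem_bytes = encode_both("é")
    print(ansi_bytes.hex(), oem_bytes.hex())
    # East-Asian assumption: ANSI and OEM are the same code page.
    ansi_bytes, oem_bytes = encode_both("中", ansi="cp936", oem="cp936")
    print(ansi_bytes.hex(), oem_bytes.hex())
```

Under these assumptions, text written by a console app and read by a GUI app is misinterpreted on the European setup unless it is converted, while on the East-Asian setup the bytes are the same either way.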

    So, depending on the API on which the application is built (ANSI/OEM with
    _MBCS, or _UNICODE), the input capability of programs differs. This also
    affects areas like the filesystem naming capabilities (limited in FAT12 for
    floppies and FAT16 for Windows 3.x and NT4, extended with Unicode on FAT32
    in Windows 9x/ME and NTFS for NT4/2K/XP/2003), and Windows in fact provides
    two simultaneous input systems: the OEM charset and encoding in console
    apps, and the ANSI charset for GUI apps handling WM_CHAR events. But where
    is the input system for Unicode code points?

    Basically, there does not seem to be any such input system; instead there
    is support for Unicode between IMEs and GUI components like the RTF input
    box.

    For output, the situation is no simpler: the components display Unicode
    UTF-16 strings with the _UNICODE APIs, and native ANSI or OEM encodings with
    the non-Unicode Win32 APIs.

    Aren't you oversimplifying the question, by considering only the most modern
    versions of Windows? I don't think that these versions have deprecated the
    ANSI charset; at least it is needed on P.R.China systems to support GB18030
    (or one of its subsets, like GBK or the legacy Microsoft codepage for
    simplified Chinese).



    This archive was generated by hypermail 2.1.5 : Sat Nov 15 2003 - 12:29:15 EST