Re: Unihan.txt and the four dictionary sorting algorithm

From: Benjamin Peterson
Date: Fri Apr 23 2004 - 12:11:30 EDT


    On Fri, 23 Apr 2004 12:12:57 -0400, "Edward H. Trager" said:

    > There is an issue that you might confront with these terminal-based tools
    > on Windows and on Mac OSX that I myself don't know how to solve, and that
    > is that I don't know how to switch to a UTF-8 locale on either Windows or
    > Mac OS-X so that terminal programs such as Xterm or the Cygwin terminal
    > would display the UTF-8 characters beyond ASCII correctly. My own solution
    > to this problem was trivially easy: don't use Windows or Mac OS X for
    > multilingual database work; use Linux instead.

    Wow -- I'd hate to see your idea of a non-trivial solution!

    > Perhaps someone else on this list can tell us how to get Apple's terminal
    > application or xterm running on OS X to display UTF-8 characters correctly
    > (probably just needs the correct UTF-8 based locale setting). There also
    > must be some solutions to this problem on Windows terminals too, I just
    > don't know what they are.

    In theory, running 'chcp 65001' in cmd.exe switches the console to the
    UTF-8 code page, at least to the extent that 'cat' will then display a
    UTF-8 file correctly. This works for me, but some people report issues.
    The only other major Windows shell, 4NT, does not work for me with UTF-8
    at all. Since cmd.exe is a horrible shell, I would suggest:

    1 -- doing everything from vim (preferred, of course :))
    2 -- doing everything from regular Windows GUI tools, which have been
    Unicode-friendly since forever.

    chcp 65001 may work for you, though.
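
    For anyone who would rather do this from a script than by typing chcp,
    here is a rough Python sketch of the same idea. It is only an
    illustration, not something I have tested on every Windows setup: the
    helper names set_windows_utf8_codepage and write_utf8 are made up for
    this example, and I am assuming that the Win32 console functions
    SetConsoleCP and SetConsoleOutputCP (reached through ctypes) are enough
    to get the effect of 'chcp 65001'. On non-Windows systems the first
    helper does nothing.

```python
import io
import sys


def set_windows_utf8_codepage() -> bool:
    """Try to switch the attached console to code page 65001 (UTF-8).

    Hypothetical helper for illustration. Returns True on success and
    False on non-Windows systems or if either call fails.
    """
    if sys.platform != "win32":
        return False
    import ctypes  # imported lazily: ctypes.windll only exists on Windows

    kernel32 = ctypes.windll.kernel32
    # Win32 console functions that set the input and output code pages.
    ok_in = kernel32.SetConsoleCP(65001)
    ok_out = kernel32.SetConsoleOutputCP(65001)
    return bool(ok_in and ok_out)


def write_utf8(text: str, stream=None) -> int:
    """Write text as raw UTF-8 bytes, bypassing the locale's encoding.

    Hypothetical helper for illustration. `stream` must be a binary
    stream; it defaults to the byte-level stdout. Returns the number of
    bytes written.
    """
    if stream is None:
        stream = sys.stdout.buffer
    data = text.encode("utf-8")
    stream.write(data)
    stream.flush()
    return len(data)
```

    Calling set_windows_utf8_codepage() and then write_utf8() with a line
    of Han text is roughly the scripted version of 'chcp 65001' followed by
    'cat'. Whether the characters actually show up still depends on the
    console font, which this sketch does nothing about.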


      Benjamin Peterson

    This archive was generated by hypermail 2.1.5 : Fri Apr 23 2004 - 12:48:02 EDT