Re: Unihan.txt and the four dictionary sorting algorithm

From: Edward H. Trager (
Date: Fri Apr 23 2004 - 13:49:34 EDT

    On Friday 2004.04.23 09:11:30 -0700, Benjamin Peterson wrote:
    > On Fri, 23 Apr 2004 12:12:57 -0400, "Edward H. Trager"
    > <> said:
    > > There is an issue that you might confront with these terminal-based tools
    > > on Windows and on Mac OS X that I myself don't know how to solve, and that
    > > is that I don't know how to switch to a UTF-8 locale on either Windows or
    > > Mac OS X so that terminal programs such as xterm or the Cygwin terminal
    > > would display the UTF-8 characters beyond ASCII correctly. My own solution
    > > to this problem was trivially easy: don't use Windows or Mac OS X for
    > > multilingual database work; use Linux instead.
    > Wow -- I'd hate to see your idea of a non-trivial solution!

    Well, yes, perhaps that sounds funny. But I work in a lab where we have all
    three OSes: Windows 2K, Mac OS X, and SuSE Linux. As a developer and sysadmin,
    I have the luxury of using pretty much whatever I want, and that tends to be
    Linux, which over time I have found works best, for me at least, for dealing
    with UTF-8 data (or any other format of Unicode data, since it's easy to
    convert to UTF-8 on Linux). People tend to use what they know best, and I know
    the *nix stuff better than anything else. Since Mac OS X has BSD under the
    hood, I find I can also get a lot done without much pain on OS X, even though
    I've spent very little time using the OS. Windows, on the other hand, is just
    plain annoying (its lack of a decent shell and command-line tools is probably
    what makes the OS most annoying).

    > > Perhaps someone else on this list can tell us how to get Apple's terminal
    > > application or xterm running on OS X to display UTF-8 characters correctly
    > > (it probably just needs the correct UTF-8 based locale setting). There
    > > must also be some solutions to this problem on Windows terminals; I just
    > > don't know what they are.
    > Theoretically, doing 'chcp 65001' in cmd.exe should make it work to the
    > extent that 'cat' will then work correctly on a utf-8 file. This works
    > for me but some people report issues. The only other major Windows
    > shell, 4nt, does not work for me with utf-8 at all. Since cmd.exe is a
    > horrible shell, I would suggest:
    > 1 -- doing everything from vim (preferred, of course :))
    > 2 -- doing everything from regular Windows GUI tools, which have been
    > Unicode-friendly since forever.
    > chcp 65001 may work for you, though.

    I've tried that, but it doesn't seem to change much.
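
    One way to see whether a terminal session is actually set up for UTF-8,
    on any of the three OSes, is to ask a scripting language what encodings
    it detects. The following is only a minimal sketch, assuming a Python 3
    interpreter is installed; the helper names is_utf8_encoding and
    terminal_report are made up for illustration and are not part of any
    tool mentioned in this thread.

```python
import codecs
import locale
import sys


def is_utf8_encoding(name):
    """Return True if `name` is a known alias of the UTF-8 codec."""
    if not name:
        return False
    try:
        # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8".
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown codec name: treat it as "not UTF-8".
        return False


def terminal_report():
    """Collect the encodings Python sees for the locale and for stdout."""
    locale_enc = locale.getpreferredencoding(False)
    stdout_enc = getattr(sys.stdout, "encoding", None)
    return {
        "locale": locale_enc,
        "stdout": stdout_enc,
        "utf8_ready": is_utf8_encoding(locale_enc) and is_utf8_encoding(stdout_enc),
    }


if __name__ == "__main__":
    for key, value in terminal_report().items():
        print(f"{key}: {value}")
```

    If utf8_ready comes back False, the locale (or, on Windows, the console
    code page) is probably still a legacy encoding. A True result only says
    that the encodings are configured; the terminal also needs a font that
    covers the characters being displayed.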

    > Benjamin
    > --
    > Benjamin Peterson
