Re: FW: Subj: Converting from UCS-2 to UTF-8

From: Gregg Reynolds (
Date: Fri Aug 19 2005 - 09:23:00 CDT

    Doug Ewell wrote:
    > Gregg Reynolds <unicode at arabink dot com> wrote:
    >>Anyway the secret purpose is to spread the *nix environment to
    >>windows via cygwin. ;)
    > I hope that winky-smiley was for real, because advising a user to change
    > his operating environment -- overtly or covertly -- in order to make a
    > basic function of Unicode work will only serve to give the wrong
    > impression about Unicode.

    The wrong impression being that, uh, the user couldn't find what he
    needed at the Unicode site? ;) Anyway, who advised anybody to "change
    his operating environment"? I just recommended a toolset.

    Unicode is as complicated or as simple as it needs to be. I don't think
    recommending a toolset implies in any way that said toolset is necessary
    to "make a basic function of Unicode work". I happen to think the POSIX
    toolset (which is what cygwin is, not an "operating environment", unless
    you consider the tools in the toolbox to constitute such an environment)
    offers the "best" approach. Best tool for the job, that's all. And
    iconv isn't the only reason to look into cygwin for Unicode text
    management. There are dozens of other Unicode-enabled tools that make
    life much easier, and that Windows programmers tend not to be aware of.

    Simple example from this week: take a delimited file of 17K lines of
    Arabic text in CP1256, examine the text in a few columns and remove any
    personal names. With POSIX tools it is trivial to convert to UTF-8, cut
    out the columns, break into words, sort, and remove duplicates; examine
    the results by eye to pull out the personal names into a separate file;
    then the next time use the names file to match against the data and pull
    out names. No programming, just stringing together a few commands.
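
    As a rough sketch of the steps above (not the exact commands that were
    run): the file is assumed to be tab-delimited, and the function names,
    file names and column numbers are all made up for illustration. Note
    that grep -F matches substrings, so short names may over-match.

```shell
# Convert a CP1256 file to UTF-8 on stdout.
to_utf8() {
    iconv -f CP1256 -t UTF-8 "$1"
}

# Print the sorted, de-duplicated words found in the chosen columns.
# usage: word_list FILE COLUMNS   (COLUMNS as for cut -f, e.g. 2,4)
word_list() {
    to_utf8 "$1" |
        cut -f "$2" |                 # keep only the columns of interest
        tr -s '[:space:]' '\n' |      # one word per line
        grep -v '^$' |                # drop empty lines
        sort -u                       # sort and remove duplicates
}

# Print the rows that contain any entry of a names file (one name per line).
# usage: rows_with_names FILE NAMES_FILE
rows_with_names() {
    to_utf8 "$1" | grep -F -f "$2"
}

# Hypothetical usage:
#   word_list data.txt 2,4 > words.txt
#   (copy the personal names from words.txt into names.txt by eye)
#   rows_with_names data.txt names.txt
```

    The manual step in the middle is deliberate: the word list is short
    enough to scan by eye, and the resulting names file can be reused on
    the next batch.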

