Re: How to convert special characters into unicode?

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Feb 05 2003 - 11:44:20 EST


    SRIDHARAN Aravind <ASridharan at covansys dot com> wrote:

    > I have Czech special characters in an excel file.
    > I copy them into Notepad.
    > I save them.
    >
    > Now I use native2ascii convertor that is available with JDK.
    > After I run this utility, I am getting some other unicode values or
    > sometimes only whitespaces come out.
    > I don't know why?

    As Chris said, pasting them into Notepad is probably the trouble,
    because U+010C and U+010D are not part of Windows code page 1252.
    (Windows code pages 1250 and 1257 do include both Czech characters.)
    If you are running Windows 2000 or XP, Notepad can save as Unicode,
    but you must explicitly select that encoding when saving (the default
    is "ANSI"). It is better to use a Unicode-capable editor such as
    WordPad, Word, or SC UniPad instead.
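
    To see the code page problem for yourself, here is a small sketch in
    Python (my choice for illustration; it is not part of the JDK
    workflow). The helper name can_encode is made up for this example.
    Python's "replace" error handler only stands in for a lossy "ANSI"
    save; Notepad's actual substitution may differ.

```python
# The two Czech letters from the thread: U+010C and U+010D.
czech = "\u010c\u010d"


def can_encode(text, codec):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Code page 1252 lacks both letters; 1250, 1257 and UTF-16 have them.
for codec in ("cp1252", "cp1250", "cp1257", "utf-16"):
    print(codec, can_encode(czech, codec))

# A lossy conversion to code page 1252 (illustrative stand-in for an
# "ANSI" save): the original code points cannot be recovered afterwards.
lossy = czech.encode("cp1252", errors="replace")
print(lossy)
```

    Once the characters have been squeezed through code page 1252, no
    later tool can recover them, which would explain unexpected values
    coming out of native2ascii.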

    Since you already know the Unicode code points, it would have been
    easier by now to type the escape sequences (Universal Character Names)
    directly:

    \u010c
    \u010d
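
    If you have the text in memory as real Unicode, producing such escape
    sequences is a few lines of code. The following is only a sketch in
    Python of roughly what an escaping tool does, not the native2ascii
    implementation; the function name to_ucn is invented for this example.

```python
def to_ucn(text):
    """Replace each non-ASCII character with \\uXXXX escape sequences."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        else:
            # Go through UTF-16 so that characters beyond U+FFFF come out
            # as a surrogate pair, i.e. two \uXXXX escapes.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                code_unit = int.from_bytes(units[i:i + 2], "big")
                out.append("\\u%04x" % code_unit)
    return "".join(out)


# The two Czech letters, given here by code point.
print(to_ucn("\u010c\u010d"))
```

    The result for the two Czech letters is the pair of escapes shown
    above, written one after the other.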

    Alternatively, if you use SC UniPad, there is an option to convert
    directly to UCN (as Adam mentioned), without having to bother with
    native2ascii.

    -Doug Ewell
     Fullerton, California


