RE: converting Unicode text into Unicode codes

From: Richard Kunst (rkunst@humancomp.org)
Date: Fri Oct 26 2001 - 11:31:45 EDT


If, as Doug suggests, Vadim wants to do something like represent "'Hi' as
'U+0048 U+0069'", then the current version 1.5 of our UniEdit text editor
for Windows has a handy "Copy as" feature which automates this conversion
and a number of others.

It permits copying any amount of selected Unicode text from a UniEdit edit
window to the Windows clipboard in a variety of special formats, in addition
to the usual Unicode-text and local code page Windows clipboard formats,
which result from the usual "Copy" feature. Here are the various formats:

Unicode UTF-8 Encoding
U+nnnn (Unicode Character Literals)
&#number; (HTML Numeric Character References)
\unnnn (Java Unicode Escape Sequences)
0xnnnn, (C/C++ Hexadecimal Integer Constants, Comma-delimited)
&Hnnnn (Visual Basic Hexadecimal Integer Constants)
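As a rough illustration of what each of these formats contains, here is a minimal Python sketch. It is not UniEdit's code. `copy_as`, `utf16_units`, and the format keys are hypothetical names, and the delimiters used for the UTF-8 and Visual Basic outputs are assumptions. Two further assumptions: the Java, C/C++ and Visual Basic outputs are written as UTF-16 code units (so characters beyond the BMP become surrogate pairs), while U+ and HTML references use scalar values. UTF-8 is shown here as hex bytes for readability, whereas the feature described above presumably copies the bytes themselves.

```python
def utf16_units(text):
    """Split text into UTF-16 code units (surrogate pairs for non-BMP characters)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def copy_as(text, fmt):
    """Render text in one of the 'Copy as' styles (hypothetical helper)."""
    scalars = [ord(ch) for ch in text]  # Unicode scalar values, one per character
    if fmt == "utf8":
        return " ".join(f"{b:02X}" for b in text.encode("utf-8"))
    if fmt == "uplus":
        return " ".join(f"U+{cp:04X}" for cp in scalars)
    if fmt == "html":
        return "".join(f"&#{cp};" for cp in scalars)  # decimal numeric references
    if fmt == "java":
        return "".join(f"\\u{u:04X}" for u in utf16_units(text))
    if fmt == "c":
        return ", ".join(f"0x{u:04X}" for u in utf16_units(text))
    if fmt == "vb":
        return " ".join(f"&H{u:04X}" for u in utf16_units(text))
    raise ValueError(f"unknown format: {fmt!r}")


for fmt in ("utf8", "uplus", "html", "java", "c", "vb"):
    print(f"{fmt:>5}: {copy_as('Hi', fmt)}")
```

For example, `copy_as("Hi", "uplus")` returns `U+0048 U+0069` and `copy_as("Hi", "html")` returns `&#72;&#105;`.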

The resulting formatted strings can then be pasted directly into source
code, a resource string file, documentation, etc., in another text editor,
or, in the case of the UTF-8 format, into a non-Unicode-aware application.

I'm not sure whether we properly handle the formatting of surrogates, or of
anything else beyond the BMP.
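Handling characters beyond the BMP comes down to one piece of UTF-16 arithmetic, sketched below in Python (`to_surrogates` and `from_surrogates` are hypothetical helper names). U+ notation and HTML numeric references should carry the scalar value (U+1D11E, `&#119070;`), while formats built on 16-bit units, such as Java `\u` escapes, need the surrogate pair (`\uD834\uDD1E`). A converter that formatted raw UTF-16 units as U+ values would wrongly print `U+D834 U+DD1E`.

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a UTF-16 surrogate pair into the original scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E MUSICAL SYMBOL G CLEF, written as a Java-style escape pair
hi_unit, lo_unit = to_surrogates(0x1D11E)
print(f"U+1D11E -> \\u{hi_unit:04X}\\u{lo_unit:04X}")
```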

UniEdit v.1.5 can be downloaded from here:

http://research.humancomp.org/ftp/pub/download/unied32.exe (9825KB)

General UniEdit information is available here:

http://www.humancomp.org/uniintro.htm
(although there isn't much detail there about the new features added in
v.1.5).

Best wishes,
Rick Kunst

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
The Humanities Computing Laboratory
A Nonprofit Education and Research Corporation
301 W. Main St., Suite 400-I
Durham, NC 27701 USA
Tel. (919) 667-9556, (919) 656-5915
Fax: (919) 667-9556
E-mail: rkunst@humancomp.org
http://www.humancomp.org
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/

> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of DougEwell2@cs.com
> Sent: Thursday, October 25, 2001 1:41 AM
> To: unicode@unicode.org
> Cc: vkhaske@tcsi.com
> Subject: Re: converting Unicode text into Unicode codes
>
>
> Nobody seems to have touched this one yet...
>
> On 2001-10-22 at 15:35, Vadim Khaskel <vkhaske@tcsi.com> wrote:
>
> > I have a question regarding tools available to convert Unicode
> > text into Unicode codes. We are working on an enhancement of our current
> > product, and one of the new features is "Internationalization". Please let
> > me know if you have heard of such a tool.
>
> As Addison Phillips says in his signature block, "Internationalization is
> an architecture. It is not a feature."
>
> You should clarify what you mean by "convert Unicode text into Unicode
> codes." All computerized text, in Unicode or any other character set, is
> represented as a sequence of codes. If the text is already "Unicode text,"
> then by definition it is already encoded in "Unicode codes."
>
> If you have text in another encoding, such as Latin-1 or Windows CP1252 or
> EBCDIC or whatever, and wish to convert it to Unicode, there is a handy tool
> called "recode" available as free software on the Internet.
>
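The kind of conversion Doug describes can be sketched with Python's built-in codecs. This is not how `recode` works internally, and `transcode` is a hypothetical helper name. The codec names are Python's own: "cp1252" for Windows CP1252, "latin-1" for ISO 8859-1, and "cp037" for one common EBCDIC code page (EBCDIC has many variants, so which one applies is an assumption that depends on the data).

```python
def transcode(data, source, target="utf-8"):
    """Convert bytes in a legacy encoding (e.g. "latin-1", "cp1252", "cp037") to Unicode."""
    text = data.decode(source)   # legacy bytes -> Unicode code points (strict: bad bytes raise)
    return text.encode(target)   # code points -> bytes in the target encoding form


# The same byte can mean different characters depending on the source encoding:
print(transcode(b"\x80", "cp1252"))    # Euro sign (U+20AC) in Windows-1252
print(transcode(b"\x80", "latin-1"))   # C1 control character U+0080 in Latin-1
print(transcode(b"\xc8\x89", "cp037")) # "Hi" in EBCDIC code page 037
```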
> If you already have Unicode text and wish to view the Unicode scalar values
> of the text (e.g. you want to display "Hi" as "U+0048 U+0069"), somebody
> could probably whip up a quick Perl script to do this.
>
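The quick script Doug has in mind might look like this in Python rather than Perl. It is a sketch, not an existing tool; `u_plus`, `dump`, and the file name `input.txt` are hypothetical. Because Python strings are sequences of code points rather than UTF-16 units, a character beyond the BMP comes out as a single value such as U+1D11E.

```python
import io
import sys


def u_plus(text):
    """Format each character of text as its Unicode scalar value, U+nnnn style."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def dump(stream, out=None):
    """Write one line of U+nnnn values for every line read from a text stream."""
    out = out if out is not None else sys.stdout
    for line in stream:
        print(u_plus(line.rstrip("\n")), file=out)


# In-memory example; for a file, pass open("input.txt", encoding="utf-8") instead.
dump(io.StringIO("Hi\n"))
```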
> But I think you need to explain more clearly what it is you have and what
> you want.
>
> -Doug Ewell
> Fullerton, California



This archive was generated by hypermail 2.1.2 : Fri Oct 26 2001 - 13:06:01 EDT