RE: cp 932 to UTF-8 conversion (for Java)

Date: Tue Nov 16 1999 - 07:03:19 EST

Your code is very compact and smart.

I only have a minor question: why do you start your "hankaku block" at
U+FF60? That code point is not assigned: Unicode hankaku characters range
from U+FF61 to U+FF9F.

One reason I can imagine is that this definition for IS_HANKAKU(u):
        (0xff60==((u) & 0xffe0)) || (0xff80==((u)&0xffe0))

is assumed to be more efficient than this one:
        ((u)>=0xff61 && (u)<=0xff9f)

But, if this is the assumption, I don't see the grounds for it. "&" is very
light-weight, but two "=="s and one "||" should be exactly as heavy as one
">=", one "<=" and one "&&", and the mask form still has its two "&"s on top.
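
To make the comparison concrete, here is a minimal sketch of the two tests
as plain C functions, assuming a 16-bit UTF-16 code unit as input. The
function names are mine, not taken from the Mozilla source. The third
variant is a common single-comparison idiom that I add only for
illustration; I am not claiming it is measurably faster with any given
compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask form quoted above: accepts the two 32-code-point blocks
   U+FF60..U+FF7F and U+FF80..U+FF9F, so U+FF60 is included. */
bool is_hankaku_mask(uint16_t u)
{
    return (0xff60 == (u & 0xffe0)) || (0xff80 == (u & 0xffe0));
}

/* Range form quoted above: accepts exactly U+FF61..U+FF9F. */
bool is_hankaku_range(uint16_t u)
{
    return u >= 0xff61 && u <= 0xff9f;
}

/* Hypothetical alternative: one subtraction and one compare.
   Values below 0xff61 wrap around to large unsigned numbers. */
bool is_hankaku_sub(uint16_t u)
{
    return (uint16_t)(u - 0xff61) <= (uint16_t)(0xff9f - 0xff61);
}

/* Counts the 16-bit values on which the mask and range forms differ. */
unsigned count_disagreements(void)
{
    unsigned n = 0;
    for (uint32_t u = 0; u <= 0xffff; u++) {
        if (is_hankaku_mask((uint16_t)u) != is_hankaku_range((uint16_t)u))
            n++;
    }
    return n;
}
```

Over the whole 16-bit range the mask and range forms disagree on U+FF60
only, so whichever is chosen, the difference in behavior is confined to
that one code point.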

Another possible explanation is that Ken Lunde's original algorithm
converted the JIS "hankaku space" (that, if not unified with U+0020, would
be at U+FF60) to the JIS equivalent of U+3000 ("zenkaku" IDEOGRAPHIC SPACE),
a conversion which is useless and impossible in Unicode.


> -----Original Message-----
> From: []
> Sent: 1999 November 16, Tuesday 00.41
> To: Unicode List
> Cc: Unicode List
> Subject: Re: cp 932 to UTF-8 conversion (for Java)
>
> I have coded a HankakuToZenkaku (Unicode to Unicode) text transformation in
> the Mozilla source code, based on Ken Lunde's non-Unicode algorithm in
> Understanding CJKV Information Processing. I am not sure it is bug free. It
> should be easy to convert to Java....
> see
> nkaku.cpp
>
> "Peck, Jon" wrote:
> > We need to convert Japanese Windows (cp932) encoded Java resource bundles
> > into UTF-8. The Java nativetoascii converter seems not to allow us to
> > preserve the half-width katakana characters, mapping them to their
> > full-width forms instead of using the characters in the surrogate area.
> > Since the half-width form is what our folks want, we need to do this for
> > the user interface materials of a Java app.
> >
> > Surprisingly, I haven't been able quickly to locate a (preferably batch)
> > converter that will do this, but surely there must be many. Can anyone
> > point me to a tool with this capability? I'd certainly prefer not to
> > write one.
> >
> > Thanks in advance. << File: Card for Frank Yung-Fong Tang >>
