RE: Japanese Date Processing in Shift_JIS or UTF-8 encoding using

From: Marco.Cimarosti@icl.com
Date: Mon Jun 12 2000 - 14:20:03 EDT


Kedar Moghe wrote:
> I want to parse & display Date on HTML pages. I can use JavaScript, JSP,
> or Java. The problem is to convert the UTC date format into Japanese
> format. The delimiters of Japanese date differ from standard Date
> representation in English.

I expect there are many ready-made Java classes that let your program
convert between Unicode and other character encodings.
I hope someone else on the list can help more with this.

As an alternative, why not use JIS directly in your JavaScript?

> My HTML/JSP pages are enforced to use UTF-8 encoding. So, when my
> ServerApplication dumps the Date related information, it is always in UTC
> date format. To display the same in Japanese format, I need to embed the
> Japanese delimiters into the date, re-calculate & re-format it, then
> display it. Since, the delimiters are in Shift_JIS, so I need to convert
> it into UTF-8 before displaying. That's the problem while displaying dates
> in Japanese format.

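The byte pairs you need are below. Note that these are the 7-bit JIS codes
used by ISO-2022-JP, not Shift_JIS bytes (in Shift_JIS, for example, "year"
is 0x94 0x4E):

JIS (Unicode, Meaning, Kanji)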
"G/" (U+5E74, year, 年)
"7n" (U+6708, month, 月)
"F|" (U+65E5, day, 日)
";~" (U+6642, hour, 時)
"J," (U+5206, minute, 分)

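Each 2-byte kanji (or run of kanji) should be preceded by "\x1B"+"$@", which
switches to two-byte JIS mode, and followed by "\x1B"+"(B", which switches
back to ASCII, so that the ISO-2022-JP shift state is properly reset.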

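For illustration, here is a minimal JavaScript sketch of both uses of the table, assuming the date is available as a JavaScript `Date`. The hypothetical `formatJapaneseDate` uses the Unicode column (enough for a UTF-8 page, since JavaScript strings are Unicode), and the hypothetical `encodeIso2022jpDate` emits raw ISO-2022-JP bytes with each separator wrapped in the escapes just described. It reads the UTC fields and deliberately does not convert to Japan time (UTC+9).

```javascript
// The separator table above: JIS X 0208 byte pair plus Unicode character.
const SEPARATORS = [
  { field: "year", jis: [0x47, 0x2f], uni: "\u5E74" },   // "G/"  年
  { field: "month", jis: [0x37, 0x6e], uni: "\u6708" },  // "7n"  月
  { field: "day", jis: [0x46, 0x7c], uni: "\u65E5" },    // "F|"  日
  { field: "hour", jis: [0x3b, 0x7e], uni: "\u6642" },   // ";~"  時
  { field: "minute", jis: [0x4a, 0x2c], uni: "\u5206" }, // "J,"  分
];

const ESC_TO_KANJI = [0x1b, 0x24, 0x40]; // ESC $ @ : switch to two-byte JIS mode
const ESC_TO_ASCII = [0x1b, 0x28, 0x42]; // ESC ( B : switch back to ASCII

// Read the date/time fields in UTC (no conversion to Japan time).
function dateFields(date) {
  return {
    year: date.getUTCFullYear(),
    month: date.getUTCMonth() + 1, // getUTCMonth() is zero-based
    day: date.getUTCDate(),
    hour: date.getUTCHours(),
    minute: date.getUTCMinutes(),
  };
}

// Unicode result, suitable for a UTF-8 page.
function formatJapaneseDate(date) {
  const f = dateFields(date);
  return SEPARATORS.map((s) => String(f[s.field]) + s.uni).join("");
}

// Raw ISO-2022-JP bytes: ASCII digits, then each separator's JIS pair wrapped
// in escapes, so the stream always ends in ASCII state.
function encodeIso2022jpDate(date) {
  const f = dateFields(date);
  const bytes = [];
  for (const s of SEPARATORS) {
    for (const ch of String(f[s.field])) bytes.push(ch.charCodeAt(0));
    bytes.push(...ESC_TO_KANJI, ...s.jis, ...ESC_TO_ASCII);
  }
  return bytes;
}

// 14:20 EDT on the date of this message is 18:20 UTC.
const sent = new Date(Date.UTC(2000, 5, 12, 18, 20));
console.log(formatJapaneseDate(sent)); // 2000年6月12日18時20分
```

Writing the kanji as `\u` escapes keeps the script itself pure ASCII, so it survives whatever encoding the page is served in.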
> Similarly, when the Date is keyed in by the user in a textbox of an HTML
> page, then I need to parse it using JavaScript. Even though the HTML page
> is enforced UTF-8 encoding, while the user enters the date in HTML
> textbox, the data remains in Shift_JIS format (system locale). It is
> converted to UTF-8 only after FORM submission. But, my problem is to parse
> the date in JavaScript, which means that the date is still in Shift_JIS
> format. So, how do I do that?

In this case too, you can try reading JIS directly. Parsing, however, is much
more complicated, for two unfortunate reasons:

1 -- You cannot be sure that your separators are properly preceded and
followed by the escape sequences above. These are commands that switch the
JIS shift state, so they are only emitted when necessary. E.g., if the text
immediately before a separator is already in two-byte state (another kanji
precedes it), the separator will not be preceded by "\x1B"+"$@". (The only
proper way to handle this is to implement an ISO-2022-JP decoder that tracks
the shift state. See a classic reference:
http://jfly.nibb.ac.jp/html/manuals/internet/japan.inf.txt).

2 -- As usual, human input is less predictable than machine-generated text.
Your user could type a variant kanji for one of the separators (I know that
at least "hour" can be written with more than one kanji), or simply use "/",
and so on.
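A minimal JavaScript sketch of both points, assuming the date arrives as raw ISO-2022-JP bytes. The hypothetical `decodeIso2022jp` tracks the shift state, but it knows only the five separator kanji from the table, so it is not a general decoder. The hypothetical `parseJapaneseDate` then ignores whatever separators the user typed and keeps only the digit runs.

```javascript
// Reverse of the table above: JIS X 0208 byte pair (as hex) -> Unicode.
// Only the five separators are known; any other pair decodes to U+FFFD.
const JIS_PAIR_TO_UNICODE = {
  "472F": "\u5E74", // "G/"  年
  "376E": "\u6708", // "7n"  月
  "467C": "\u65E5", // "F|"  日
  "3B7E": "\u6642", // ";~"  時
  "4A2C": "\u5206", // "J,"  分
};

const toHex = (b) => b.toString(16).toUpperCase().padStart(2, "0");

// Minimal ISO-2022-JP decoder: it tracks the shift state, so a separator
// that follows another kanji without its own escape still decodes correctly.
function decodeIso2022jp(bytes) {
  let twoByte = false; // a stream always starts in ASCII state
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    if (bytes[i] === 0x1b) {
      if (i + 2 >= bytes.length) break; // truncated escape sequence
      const seq = String.fromCharCode(bytes[i + 1], bytes[i + 2]);
      if (seq === "$@" || seq === "$B") twoByte = true; // JIS kanji set (1978/1983)
      else if (seq === "(B" || seq === "(J") twoByte = false; // ASCII / JIS Roman
      // Any other 3-byte escape is skipped without changing state (simplification).
      i += 3;
    } else if (twoByte) {
      if (i + 1 >= bytes.length) { out += "\uFFFD"; break; } // truncated pair
      out += JIS_PAIR_TO_UNICODE[toHex(bytes[i]) + toHex(bytes[i + 1])] ?? "\uFFFD";
      i += 2;
    } else {
      out += String.fromCharCode(bytes[i]); // 7-bit ASCII byte
      i += 1;
    }
  }
  return out;
}

// Lenient parser for human input: any run of non-digits (a separator kanji,
// a variant kanji, "/", ":", a space...) splits the fields. NFKC
// normalization first folds full-width digits such as "１２" into "12".
function parseJapaneseDate(text) {
  const nums = text.normalize("NFKC").match(/[0-9]+/g);
  if (!nums || nums.length < 3 || nums.length > 5) return null;
  const [year, month, day, hour = 0, minute = 0] = nums.map(Number);
  if (month < 1 || month > 12 || day < 1 || day > 31 || hour > 23 || minute > 59) {
    return null; // out of range (day-of-month is not checked against the month)
  }
  return { year, month, day, hour, minute };
}

// Year, month and day as raw ISO-2022-JP bytes, each separator wrapped in escapes.
const sample = [
  0x32, 0x30, 0x30, 0x30, 0x1b, 0x24, 0x40, 0x47, 0x2f, 0x1b, 0x28, 0x42, // 2000年
  0x36, 0x1b, 0x24, 0x40, 0x37, 0x6e, 0x1b, 0x28, 0x42,                   // 6月
  0x31, 0x32, 0x1b, 0x24, 0x40, 0x46, 0x7c, 0x1b, 0x28, 0x42,             // 12日
];
console.log(parseJapaneseDate(decodeIso2022jp(sample)));
// { year: 2000, month: 6, day: 12, hour: 0, minute: 0 }
```

Splitting on "anything that is not a digit" is crude, but it is what makes variant kanji and "/" harmless: the parser never has to recognize the separators at all.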

To make things easier, you may prefer a multi-field form to a free-text
entry box, as in this fragment:

        <html>
        <head>
                <!-- Save this file as ISO-2022-JP: in the raw bytes each
                     kanji run is then wrapped in ESC $ @ (or ESC $ B)
                     and ESC ( B. -->
                <meta http-equiv="Content-Type" content="text/html; charset=ISO-2022-JP">
                <title>Sample Japanese date/time input</title>
        </head>
        <body>
                <form name=Jdate method=GET action="YourScript">
                        <input name=Year type=text size=4 maxlength=4> 年
                        <input name=Month type=text size=2 maxlength=2> 月
                        <input name=Day type=text size=2 maxlength=2> 日
                        <input name=Hour type=text size=2 maxlength=2> 時
                        <input name=Minute type=text size=2 maxlength=2> 分
                        <input type=submit value="年月日時分">
                </form>
        </body>
        </html>

It may be less elegant, but it leaves you nothing but numbers to parse...
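On the receiving side, here is a JavaScript sketch of what "only numbers" can look like. It assumes the Jdate form above is submitted with GET and that the entered time is meant as UTC (a real handler would apply the user's time zone). `readJdateForm` is a hypothetical name; it relies only on the standard `URLSearchParams` and `Date.UTC`.

```javascript
// Hypothetical handler for the Jdate form above (method=GET). With ASCII
// digits typed in, the query looks like "Year=2000&Month=6&Day=12&Hour=14&Minute=20".
function readJdateForm(queryString) {
  const params = new URLSearchParams(queryString);
  const n = {};
  for (const name of ["Year", "Month", "Day", "Hour", "Minute"]) {
    const value = (params.get(name) ?? "").trim();
    // Reject empty fields and anything but ASCII digits (full-width digits too).
    if (!/^[0-9]+$/.test(value)) return null;
    n[name] = Number(value);
  }
  // Treat the entry as UTC (assumption). Date.UTC takes a zero-based month and
  // silently rolls over out-of-range values, so compare the result back to the
  // input to reject dates such as February 30 or times such as 25:00.
  const d = new Date(Date.UTC(n.Year, n.Month - 1, n.Day, n.Hour, n.Minute));
  const same =
    d.getUTCFullYear() === n.Year &&
    d.getUTCMonth() === n.Month - 1 &&
    d.getUTCDate() === n.Day &&
    d.getUTCHours() === n.Hour &&
    d.getUTCMinutes() === n.Minute;
  return same ? d : null;
}

console.log(readJdateForm("Year=2000&Month=6&Day=12&Hour=14&Minute=20").toISOString());
// 2000-06-12T14:20:00.000Z
```

The round-trip comparison is a cheap way to validate calendar dates without writing month-length tables by hand.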

Hope this helps.

_ Marco



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:03 EDT