From: Asmus Freytag [email@example.com]
Sent: Thursday, May 03, 2001 12:22 AM
Subject: Mapping Table issues (see L2/001-192)
T. Kubota recently submitted a problem report on East Asian mappings, which
is now document L2/001-192. I have corresponded with him privately about
his submission, and he made a number of additional comments that I would
like to pass on. I've removed details already covered elsewhere
(L2/001-179 and L2/001-189) but left in some of my replies to him.
Please treat this simply as additional background for our discussion at the
>At 11:54 AM 5/2/01 +0900, Tomohiro KUBOTA wrote:
>>I thought that "fullwidth and halfwidth forms" should not be
>>used unless normal version is already used for other codepoints.
>I think this was our starting point, but then, this caused some
>problems with some vendor sets that have both narrow forms AND
>the wide forms for POUND, CENT, NOT SIGN, etc. With mapping to
>Fullwidth forms, all Japanese sets, whether 'pure' JIS or vendor supersets
>of JIS, can map the same character to the same Unicode character.
>We probably need to explain this more.
>>Anyway, I hope that Unicode Consortium takes a solution which
>>does not bring large confusion. (I am afraid that changing
>>conversion table might confuse users.) However, if Unicode
>>Consortium can take an initiative and major vendors (like
>>Microsoft, Apple, and Sun) will follow it, it will be OK.
>Some vendors whose mappings I was able to check already agree with this.
>>In short, any way will be OK. I think it is important that
>>Unicode Consortium takes an initiative and avoid confusion.
>>I guess there are some Japanese people who know needs of
>>average Japanese Windows/Macintosh/Linux/... users in Unicode
>>Consortium. I hope this problem will be discussed with them.
On adding X0212 to the list of encodings on which EAW is based:
>>Though it is true that JIS X 0212 is not very popular,
>>I don't think there are any positive reason not to support
>>JIS X 0212. Mule and Emacs are samples of implementation.
>Adding X0212 into the EAW pool of legacy encodings adds a large
>number of characters to class "A" and makes it harder to get
>context information to decide whether to treat a character as
>wide or narrow. In particular, it's not so much a question of
>whether *some part* of X0212 is supported, but whether these
>European characters are used as wide characters by a large
>enough group of users to reflect it in the EAW tables.
>> > The next one is almost correct, it should be Na, if it
>> > is used to map a non-wide character in an EA legacy encoding.
>> > FILE SHIFTJIS.TXT------
>> > 0x7E U+203E N # OVERLINE
>>Yes, if U+203E is not used as a doublewidth character in any
>>other conversion tables, it should be "Na".
>> > FILE BIG5.TXT------
>> > 0xA145 U+2022 N # BULLET
>> > If A14E is not in fact a half-width character in
>> > big 5 then what is this supposed to map to?
>> > 0xA14E U+FF64 H # HALFWIDTH IDEOGRAPHIC COMMA
>>Sorry I have no idea. Please ask someone who speaks
>>traditional Chinese. I tested some Chinese-enabled
>>terminals (cxterm and rxvt) and found the character
>>is displayed in doublewidth.
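
For anyone who wants to check the width classes of the characters discussed above, here is a minimal sketch using Python's standard unicodedata module. Note the assumption: it reports the East Asian Width values from whatever Unicode version ships with the interpreter, which may differ from the tables under discussion here, and the helper name width_report is only illustrative.

```python
import unicodedata

# Code points mentioned in the mapping entries quoted in this message,
# plus U+FF3C (the fullwidth counterpart of REVERSE SOLIDUS).
CODE_POINTS = [0x005C, 0xFF3C, 0x203E, 0x2022, 0xFF64]


def width_report(code_points):
    """Return (code point label, character name, East Asian Width class) tuples."""
    report = []
    for cp in code_points:
        ch = chr(cp)
        report.append(
            (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.east_asian_width(ch))
        )
    return report


if __name__ == "__main__":
    # The classes printed come from the interpreter's bundled Unicode data.
    for label, name, eaw in width_report(CODE_POINTS):
        print(label, eaw, name)
```

This only reads the property; whether a terminal such as cxterm or rxvt actually renders a character in double width is a separate question that the property alone does not answer.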
and he finishes:
>>I hope Unicode Consortium takes an initiative to solve this problem.
>>If Unicode Consortium can really do this work, please consider solving
>>"Conversion tables differ between venders" problem written in my page.
>>Japanese people are unhappy with the situation that same JIS X 0208
>>characters are mapped into different Unicode characters depending on
>>vendors. However, I imagine this situation comes from political
>>horse-trading of major vendors and Japanese people are located at
>>hopeless situation... (For example, I imagine Microsoft and Sun
>>will never agree to use common conversion table.) Can Unicode Consortium
>>take an initiative to use a common consistent conversion table?
>>And, please consider "EUC-JP roundtrip compatibility" problem.
>>This problem can automatically be solved if
>> > FILE JIS0208.TXT------
>> > 0x2140 U+005C Na # REVERSE SOLIDUS
>>is regarded as a mapping table problem and changed to use corresponding
>>fullwidth form, though I once received a mail like
>> >> However, such a table does not guarantee round-trip conversion.
>> >> This is because JIS0208.TXT converts 0x2140 (0xa1 0xc0 in EUC-JP)
>> >> in JISX0208 into U+005C while 0x5c in EUC-JP must be mapped into
>> >> U+005C. In short, U+005C corresponds to two characters
>> >> (0x5c and 0xa1 0xc0) in EUC-JP.
>> > This is a known problem, and is very unfortunate. We don't have an
>> > official way around this problem. I suggest that you might ask someone
>> > on the Unicode mail list and see if other people have tables or code that
>> > helps fix this problem. Please see:
>> > http://www.unicode.org/unicode/consortium/distlist.html
>>from Rick McGowan <firstname.lastname@example.org> when I pointed out this
>>round-trip compatibility problem to email@example.com .
>This will all be addressed at the next UTC meeting. I won't promise you that
>you will like the final answer, since I don't know what it will be, but at
>the minimum we are going to look at the problem and decide to do the best
>with the resources we have.
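
To make the EUC-JP round-trip problem quoted above concrete, here is a minimal sketch in Python. The two tables are toy subsets built only from the byte values and code points cited in this message; they are assumptions for illustration, not the published mapping files, and the helper names find_collisions and round_trips are hypothetical.

```python
# Toy decode tables keyed by EUC-JP byte sequence (illustrative subset only).
ASCII_ROW = {b"\x5c": "\u005c"}  # 0x5C in EUC-JP -> U+005C

# Mapping as quoted: JIS X 0208 0x2140 (0xA1 0xC0 in EUC-JP) -> U+005C.
TABLE_AS_QUOTED = {**ASCII_ROW, b"\xa1\xc0": "\u005c"}

# Alternative raised in the message: map 0x2140 to the fullwidth form U+FF3C.
TABLE_FULLWIDTH = {**ASCII_ROW, b"\xa1\xc0": "\uff3c"}


def find_collisions(decode_table):
    """Return {character: [byte sequences]} for characters with several sources."""
    sources = {}
    for raw, ch in decode_table.items():
        sources.setdefault(ch, []).append(raw)
    return {ch: sorted(seqs) for ch, seqs in sources.items() if len(seqs) > 1}


def round_trips(decode_table):
    """True if every byte sequence survives decoding followed by re-encoding."""
    encode_table = {}
    for raw, ch in decode_table.items():
        # When two sources share a character, only the first can be restored.
        encode_table.setdefault(ch, raw)
    return all(encode_table[ch] == raw for raw, ch in decode_table.items())


if __name__ == "__main__":
    print(find_collisions(TABLE_AS_QUOTED))  # U+005C has two EUC-JP sources
    print(round_trips(TABLE_AS_QUOTED))      # False
    print(round_trips(TABLE_FULLWIDTH))      # True
```

With the quoted mapping, U+005C has two EUC-JP sources, so one of them cannot be restored on the way back. With the fullwidth mapping, each byte sequence has its own Unicode character and the toy table round-trips.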