Re: Fullwidth and Halfwidth

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Sep 19 1997 - 19:08:38 EDT


I disagree with your analysis of half-width being relative.
Glad to take that one up in detail with you offline.

A./

At 11:22 AM 09/19/97 -0700, you wrote:
>In the context of the misdirected discussion about the W3C
>DOM Core Level 1 Draft which has been showing up on this list,
>John Cowan made a number of observations regarding the status
>of halfwidth and fullwidth characters, as documented in the
>Unicode Standard.
>
>I will try to clarify the intent of the discussion of halfwidth
>and fullwidth forms on page 6-130 of the standard.
>
>First, though, it should clearly be noted that statements
>made in the Unicode Standard in Chapter 6 (Character Block
>Descriptions) do not have normative status. Chapters 3, 4,
>and 7 (Charts) have normative status. The rest of the book,
>including Chapter 6, is provided basically to give as much
>information as possible to help people understand and
>implement the characters correctly. But it is dangerous to
>make legalistic arguments based on the text of Chapter 6,
>since there is rather large leeway for the editors of the
>Unicode Standard to modify and augment such explanatory
>text as new issues arise or old ones require more clarification.
>
>>
>> ISUNG.US.ORACLE.COM wrote:
>>
>> > John Cowan wrote:
>> >
>> > >The status of U+1100-11FF and U+AC00-D7A3 is doubtful. Officially,
>> > >the first block (Hangul Jamo) is halfwidth and the second block
>> > >(Hangul Syllables) is neither, but they both look fullwidth to me.
>> >
>> > Both Hangul Jamo and syllables at Row 11 and Row AC ~ D7 are all
>> > fullwidth. There are halfwidth Hangul Jamo at Row FF.
>>
>> Yes, that is what I think too, as it seems reasonable. Unfortunately,
>> it contradicts the letter of the Unicode Standard (p. 6-130):
>>
>> # In the context of conversion to and from such mixed-width encodings,
>> # all characters in the General Scripts area [i.e. 0000-1FFF]
>> # should be construed as halfwidth (*hankaku*) characters.
>
>In my opinion, this sentence, as it stands, is misleading in that
>it implies that everything in the range U+0000..U+1FFF is halfwidth--
>an implication that John has clearly drawn.
>
>The intent, however, is different. The issue basically arises because
>there are fullwidth forms encoded in the ranges U+FF01..U+FF5E and
>U+FFE0..U+FFE6. When converting a DBCS mixed-width encoding to and
>from Unicode, the fullwidth characters in such a mixed-width encoding
>are mapped to the fullwidth compatibility characters in the FFxx
>block, whereas the corresponding halfwidth characters are mapped to
>ordinary Unicode characters (e.g. ASCII in U+0021..U+007E, plus a
>few other scattered characters).
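
The mapping described in the quoted paragraph above can be sketched
in a few lines of Python. This is only an illustration, not a
conversion table from any real DBCS code page: the function names
to_fullwidth and to_ordinary are made up for the example, and the
pairings for U+FFE0..U+FFE6 are the usual compatibility equivalents
of those characters, written out here as an assumption about what
"a few other scattered characters" covers.

```python
# Fullwidth forms U+FF01..U+FF5E sit at a fixed offset from U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFF01 - 0x0021  # 0xFEE0

# Scattered pairs for U+FFE0..U+FFE6 (fullwidth form -> ordinary character).
FULLWIDTH_TO_ORDINARY = {
    0xFFE0: 0x00A2,  # CENT SIGN
    0xFFE1: 0x00A3,  # POUND SIGN
    0xFFE2: 0x00AC,  # NOT SIGN
    0xFFE3: 0x00AF,  # MACRON
    0xFFE4: 0x00A6,  # BROKEN BAR
    0xFFE5: 0x00A5,  # YEN SIGN
    0xFFE6: 0x20A9,  # WON SIGN
}
ORDINARY_TO_FULLWIDTH = {v: k for k, v in FULLWIDTH_TO_ORDINARY.items()}


def to_fullwidth(ch: str) -> str:
    """Return the fullwidth compatibility form of ch, or ch if it has none."""
    cp = ord(ch)
    if 0x0021 <= cp <= 0x007E:
        return chr(cp + FULLWIDTH_OFFSET)
    if cp in ORDINARY_TO_FULLWIDTH:
        return chr(ORDINARY_TO_FULLWIDTH[cp])
    return ch


def to_ordinary(ch: str) -> str:
    """Return the ordinary character paired with a fullwidth form, or ch."""
    cp = ord(ch)
    if 0xFF01 <= cp <= 0xFF5E:
        return chr(cp - FULLWIDTH_OFFSET)
    if cp in FULLWIDTH_TO_ORDINARY:
        return chr(FULLWIDTH_TO_ORDINARY[cp])
    return ch
```

A converter from a mixed-width encoding would send the double-byte
"A" to to_fullwidth("A") and the single-byte "A" to plain U+0041;
characters with no partner pass through both functions unchanged.
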
>
>In the context of interoperability with DBCS character encodings,
>that restricted set of Unicode characters in the
>General Scripts area can be construed as halfwidth, rather than
>fullwidth. (This applies only to the restricted set of characters
>which can be paired with the fullwidth compatibility characters.)
>
>In the context of interoperability with DBCS character encodings,
>all other Unicode characters which are not explicitly marked as
>halfwidth can be construed as fullwidth.
>
>In any other context, Unicode characters not explicitly marked as
>being either fullwidth or halfwidth compatibility forms should
>be construed as unmarked as to halfwidth versus fullwidth status.
>
>Please note that "halfwidth" and "fullwidth" are not unitary
>character properties in the same sense as "space" or "combining"
>or "alphabetic". They are, instead, relational properties of
>a pair of characters, one of which is explicitly encoded as
>a halfwidth or fullwidth form for compatibility in mapping to
>DBCS mixed-width character encodings. I consider it a mistake
>to promulgate APIs such as isFullwidth or isHalfwidth defined
>on Unicode characters; what is "fullwidth" by default today
>could become "halfwidth" tomorrow by the introduction of another
>character on the SBCS part of a mixed-width code page somewhere,
>requiring the introduction of another fullwidth compatibility
>character to complete the mapping. Hopefully, with the existence
>of Unicode, we won't see more extensions of the mixed-width
>character sets we have to map to, but in any case, treating
>relational properties that are contingent on mixed-width
>character set encodings the same as universal character
>properties is mixing apples and oranges.
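
To make the "relational" point in the quoted paragraph concrete,
here is a rough Python sketch in which the answer depends on a table
of pairs passed in by the caller, not on the character alone. The
function name width_in_dbcs_context, the pair tables, and the extra
pair added "tomorrow" (U+00B1 mapped to a private-use code point) are
all hypothetical, invented only to show the effect; the explicit
halfwidth ranges are approximate.

```python
def is_explicit_halfwidth(cp: int) -> bool:
    """Approximate ranges of the explicitly encoded halfwidth forms."""
    return 0xFF61 <= cp <= 0xFFDC or 0xFFE8 <= cp <= 0xFFEE


def width_in_dbcs_context(ch: str, pairs: dict) -> str:
    """Classify ch for DBCS interoperability only.

    pairs maps an ordinary code point to its fullwidth compatibility
    code point. The result is a property of the pairing, not of ch.
    """
    cp = ord(ch)
    if cp in pairs:
        return "halfwidth"   # has a fullwidth partner in this table
    if cp in pairs.values():
        return "fullwidth"   # is itself a fullwidth compatibility form
    if is_explicit_halfwidth(cp):
        return "halfwidth"   # explicitly encoded halfwidth form
    return "fullwidth"       # default for everything else in this context


# Pairs as they stand: U+0021..U+007E <-> U+FF01..U+FF5E.
pairs_today = {cp: cp + 0xFEE0 for cp in range(0x0021, 0x007F)}

# Hypothetical extension: some code page adds a single-byte PLUS-MINUS
# SIGN, forcing a new fullwidth partner (private-use U+E000 stands in).
pairs_tomorrow = dict(pairs_today)
pairs_tomorrow[0x00B1] = 0xE000

print(width_in_dbcs_context("\u00B1", pairs_today))
print(width_in_dbcs_context("\u00B1", pairs_tomorrow))
```

The same character gets a different answer once the table grows,
which is why a fixed isFullwidth(ch) predicate cannot be stable.
Outside the DBCS context the function simply does not apply; such
characters are unmarked as to width.
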
>
>>
>> That purports to include the combining jamo at 1100-11FF. The rest of
>> the paragraph says:
>>
>> # All characters in the CJK Phonetics and Symbols area [i.e. 3000-33FF]
>> # and the Unified CJK Ideograph area [i.e. 4E00-9FFF], along with
>> # the characters in the CJK Compatibility Ideographs [i.e. F900-FAFF],
>> # CJK Compatibility Forms [i.e. FE30-FE4F], and Small Form Variants
>> # blocks [i.e. FE50-FE6F], should be construed as fullwidth (*zenkaku*)
>> # characters. Other Compatibility Area [i.e. F900-FFFF] characters
>> # outside of the current block should be construed as halfwidth
>> # characters. The characters of the Symbols Area are neutral regarding
>> # their width semantics.
>
>This is clearly a case of an attempt to add explanatory text which
>ended up overspecifying and thereby missed the mark. This text should
>be changed in the next edition of the standard to avoid such
>misunderstandings.
>
>>
>> Note that the Standard is silent on the halfwidth/fullwidth status of the
>> Hangul Syllables area.
>>
>> As far as I can tell, ISO 10646 is silent on the terms "halfwidth" and
>> "fullwidth" except to say that the characters so named are provided
>> for compatibility.
>
>That is correct. ISO/IEC 10646 does not consider character properties
>(other than combining and mirroring) to be part of its charter.
>The developers of the Unicode Standard, on the other hand, consider
>character properties to be an integral part of the full specification
>of the universal character encoding.
>
>--Ken Whistler
>
>>
>> --
>> John Cowan http://www.ccil.org/~cowan cowan@ccil.org
>> e'osai ko sarji la lojban
>>
>
>


