>> "OLeary, Sean (NJ)" wrote:
>> UTF-16 is the 16-bit encoding of Unicode that includes the use of
>> surrogates. This is essentially a fixed width encoding.
> certainly not. utf-16, of course, is variable-width: 1 or 2 16-bit units per
> character. if the iuc discussion described a fixed-width 16-bit encoding, it
> was certainly not under the name "utf-16", but possibly as "ucs-2".
> you can make the point, and this could have been said there, too, that for
> many characters you know they will use exactly one 16-bit unit, and
> you don't need to process surrogates for that. this is not to say the encoding
> is fixed-width; it is the same as how you deal with ascii characters in
> utf-8, without declaring utf-8 to be fixed-width.
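
To make the point above concrete, here is a minimal Python sketch, not taken
from the thread. The helper name utf16_units is hypothetical. It counts how
many 16-bit units one character needs in UTF-16, and the last lines show the
UTF-8 analogy: ASCII takes one byte, other characters take more.

```python
def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one code point.

    Hypothetical helper for illustration only.
    """
    cp = ord(ch)
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are reserved for pairing; they are not characters.
        raise ValueError("surrogate code points are not characters")
    # BMP code points fit in one unit; supplementary ones need a surrogate pair.
    return 1 if cp <= 0xFFFF else 2


# One unit for BMP characters, two for anything above U+FFFF.
bmp_units = utf16_units("A")
supplementary_units = utf16_units("\U00010400")

# Same situation in UTF-8: ASCII is one byte, the euro sign is three.
ascii_bytes = len("A".encode("utf-8"))
euro_bytes = len("\u20ac".encode("utf-8"))
```

Code that only ever sees BMP characters can skip the surrogate branch, but
that is a property of the data, not of the encoding.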
>> * Most characters need to be expanded into a UTF-16 form prior to table
>> lookups for character properties or codepage mappings.
> rather, i would expect an "expansion" into a 32-bit value, not into surrogate
> pairs. this is more practical (and needs to be done for utf-16, too).
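
The "expansion" into a 32-bit value mentioned above could look roughly like
the following Python sketch. The function name decode_utf16 is an assumption
made for illustration; it is not an API from ATSUI, TEC, or any library named
in this thread. It turns a sequence of 16-bit units into code points that can
be used directly as keys for a property or mapping table.

```python
def decode_utf16(units):
    """Yield code points (as plain integers) from an iterable of 16-bit units.

    Hypothetical helper for illustration only.
    """
    it = iter(units)
    for u in it:
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            low = next(it, None)
            if low is None or not (0xDC00 <= low <= 0xDFFF):
                raise ValueError("unpaired high surrogate")
            # Combine the two 10-bit halves and add the 0x10000 offset.
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            # BMP unit: the code point is the unit itself.
            yield u


# "A" followed by U+10400, which UTF-16 stores as the pair D801 DC00.
code_points = list(decode_utf16([0x0041, 0xD801, 0xDC00]))
```

After this step every character is a single integer, so a lookup does not need
to know whether the text was stored with surrogates or not.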
So, then, is UTF-32 fixed-width, or must we aim for a UTF-128
or some such, to end this kind of kludge?
How do ATSUI & TEC deal with these variable-width characters
and then how can one create custom styles?
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT