Re: UTF-8 and UTF-16 issues

From: john (john@nisus.com)
Date: Mon Jun 19 2000 - 23:21:22 EDT

Next message: john: "Re: Characters for Programming Languages"
Previous message: Masahiko Maedera: "UTF-8N?"
Maybe in reply to: OLeary, Sean (NJ): "UTF-8 and UTF-16 issues"
Next in thread: Tony Graham: "Re: UTF-8 and UTF-16 issues"

>> "OLeary, Sean (NJ)" wrote:
>> UTF-16 is the 16-bit encoding of Unicode that includes the use of
>> surrogates. This is essentially a fixed width encoding.

> certainly not. utf-16, of course, is variable-width: 1 or 2 16-bit units per
> character. certainly the iuc discussion did not present this as "utf-16",
> but possibly as "ucs-2".
> you can make the point, and this could have been said there, too, that for
> many characters you know they will use exactly one 16-bit unit, and
> you don't need to process surrogates for that. this is not to say the encoding
> is fixed-width; it is the same as how you deal with ascii characters in
> utf-8, without declaring utf-8 to be fixed-width.
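The variable-width point can be made concrete with a minimal Python sketch that turns 16-bit code units into code points. The function name is hypothetical and does not come from any library. A unit outside the surrogate range D800-DFFF is a whole character by itself, which is the common fast path. Only a high surrogate followed by a low surrogate takes two units.

```python
def utf16_units_to_code_points(units):
    """Decode a list of 16-bit UTF-16 code units into code points.

    Hypothetical helper for illustration; unpaired surrogates raise
    ValueError rather than being replaced.
    """
    code_points = []
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if u < 0xD800 or u > 0xDFFF:
            # Fast path: a non-surrogate unit is one complete code point.
            code_points.append(u)
            i += 1
        elif u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate + low surrogate -> one code point >= U+10000.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            code_points.append(cp)
            i += 2
        else:
            raise ValueError(f"unpaired surrogate 0x{u:04X} at index {i}")
    return code_points
```

For example, "A" followed by U+1D11E MUSICAL SYMBOL G CLEF is three 16-bit units (0x0041, 0xD834, 0xDD1E) but only two code points. That is exactly why UTF-16 is not fixed-width, even though most text never leaves the fast path.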

>> UTF-8
>> Cons:
>> * Most characters need to be expanded into a UTF-16 form prior to table
>> lookups for character properties or codepage mappings.

> rather, i would expect an "expansion" into a 32-bit value, not into surrogate
> pairs. this is more practical (and needs to be done for utf-16, too).
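The suggestion to expand into 32-bit values rather than surrogate pairs can be sketched in Python as follows. Both function names are hypothetical. Python's `unicodedata` module stands in for whatever property or mapping tables an application actually uses. The decoder yields one integer code point per UTF-8 sequence, and ASCII bytes pass straight through. The lookup is then keyed directly by that value, with no surrogate handling anywhere.

```python
import unicodedata


def utf8_to_code_points(data: bytes):
    """Yield code point values (as ints) decoded from UTF-8 bytes.

    Rejects invalid lead bytes, bad continuation bytes, truncated,
    overlong, out-of-range and surrogate sequences with ValueError.
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:  # ASCII: one byte, no further work
            yield b0
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at {i}")
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte 0x{b:02X}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"overlong, out-of-range or surrogate at {i}")
        yield cp
        i += length


def lookup_categories(data: bytes):
    # Property lookup keyed directly by the 32-bit code point value;
    # unicodedata is only a stand-in for an application's own tables.
    return [unicodedata.category(chr(cp)) for cp in utf8_to_code_points(data)]
```

The same 32-bit key works whether the text started as UTF-8 or UTF-16. That is the sense in which the expansion "needs to be done for utf-16, too".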

So, then, is UTF-32 fixed-width, or must we aim for a UTF-128
or some such to end this kind of kludge?
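For reference, every Unicode code point is at most U+10FFFF, which fits in 21 bits. UTF-32 therefore stores each code point in exactly one 32-bit unit, so it is fixed-width at the code point level, and nothing wider is needed for that. The short Python sketch below shows the practical consequence. Its function name is hypothetical. Because the nth code point sits at byte offset 4n, random access needs no scanning.

```python
import struct

MAX_CODE_POINT = 0x10FFFF  # largest Unicode code point; fits in 21 bits


def utf32le_code_point_at(data: bytes, index: int) -> int:
    """Return the code point at position `index` in UTF-32LE bytes.

    Hypothetical helper: each code point is one 32-bit unit, so the
    byte offset is simply 4 * index.
    """
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data length must be a multiple of 4")
    if not 0 <= index < len(data) // 4:
        raise IndexError("code point index out of range")
    (cp,) = struct.unpack_from("<I", data, 4 * index)
    if cp > MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"invalid UTF-32 value 0x{cp:08X}")
    return cp
```

The trade-off is size: ASCII text takes four times as many bytes as in UTF-8.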

How do ATSUI & TEC deal with these variable-width characters,
and how, then, can one create custom styles?

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:04 EDT