Re: UTF-8 and UTF-16 issues

From: John Cowan (jcowan@reutershealth.com)
Date: Tue Jun 20 2000 - 12:31:33 EDT


john wrote:

> So, then, is UTF-32 fixed-width, or must we aim for a UTF-128
> or some such, to end this kind of kludge?

Nope, UTF-32 is fixed-width per code point, and its 21-bit code space
is sufficient forever. But a user-visible "character" may contain any
number of diacritical marks, each of which may require its own 32-bit
code unit.
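
To make that concrete, here is a minimal Python sketch. The sample
cluster (a base letter plus two combining marks) is my own choice for
illustration, not anything from the original question; it shows one
user-visible character occupying three UTF-32 code units.

```python
import unicodedata

# One user-visible character: "e" with an acute accent and a dot below,
# written as a base letter followed by two combining marks.
cluster = "e\u0301\u0323"

# Each code point takes exactly one 32-bit unit in UTF-32.
code_points = [f"U+{ord(ch):04X}" for ch in cluster]
utf32_units = len(cluster.encode("utf-32-be")) // 4

# A nonzero canonical combining class identifies the two marks.
is_combining = [unicodedata.combining(ch) != 0 for ch in cluster]

print(code_points, utf32_units, is_combining)
```

So the encoding is fixed-width per code point, but what the user sees
as one character is still variable-length.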
 
> How do ATSUI & TEC deal with these variable-width characters
> and then how can one create custom styles?

I can't speak to that software specifically, but a common approach is
to use UTF-16 and treat surrogate pairs as ligatures. In other
words, in a font which has a glyph for DESERET CAPITAL LETTER LONG I
(provisionally U+10400), insert the mapping "D801+DC00 -> glyph_index(deseretII)"
into the ligature table.
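
A rough Python sketch of the idea, under stated assumptions: the
glyph indices, the `cmap` dictionary, and the `shape` helper are all
hypothetical stand-ins, not the API of ATSUI, TEC, or any real font
format. Only the surrogate arithmetic is standard UTF-16.

```python
def to_surrogates(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a
    UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# Hypothetical glyph index for the Deseret long I glyph.
GLYPH_DESERET_LONG_I = 500

# Hypothetical ligature table: pair of UTF-16 code units -> glyph index.
ligatures = {to_surrogates(0x10400): GLYPH_DESERET_LONG_I}


def shape(units, ligatures, cmap):
    """Map UTF-16 code units to glyph indices, trying a two-unit
    ligature match before falling back to the single-unit cmap."""
    glyphs = []
    i = 0
    while i < len(units):
        pair = tuple(units[i:i + 2])
        if len(pair) == 2 and pair in ligatures:
            glyphs.append(ligatures[pair])
            i += 2
        else:
            glyphs.append(cmap.get(units[i], 0))  # 0 = missing glyph
            i += 1
    return glyphs


# "A" followed by U+10400, with a made-up cmap entry for "A".
print(shape([0x0041, 0xD801, 0xDC00], ligatures, {0x0041: 36}))
```

The renderer never needs to know about surrogates as such; the pair
is just another two-unit sequence that happens to map to one glyph.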

-- 

Schlingt dreifach einen Kreis um dies! || John Cowan <jcowan@reutershealth.com>
Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,          || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.      -- Coleridge (tr. Politzer)


