Re: *Why* are precomposed characters required for "backward

From: John Cowan
Date: Thu Jul 11 2002 - 09:51:37 EDT

Dan Oscarsson scripsit:

> Yes. T.61 is still in use. It uses combining accents. One place where it is
> used is in X.500. It also has the nice property that the combining accent
> comes before the base character, making it easier to parse.

But T.61, like ANSEL, does not have both decomposed and precomposed forms.
Unlike ANSEL and Unicode, it does not permit arbitrary combinations of
base and diacritic. In fact, it is probably more sensible to think of
T.61 as a mixed 8/16-bit code with a fixed repertoire of precomposed
characters where it just so happens that every 16-bit character has a
diacritic and specific lead bytes are associated with specific diacritics.
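
To make the "mixed 8/16 bit code" reading concrete, here is a minimal
sketch of a decoder that treats a diacritic lead byte plus a base byte as
one precomposed character. The lead-byte table is a small illustrative
subset with assumed values, not a full T.61 table; the function name
decode_t61_like is hypothetical; bytes below 0x80 are treated as plain
ASCII, which T.61 only approximates; and "is in the repertoire" is
approximated by "Unicode has a single precomposed code point for the pair".

```python
import unicodedata

# Illustrative subset of diacritic lead bytes (assumed values, not a full T.61 table).
LEAD_BYTES = {
    0xC1: "\u0300",  # combining grave accent
    0xC2: "\u0301",  # combining acute accent
    0xC3: "\u0302",  # combining circumflex accent
    0xC8: "\u0308",  # combining diaeresis
}


def decode_t61_like(data: bytes) -> str:
    """Decode a T.61-like byte string into precomposed Unicode text."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in LEAD_BYTES:
            # Lead byte: the next byte is the base letter of a two-byte character.
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character")
            base = data[i + 1]
            if base >= 0x80:
                raise ValueError("base byte must be 7-bit")
            composed = unicodedata.normalize("NFC", chr(base) + LEAD_BYTES[byte])
            # Assumption: a pair counts as "in the repertoire" only if Unicode
            # has a single precomposed code point for it.
            if len(composed) != 1:
                raise ValueError(f"pair not in repertoire: {byte:#x} {base:#x}")
            out.append(composed)
            i += 2
        elif byte < 0x80:
            # Single-byte character, treated as ASCII in this sketch.
            out.append(chr(byte))
            i += 1
        else:
            raise ValueError(f"byte {byte:#x} not handled in this sketch")
    return "".join(out)
```

Under these assumptions, a lead byte followed by "e" decodes to a single
precomposed letter, while a lead byte followed by "q" is rejected instead
of being turned into an arbitrary base-plus-diacritic combination.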

> NFD should not be an extension of ASCII. There are several spacing
> accents in ASCII that should be decomposed just like the spacing
> accents in ISO 8859-1 are decomposed.

The trouble is that the ASCII spacing accents are no longer just spacing
accents: they have taken on a life of their own due to their extremely
widespread use in programming languages and markup languages. 2^3, whether
you take it as 2 followed by a superscript 3 or as an expression whose
value is 8, is not an instance of 2 followed by a spacing circumflex
followed by 3.
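
The behaviour under discussion can be checked with Python's standard
unicodedata module. This is only a sketch; the constant and helper names
are made up for illustration. It shows that the ASCII spacing accents are
left alone by both NFD and NFKD, while the ISO 8859-1 spacing accents
carry compatibility decompositions only, so NFKD changes them and NFD
does not.

```python
import unicodedata

ASCII_SPACING_ACCENTS = "^`~"
# U+00A8 diaeresis, U+00AF macron, U+00B4 acute accent, U+00B8 cedilla
LATIN1_SPACING_ACCENTS = "\u00a8\u00af\u00b4\u00b8"


def is_stable(text: str, form: str) -> bool:
    """Return True if normalizing text to the given form leaves it unchanged."""
    return unicodedata.normalize(form, text) == text


# ASCII spacing accents have no decomposition of any kind.
ascii_stable = all(
    is_stable(ch, form)
    for ch in ASCII_SPACING_ACCENTS
    for form in ("NFD", "NFKD")
)

# Latin-1 spacing accents survive NFD but are rewritten by NFKD
# as SPACE followed by the corresponding combining mark.
latin1_nfd_stable = all(is_stable(ch, "NFD") for ch in LATIN1_SPACING_ACCENTS)
latin1_nfkd_changed = all(not is_stable(ch, "NFKD") for ch in LATIN1_SPACING_ACCENTS)

# An expression such as 2^3 therefore passes through normalization untouched.
expression = unicodedata.normalize("NFKD", "2^3")
```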

[FAQmeister, this might be a good candidate for the Unicode FAQ.]

> I could ask: why are precomposed characters not preferred, if they
> exist?

They are preferred in some contexts, such as Web documents (due to the
W3C Character Model). Unicode itself neither prefers nor disprefers
precomposed characters; it allows both, and it allows one to normalize
either towards or away from precomposition.
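
As a small illustration of normalizing in either direction, using
Python's unicodedata module (the two helper names are hypothetical
wrappers, not part of any standard API):

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Normalize towards precomposition (NFC)."""
    return unicodedata.normalize("NFC", text)


def to_decomposed(text: str) -> str:
    """Normalize away from precomposition (NFD)."""
    return unicodedata.normalize("NFD", text)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# Both spellings map to the same string under either form,
# which is what makes them canonically equivalent.
same_under_nfc = to_precomposed(precomposed) == to_precomposed(decomposed)
same_under_nfd = to_decomposed(precomposed) == to_decomposed(decomposed)
```

A combination with no precomposed code point, such as q followed by a
combining acute, stays as two code points under both forms.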

John Cowan
                Charles li reis, nostre emperesdre magnes,
                Set anz totz pleinz ad ested in Espagnes.
