Re: Fwd: Wired 4.09 p. 130: Lost in Translation

From: Martin J Duerst (
Date: Wed Aug 28 1996 - 11:17:02 EDT

Alain wrote:

>At 07:20 28/08/1996 -0700, unicode@Unicode.ORG wrote:
(That wasn't unicode@Unicode.ORG, but me :-).

>>Please be careful. To know whether an A is just an A, you only have
>>to check the next position. If that next position is not a combining
>>character, you know it is an A; if it is a combining character, you
>>know it is "something else".
>My 2 cents... "you only have to check the next positionS", and the plural
>may be an unbounded finite number. It indeed makes software more complex.

Alain - Please check the original mail by Michael Everson, or my mail,
where I have cited the relevant passage. To decide whether a Unicode A
is an A or something else, you indeed just have to look at the next
code. To decide whether it is an A-with-grave or an A-with-grave-and-
hook-below, for example, which is a different thing from what Michael
wrote, you have to look ahead by another position.
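
The one-position check described above can be sketched in Python as
follows. This is only an illustration, not anything from the original
mails: the function name is_bare_base is made up, and it assumes that
"combining character" can be approximated by the Unicode general
categories starting with "M" (Mn, Mc, Me).

```python
import unicodedata


def is_bare_base(text, i):
    """Return True if text[i] is not followed by a combining character.

    Hypothetical helper: only the single position after i is examined.
    "Combining character" is approximated by general category M*.
    """
    nxt = i + 1
    if nxt >= len(text):
        # Nothing follows, so the base stands alone.
        return True
    return not unicodedata.category(text[nxt]).startswith("M")
```

For example, is_bare_base("AB", 0) is true, while
is_bare_base("A\u0300", 0) is false, because U+0300 COMBINING GRAVE
ACCENT follows the A.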

>In actuality Vietnamese uses up to 2 diacritics per character (so at least 4
>different codings are to be taken care of at once too for E CIRCUMFLEX WITH
>DOT BELOW TONE MARK, for example), I would say that some linguistic case
>might require up to 5 or 6... But everything is allowed, 1 million
>diacritics after A at the limit. Somebody has to decide to stop that
>look-ahead in actual applications. N is advised. In a speech that I gave at
>the 4th UNICODE Workshop in Germany in 1992 about ordering UNICODE and
>string comparison, I had set N to 3, but N should be parameterized in
>software. But if one has the choice, he should encode fully composed
>characters as a preference, even under level 3 conformance, which is of
>course necessary to support (or to plan supporting at least), even if it is
>more complex.

The important thing is that for characters with N accents, you don't have
to look ahead by more than N+1 positions. And it is not the potential
number of accents that counts, but the actual number of accents present
in the current instance of the character.
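
A rough sketch of that bound, again in Python and again under my own
assumptions (the name read_character is hypothetical, and combining
characters are approximated by general category M*): the loop stops at
the first non-combining code, so a character carrying N accents costs
at most N+1 positions of look-ahead, whatever the theoretical maximum
might be.

```python
import unicodedata


def read_character(text, i):
    """Scan the base at text[i] together with its combining marks.

    Returns (end, looked): end is the index just past the last
    combining mark, looked is the number of positions examined
    after the base.
    """
    j = i + 1
    looked = 0
    while j < len(text):
        looked += 1
        if not unicodedata.category(text[j]).startswith("M"):
            # First non-combining code: the character ends here.
            break
        j += 1
    return j, looked
```

With "A\u0300\u0323B" (A with grave and dot below, then B), the call
read_character(text, 0) examines three positions after the A: two
accents plus the B that ends the sequence. With "AB" it examines one.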

Regards, Martin.
