Re: Fwd: Wired 4.09 p. 130: Lost in Translation

From: Alain LaBonté (alb@sct.gouv.qc.ca)
Date: Wed Aug 28 1996 - 11:07:15 EDT


At 07:20 28/08/1996 -0700, unicode@Unicode.ORG wrote:
>Please be careful. To know whether an A is just only an A, you only have
>to check the next position. If that next position is not a combining
>character, you know it is an A, if it is a combining character, you
>know it is "something else".

My 2 cents... "you only have to check the next positionS", and the number
of positions is finite but unbounded. It does make software more complex.
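
To illustrate the plural, here is a minimal Python sketch of that look-ahead.
It is only an assumption-laden example, not anybody's actual implementation:
the function name count_following_marks is hypothetical, and "combining
character" is approximated as any character whose general category starts
with "M" in Python's unicodedata module.

```python
import unicodedata


def count_following_marks(text: str, i: int) -> int:
    """Count the combining characters that directly follow text[i].

    Assumption: a character counts as combining when its Unicode
    general category starts with "M" (Mn, Mc or Me).
    """
    count = 0
    j = i + 1
    # Keep looking ahead: one position is not enough, since any
    # number of combining characters may follow the base character.
    while j < len(text) and unicodedata.category(text[j]).startswith("M"):
        count += 1
        j += 1
    return count
```

A result of 0 means the A is "just an A"; any other result means the scan
had to run past more than the single next position to find where the
character ends.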

In practice, Vietnamese uses up to 2 diacritics per character (so at least 4
different codings have to be handled at once for E CIRCUMFLEX WITH DOT BELOW
TONE MARK, for example). I would say that some linguistic cases might
require up to 5 or 6... But everything is allowed: 1 million diacritics
after A, at the limit. Somebody has to decide where to stop that look-ahead
in actual applications. A limit N is advised. In a talk that I gave at the
4th UNICODE Workshop in Germany in 1992 about ordering UNICODE and string
comparison, I had set N to 3, but N should be a parameter in software.
Still, given the choice, one should prefer to encode fully composed
characters, even under level 3 conformance, which of course has to be
supported (or at least planned for), even if it is more complex.
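
As a rough illustration of both points, the Python sketch below lists several
codings of the Vietnamese example and applies a look-ahead capped at N. The
names SPELLINGS, is_mark and take_cluster are hypothetical, the default of 3
simply mirrors the value mentioned above, and comparing the codings through
NFC normalization is a present-day convenience of Python's unicodedata
module, assumed here for the demonstration only.

```python
import unicodedata

# Canonically equivalent codings of E with circumflex and dot below.
SPELLINGS = [
    "\u1EC6",        # fully composed character
    "\u00CA\u0323",  # E WITH CIRCUMFLEX + COMBINING DOT BELOW
    "\u1EB8\u0302",  # E WITH DOT BELOW + COMBINING CIRCUMFLEX ACCENT
    "E\u0302\u0323", # E + circumflex + dot below
    "E\u0323\u0302", # E + dot below + circumflex
]


def is_mark(ch: str) -> bool:
    # Assumption: general category M* stands in for "combining character".
    return unicodedata.category(ch).startswith("M")


def take_cluster(text: str, start: int, max_marks: int = 3):
    """Return (cluster, truncated) for the character at text[start].

    The look-ahead stops after max_marks combining characters (the N of
    the message); truncated is True when more marks were left unread.
    """
    end = start + 1
    while end < len(text) and end - start - 1 < max_marks and is_mark(text[end]):
        end += 1
    truncated = end < len(text) and is_mark(text[end])
    return text[start:end], truncated


# All codings reduce to the same fully composed character under NFC.
composed = {unicodedata.normalize("NFC", s) for s in SPELLINGS}
```

With max_marks as a parameter, an application can decide for itself where to
stop, and the truncated flag lets it notice input that exceeds that limit
instead of silently misreading it.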

Alain LaBonté
Québec


