Re: Grapheme clusters and east asian width from Richard Wordingham on 2015-09-17 (Unicode Mail List Archive)

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 17 Sep 2015 19:59:04 +0100

On Thu, 17 Sep 2015 19:30:41 +0300
Eli Zaretskii <eliz_at_gnu.org> wrote:

> > Date: Thu, 17 Sep 2015 17:25:34 +0100
> > From: Daniel Bünzli <daniel.buenzli_at_erratique.ch>
> > Cc: richard.wordingham_at_ntlworld.com, unicode_at_unicode.org
> >
> > On Thursday, 17 September 2015 at 17:24, Eli Zaretskii wrote:
> > > > Is there a formal definition of the algorithm used ? This [1]
> > > > is not very helpful.
> > >
> > > They just use a table of values, AFAIK.
> >
> > But is it standardized, or does everyone have their own table?
>
> I don't know, but I'm sure you will find out if you look into the
> glibc sources. They are publicly available.

Shouldn't that be the locale sources? That would make sense, since
ambiguous-width characters are resolved differently in Eastern and
Western traditions.
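For readers wondering what such a table encodes, here is a minimal Python sketch of a wcwidth-like lookup built from the Unicode properties exposed by Python's `unicodedata` module. It is not glibc's code or its locale data; the function name `char_width` and the `ambiguous_is_wide` flag are hypothetical, with the flag standing in for the locale-dependent Eastern/Western choice for ambiguous-width (`A`) characters.

```python
import unicodedata

def char_width(ch: str, ambiguous_is_wide: bool = False) -> int:
    """Approximate column width of one code point (hypothetical helper).

    Assumption: a wcwidth-like heuristic derived from Unicode properties,
    not glibc's actual per-locale tables. Control characters are not handled.
    """
    # Non-spacing and enclosing marks, and format characters, occupy no column.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    # Wide and fullwidth characters occupy two columns everywhere.
    if eaw in ("W", "F"):
        return 2
    # Ambiguous characters: wide in East Asian usage, narrow in Western usage.
    if eaw == "A":
        return 2 if ambiguous_is_wide else 1
    # Narrow, halfwidth and neutral characters occupy one column.
    return 1
```

With this sketch, `char_width("\u00b1")` (PLUS-MINUS SIGN, an ambiguous-width character) is 1 by default and 2 when `ambiguous_is_wide=True`, which is the kind of difference a per-locale table has to settle.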

However, the calculation from single-character width to string width is
quite naïve: the widths are just added up, at least in some versions of
glibc! This doesn't work when a spacing mark decomposes into two spacing
marks. <U+0B95 TAMIL LETTER KA, U+0BCB TAMIL VOWEL SIGN OO> gets a width
of 2, while the canonically equivalent string <U+0B95, U+0BC7 TAMIL VOWEL
SIGN EE, U+0BBE TAMIL VOWEL SIGN AA> gets a width of 3! This affects the
positioning of text following them in gnome-terminal.
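The arithmetic above can be reproduced with a small Python sketch. The helpers `char_width`, `naive_width` and `nfc_width` are hypothetical, and `char_width` is a simplified wcwidth-like heuristic (not glibc's table) that, like the behaviour described, gives each Tamil spacing vowel sign a width of 1. Normalizing to NFC before summing is one possible mitigation, assumed here for illustration rather than taken from glibc.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Hypothetical wcwidth-like heuristic, not glibc's table.

    Non-spacing/enclosing marks (Mn, Me) and format characters (Cf) are
    zero-width; wide/fullwidth characters take two columns; everything
    else, including spacing marks (Mc) such as Tamil vowel signs, takes one.
    """
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def naive_width(s: str) -> int:
    # Sum of per-character widths: canonically equivalent strings can differ.
    return sum(char_width(ch) for ch in s)

def nfc_width(s: str) -> int:
    # Normalize first so canonically equivalent strings are measured alike.
    return naive_width(unicodedata.normalize("NFC", s))

composed = "\u0b95\u0bcb"          # KA + VOWEL SIGN OO
decomposed = "\u0b95\u0bc7\u0bbe"  # KA + VOWEL SIGN EE + VOWEL SIGN AA

print(naive_width(composed), naive_width(decomposed))  # 2 3
print(nfc_width(composed), nfc_width(decomposed))      # 2 2
```

Normalization only guarantees that canonically equivalent strings are measured alike; it does not by itself make the result agree with how a particular terminal actually renders the cluster.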

Richard.
Received on Thu Sep 17 2015 - 14:01:11 CDT
