Re: Grapheme clusters and east asian width

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 17 Sep 2015 01:19:39 +0100

On Wed, 16 Sep 2015 22:56:42 +0100
Daniel Bünzli <daniel.buenzli_at_erratique.ch> wrote:

> On Wednesday, 16 September 2015 at 22:14, Asmus Freytag (t) wrote:
> > "N" doesn't mean "narrow" but "neutral"; that is, the width is
> > given by other considerations.
>
> Ah, right! Thanks. Narrow is Na.
>
> So a refined algorithm would be to actually sum the widths within each
> grapheme cluster, as I initially wanted to do, with the mapping
> (F, W -> 2), (Na, H -> 1), (N -> 0), and, if the sum is 0, fall back
> to 1 or maybe try to make an educated guess based on the script/block.
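A minimal Python sketch of the summation rule quoted above, assuming the text has already been split into extended grapheme clusters (the Python standard library has no UAX #29 segmenter). unicodedata.east_asian_width is the real standard-library lookup; the helper names cluster_width and text_width are hypothetical, treating EAW=A like N (0 cells) is an assumption because the quoted mapping does not cover A, and the script/block guess is not attempted.

```python
import unicodedata

# Mapping from the quoted message: F and W -> 2 cells, Na and H -> 1,
# N -> 0. "A" (Ambiguous) is not in that mapping; treating it as 0 here
# is an assumption of this sketch (it keeps e.g. <e, U+0301> at 1 cell).
_EAW_CELLS = {"F": 2, "W": 2, "Na": 1, "H": 1, "N": 0, "A": 0}


def cluster_width(cluster: str) -> int:
    """Sum per-code-point widths over one grapheme cluster.

    A sum of 0 falls back to 1, as suggested in the message.
    """
    total = sum(_EAW_CELLS[unicodedata.east_asian_width(ch)] for ch in cluster)
    return total if total > 0 else 1


def text_width(clusters: list[str]) -> int:
    """Width of text that has ALREADY been segmented into clusters."""
    return sum(cluster_width(c) for c in clusters)


if __name__ == "__main__":
    print(cluster_width("A"))        # Na -> 1
    print(cluster_width("\u3042"))   # HIRAGANA LETTER A, W -> 2
    print(cluster_width("e\u0301"))  # Na + A -> 1 + 0 = 1
    print(cluster_width("\u0915"))   # DEVANAGARI KA, N -> 0, falls back to 1
```

Note that this rule only looks at EAW; characters whose width really depends on context (the "other considerations" behind N) still get the crude fallback.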

I think you have a problem with U+302E HANGUL SINGLE DOT TONE MARK
and U+302F HANGUL DOUBLE DOT TONE MARK, contrary to what I said
earlier. They are preposed combining marks with Grapheme_Extend=Yes
and EAW=Wide. I'm not sure whether the (legacy and extended) grapheme
cluster <U+AC00, U+302E> should occupy 2, 3 or 4 cells. I think
2 cells is wrong, so summation works better after all.
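To make the 2-versus-4 question concrete, here is a hypothetical comparison of two per-cluster rules on <U+AC00, U+302E>: taking the widest code point (which gives 2) and summing (which gives 4). The EAW values are hard-coded (W for both tone marks, as stated above, and W for the Hangul syllable) rather than read from unicodedata, so the result does not depend on the Unicode version bundled with the local Python; all names are illustrative.

```python
# EAW values hard-coded for illustration, independent of the local UCD.
EAW_VALUES = {
    0xAC00: "W",  # HANGUL SYLLABLE GA
    0x302E: "W",  # HANGUL SINGLE DOT TONE MARK (preposed, Grapheme_Extend=Yes)
    0x302F: "W",  # HANGUL DOUBLE DOT TONE MARK
}
CELLS = {"F": 2, "W": 2, "Na": 1, "H": 1, "N": 0, "A": 0}


def code_point_widths(cluster: str) -> list[int]:
    """Cell count of each code point in the cluster."""
    return [CELLS[EAW_VALUES[ord(ch)]] for ch in cluster]


def width_by_max(cluster: str) -> int:
    """Hypothetical rule: a cluster is as wide as its widest code point."""
    return max(code_point_widths(cluster))


def width_by_sum(cluster: str) -> int:
    """Summation rule: add the widths of all code points."""
    return sum(code_point_widths(cluster))


ga_with_tone = "\uAC00\u302E"
print(width_by_max(ga_with_tone))  # 2
print(width_by_sum(ga_with_tone))  # 4
```

Neither rule produces 3 cells; that option would need special handling of the preposed mark (e.g. one extra narrow cell).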

Does anyone know how EAW=Wide was derived for these characters?
Apparently they were wide even when they were non-spacing marks
(gc=Mn), e.g. in Unicode Version 5.0, so I suspect they were not given
individual consideration. I think they should be EAW=A (Ambiguous).

Richard.
Received on Wed Sep 16 2015 - 19:20:56 CDT
