Re: Hyphenation Markup

From: Richard Wordingham via Unicode <unicode_at_unicode.org>
Date: Sun, 3 Jun 2018 11:50:54 +0100

On Sun, 3 Jun 2018 04:31:32 +0100
Richard Wordingham via Unicode <unicode_at_unicode.org> wrote:

> However, the text is actually in the Tham script, and without any
> line-breaking controls, the first and third examples read, marking the
> grapheme cluster boundaries with '|', as ᨾ᩠ᨿᩮ <U+1A3E TAI THAM LETTER
> MA, U+1A60 TAI THAM SIGN SAKOT | U+1A3F TAI THAM LETTER LOW YA, U+1A6E
> TAI THAM VOWEL SIGN E> and ᩉ᩠ᩅᩱ <U+1A4C TAI THAM LETTER LOW HA, U+1A60
> TAI THAM SIGN SAKOT | U+1A45 TAI THAM LETTER WA, U+1A71 TAI THAM VOWEL
> SIGN AI>.

What I have marked are the *extended* grapheme cluster boundaries.
There is also a *legacy* grapheme cluster break before the vowel sign.
That legacy break may make line-breaking after Indic re-ordering a bit
easier. However,
in the Lao language, we have sequences in Tham such as <consonant | left
matra, top matra, ...> ('|' = legacy grapheme break), and I now fully
expect there to be renderings such as:

<left matra>, break, <consonant, top matra, ...>
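
To make the extended/legacy distinction concrete, here is a minimal
Python sketch. It is only an approximation: it guesses the
Grapheme_Cluster_Break class from the general category (Mn/Me treated as
Extend, Mc as SpacingMark), which I believe matches the Tai Tham
characters cited in this thread but is not the real property data, and
it implements only the "do not break before Extend" and "do not break
before SpacingMark" rules of UAX #29. The function names are
hypothetical.

```python
import unicodedata


def gcb_class(ch):
    """Approximate Grapheme_Cluster_Break from the general category.

    Assumption: Mn/Me -> Extend, Mc -> SpacingMark, anything else -> Other.
    The real property has exceptions that this sketch ignores.
    """
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):
        return "Extend"
    if cat == "Mc":
        return "SpacingMark"
    return "Other"


def clusters(text, extended=True):
    """Split text into clusters using only the two rules described above.

    Extend never starts a cluster; SpacingMark starts a cluster only
    when legacy (extended=False) clusters are requested.
    """
    out = []
    for ch in text:
        cls = gcb_class(ch)
        joins = cls == "Extend" or (extended and cls == "SpacingMark")
        if out and joins:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def show(parts):
    """Format clusters as code points, with ' | ' at each boundary."""
    return " | ".join(
        " ".join("U+%04X" % ord(c) for c in part) for part in parts
    )


# First example from the quoted message: MA, SAKOT, LOW YA, VOWEL SIGN E.
word = "\u1A3E\u1A60\u1A3F\u1A6E"
print(show(clusters(word, extended=True)))   # U+1A3E U+1A60 | U+1A3F U+1A6E
print(show(clusters(word, extended=False)))  # U+1A3E U+1A60 | U+1A3F | U+1A6E
```

With extended=False, the vowel sign E is split off from its consonant,
which is the legacy break before the left matra discussed above.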

There seems to be an example around the string hole in the middle line
of BAD-13-1-0100 in Figure 5.4 on p. 222 of Bounleuth's dissertation
(http://ediss.sub.uni-hamburg.de/volltexte/2016/8039/pdf/Dissertation.pdf),
but I'm not confident of my reading of the split word as <U+1A2F TAI
THAM LETTER DA | U+1A6E TAI THAM VOWEL SIGN E, U+1A65 TAI THAM VOWEL
SIGN I, U+1A60 TAI THAM SIGN SAKOT | U+1A36 TAI THAM LETTER NA>.

Theppitak would be able to confirm or refute, but he doesn't often
participate in this forum.

Richard.
Received on Sun Jun 03 2018 - 05:51:23 CDT