From: Javier SOLA (lists@khmeros.info)
Date: Fri Jan 30 2009 - 02:17:27 CST
Unfortunately, computer treatment of Lao, Thai and Myanmar has in many
cases tended to separate syllables instead of full words, without
considering the preference that exists in all these scripts to keep
words together at the end of a line. It is always better to break at the
end of a word. This practice has been damaged by European-style
newspapers set in narrow columns, which make layout very complicated
if long words are kept together, so hyphenation has started (with or
without a hyphen, depending on the case), but this is not the preferred
usage of the language. In a book you would tend to break at the end of words.
Syllable separation makes modern treatment of text impossible.
We are now moving towards automatic dictionary-based word-separation and
line-breaking for these scripts, and this would always have to be word
based.
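
To make the idea concrete, here is a rough sketch of dictionary-based
word separation followed by word-based line breaking. It is only an
illustration under stated assumptions, not how any existing engine does
it: the function names, the greedy longest-match strategy and the toy
Latin-letter lexicon are all hypothetical, and line width is counted in
code points, which is not the displayed width for scripts with combining
marks.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE, used here to mark word boundaries


def segment(text, lexicon):
    """Greedy longest-match segmentation (an assumed strategy, for illustration).

    Characters that start no known word become single-character tokens.
    """
    max_len = max(map(len, lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def mark_word_boundaries(text, lexicon):
    """Insert U+200B between the words found by segment()."""
    return ZWSP.join(segment(text, lexicon))


def wrap(marked_text, width):
    """Break lines only at U+200B, so a word is never split across lines.

    Width is counted in code points (a simplification). A word longer than
    the width is kept whole on its own line.
    """
    lines, current = [], ""
    for word in marked_text.split(ZWSP):
        if current and len(current) + len(word) > width:
            lines.append(current)
            current = word
        else:
            current += word
    if current:
        lines.append(current)
    return lines


# Hypothetical toy lexicon; a real one would hold Lao, Thai or Khmer words.
lexicon = {"word", "words", "break", "breaking", "line"}
marked = mark_word_boundaries("wordbreakingline", lexicon)
print(wrap(marked, 12))
```

With a real lexicon the same two steps apply: the segmenter decides where
the words are, and the line breaker is only ever allowed to break where
the segmenter put a boundary.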
Javier
Atif Gulzar wrote:
>> According to Section 11.1 on Thai in TUS 5.0 (p. 376), and Section 16.2 on
>> layout controls (p. 535), U+200B ZERO WIDTH SPACE is the right character for
>> marking word boundaries in languages like Thai which don't use visible
>> spaces between words. I don't see why this would be different for Lao.
>>
>
>
> Lao script is close to Thai, but it has a different script block (U+0E80
> to U+0EFF) and different language processing rules. Unlike Thai, Lao
> script can be broken at the syllable level at line breaks.
>
> http://www.panl10n.net/english/final%20reports/pdf%20files/Laos/LAO06.pdf
>
>
> --
> Best Regards,
> Atif Gulzar
>
> I ◘◘◘◘ Unicode, ɹɐzlnƃ ɟıʇɐ
>
>
>
>
> On Fri, Jan 30, 2009 at 11:59 AM, Doug Ewell <doug@ewellic.org> wrote:
>
>> ɹɐzlnƃ ɟıʇɐ <atif dot gulzar at gmail dot com> wrote:
>>
>>
>>> I have checked and could not find any Unicode character for a word separator
>>> (a zero width space as WORD separator). This character/code is needed for
>>> languages where a space is not used as a word separator. The available zero
>>> width characters are incapable of addressing this issue, e.g.
>>>
>>> U+200B Zero Width Space: This character is intended for line break control
>>> (in the Lao language lines can be broken at the syllable level; Lao uses
>>> U+200B to mark syllable boundaries).
>>> ...
>>>
>> According to Section 11.1 on Thai in TUS 5.0 (p. 376), and Section 16.2 on
>> layout controls (p. 535), U+200B ZERO WIDTH SPACE is the right character for
>> marking word boundaries in languages like Thai which don't use visible
>> spaces between words. I don't see why this would be different for Lao.
>>
>> --
>> Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
>> http://www.ewellic.org
>> http://www1.ietf.org/html.charters/ltru-charter.html
>> http://www.alvestrand.no/mailman/listinfo/ietf-languages
>>
>>
>>
>
>
>
>
This archive was generated by hypermail 2.1.5 : Fri Jan 30 2009 - 02:19:24 CST