From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 10 2004 - 11:09:29 CDT
From: "Michael Everson" <everson@evertype.com>
> Japanese is different; the users all use both scripts all the time.
And there are occurrences in Japanese of Katakana suffixes or particles added to
Latin or Han words, notably to personal names and trademarks... I've seen many
texts where Han and Katakana are mixed in the same "word" (where it would be
inappropriate to insert a word break between the runs of Han and the Katakana
particles).
My first implementation allowed line breaks after each Han character, but an
exception was made after users asked not to break after Han and before
Katakana (even though a line break is allowed between two Han characters), or
after Latin and before Katakana. So a simple approach that allows line breaks
between distinct scripts is deceptive. Am I wrong, or are my users wrong,
wanting this only as a presentation preference?
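
A minimal sketch of the rule described above, in Python. It is only an
illustration: the script classification uses a few hard-coded code point ranges
(an assumption, not the real Unicode Script property), and the names script_of
and break_opportunities are hypothetical, not taken from any actual
implementation.

```python
from typing import List


def script_of(ch: str) -> str:
    """Crude script guess from code point ranges (assumption: enough for a demo)."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:   # CJK Unified Ideographs block only
        return "Han"
    if 0x30A0 <= cp <= 0x30FF:   # Katakana block
        return "Katakana"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Other"


def break_opportunities(text: str) -> List[int]:
    """Return every index i where a line break is allowed before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = script_of(text[i - 1]), script_of(text[i])
        # Exception requested by users: never break before Katakana
        # that directly follows Han or Latin.
        if cur == "Katakana" and prev in ("Han", "Latin"):
            continue
        # First rule: a break is allowed after each Han character.
        if prev == "Han":
            breaks.append(i)
    return breaks
```

With this sketch, a string of two Han characters followed by Katakana gets a
break opportunity between the two Han characters but none at the Han/Katakana
boundary, which is the behaviour the users asked for.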
Also, what about line breaking in long runs of Hangul grapheme clusters (I mean
here the true L+V*T* syllables with their diacritics, not the simplified LV and
LVT sub-syllables encoded in Hangul)? It seems that line breaking in Korean
obeys semantic constraints more than normative syllable boundaries, and I think
this is quite logical when you see that such a presentation is sometimes
preferred by Latin readers too...
To make this work appropriately for some long Japanese or Korean sentences, and
to match users' expectations, I had to explicitly support marks where
line breaks should be allowed, using zero-width spaces. This makes things
complicated if the text has not been modified to include them. So I had to
consider ideographic (full-width) punctuation too, which is not directly
equivalent to its half-width Latin counterpart, as it already includes the
space after it (for example the full-width period/dot, comma or colon), even if
the glyph looks a bit larger.
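
As a rough illustration of the two mechanisms above, the following Python
sketch splits a string into unbreakable segments: a ZERO WIDTH SPACE (U+200B)
marks an explicit break opportunity, and a break is also allowed after
full-width punctuation. The punctuation list and the function name
split_segments are assumptions made for the example, not a complete or
normative set.

```python
from typing import List

ZWSP = "\u200b"  # ZERO WIDTH SPACE, the explicit break mark

# Assumed sample of full-width punctuation after which a break is allowed.
FULLWIDTH_PUNCT = "。、，．：；！？"


def split_segments(text: str) -> List[str]:
    """Split text into segments; a line may be broken between two segments."""
    segments = []
    current = ""
    for ch in text:
        if ch == ZWSP:
            # Explicit break mark: close the segment and drop the mark itself.
            if current:
                segments.append(current)
            current = ""
            continue
        current += ch
        if ch in FULLWIDTH_PUNCT:
            # The punctuation stays attached to the text before it.
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments
```

A line-filling routine would then place whole segments on a line until the
available width is exhausted, never breaking inside a segment.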