Re: interleaved ordering (was RE: Phoenician)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon May 10 2004 - 11:09:29 CDT


    From: "Michael Everson" <everson@evertype.com>
    > Japanese is different; the users all use both scripts all the time.

    And there are occurrences in Japanese of Katakana suffixes or particles added to
    Latin or Han words, notably to personal names and trademarks. I've seen many
    texts where Han and Katakana are mixed in the same "word" (where it would be
    inappropriate to insert a word break between the Han run and the Katakana
    particles).

    My first implementation allowed line breaks after each Han character, but I made
    an exception after users asked me not to break after Han and before Katakana
    (even though a line break is allowed between two Han characters), or after
    Latin and before Katakana. So a simple approach that allows line breaks between
    distinct scripts is misleading. Am I wrong, or are my users wrong, and is what
    they want only a presentation preference?
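
    To make the difference concrete, here is a minimal Python sketch of the two
    rules. It is not my actual implementation: the function names (script_of,
    naive_breaks, adjusted_breaks) are hypothetical, and scripts are guessed from
    a few code point ranges (CJK Unified Ideographs, Katakana, Hiragana, ASCII
    letters) rather than from the real Unicode Script property. The adjusted rule
    also assumes that breaks at other script changes stay allowed.

```python
def script_of(ch: str) -> str:
    """Rough script guess from code point ranges (an assumption, not the
    Unicode Script property)."""
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:      # CJK Unified Ideographs block
        return "Han"
    if 0x30A0 <= cp <= 0x30FF:      # Katakana block
        return "Katakana"
    if 0x3040 <= cp <= 0x309F:      # Hiragana block
        return "Hiragana"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Other"


def naive_breaks(text: str) -> list[int]:
    """Allow a break before index i wherever the script changes."""
    return [i for i in range(1, len(text))
            if script_of(text[i - 1]) != script_of(text[i])]


def adjusted_breaks(text: str) -> list[int]:
    """Allow a break after every Han character and at script changes,
    except between Han or Latin and a following Katakana character."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = script_of(text[i - 1]), script_of(text[i])
        if cur == "Katakana" and prev in ("Han", "Latin"):
            continue                # the exception users asked for
        if prev == "Han" or prev != cur:
            breaks.append(i)
    return breaks


if __name__ == "__main__":
    sample = "田中サン"              # two Han characters, then a Katakana suffix
    print(naive_breaks(sample))     # break only at the Han/Katakana boundary
    print(adjusted_breaks(sample))  # break only between the two Han characters
```

    On this sample the naive rule offers exactly the break that users rejected,
    and misses the one between the two Han characters.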

    Also, what about line breaking in long runs of Hangul grapheme clusters (I mean
    here the true L+V*T* syllables with their diacritics, not the simplified LV and
    LVT sub-syllables encoded in Hangul)? It seems that line breaking in Korean
    obeys semantic constraints more than normative syllable boundaries, and I think
    that is quite logical when you see that such a presentation is sometimes
    preferred by readers of Latin-script text too.

    To make this work appropriately for some long Japanese or Korean sentences, and
    to match users' expectations, I had to explicitly support marks where
    line breaks should be allowed, using zero-width spaces. This makes things
    complicated if the text has not been annotated with them. So I had to consider
    ideographic (full-width) punctuation too, which is not directly equivalent to
    its half-width Latin counterpart: the full-width period/dot, comma or colon,
    for example, already includes the space after it, even if the glyph looks a
    bit larger.
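
    As an illustration only, the following Python sketch wraps text at explicit
    zero-width spaces (U+200B) and after a small set of full-width punctuation
    marks. The names split_segments and wrap, the BREAK_AFTER set, and the choice
    to count every character as one column are assumptions made for the example,
    not a description of my code.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, used as an explicit break opportunity

# Full-width punctuation after which a break is assumed to be acceptable:
# ideographic comma, ideographic full stop, full-width comma, full stop, colon.
BREAK_AFTER = set("\u3001\u3002\uff0c\uff0e\uff1a")


def split_segments(text: str) -> list[str]:
    """Split text into unbreakable segments; ZWSP is consumed, punctuation
    stays attached to the end of the segment it closes."""
    segments, current = [], ""
    for ch in text:
        if ch == ZWSP:
            if current:
                segments.append(current)
            current = ""
        else:
            current += ch
            if ch in BREAK_AFTER:
                segments.append(current)
                current = ""
    if current:
        segments.append(current)
    return segments


def wrap(text: str, width: int) -> list[str]:
    """Greedy line filling; each character counts as one column.
    A segment longer than width is left to overflow its line."""
    lines, line = [], ""
    for seg in split_segments(text):
        if line and len(line) + len(seg) > width:
            lines.append(line)
            line = seg
        else:
            line += seg
    if line:
        lines.append(line)
    return lines


if __name__ == "__main__":
    sample = "東京都\u200bに\u200b行きます。ありがとう"
    for row in wrap(sample, 6):
        print(row)
```

    Text that carries no zero-width spaces falls back to the punctuation breaks
    alone, which is why the punctuation had to be handled separately.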


